§📚 Developer Documentation
§Overview
The Smoke and Mirrors codebase consists of the following components:
- navigator
- inject
- libinject
- winafl-netspoof
- seeding
The role each component plays in the project is as follows:
The main entry point to the project is the binary in the navigator crate. This kicks off the explorative loop: the malware sample is repeatedly instrumented and executed, and the instrumentation for the next execution is dynamically configured in an attempt to elicit previously unseen behaviour from the sample.
From this main loop, an instrumented execution of the sample is run as a subprocess. The subprocess entry point is a pre-built binary provided by DynamoRIO, the dynamic binary instrumentation (DBI) framework used by SaM. Two important things are passed to the DynamoRIO drrun.exe binary:
- The system path to the malware sample (which will be run under the DynamoRIO DBI engine).
- The system path to a Dynamic-Link Library (DLL) describing how to instrument the sample.
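The two inputs above can be sketched as a command line. This is a minimal sketch, assuming the standard drrun convention of `-c <client>` followed by `--` and the target application; the function name `build_drrun_command` and its parameters are hypothetical and are not taken from the navigator crate.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not spawn) the command for one instrumented run.
/// Hypothetical helper: the name and argument layout are illustrative only.
fn build_drrun_command(drrun_path: &Path, client_dll: &Path, sample: &Path) -> Command {
    let mut cmd = Command::new(drrun_path);
    // `-c <client>` selects the DynamoRIO client DLL. Everything after `--`
    // is the application to run under the DBI engine.
    cmd.arg("-c").arg(client_dll).arg("--").arg(sample);
    cmd
}
```

A caller would then spawn the returned `Command` and wait on the child process. Any options destined for the client DLL would sit between the DLL path and the `--` separator.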
inject is a cmake project that compiles to the instrumentation DLL.
In practice, the DLL exports a special symbol called dr_client_main, which is called by the DynamoRIO engine at runtime.
There is a lot of tooling made available by DynamoRIO that can be used within this dr_client_main function to achieve the desired instrumentation, such as APIs for querying the loaded module space, hooking function calls, inspecting memory addresses, etc.
However, DynamoRIO exposes C bindings for this tooling, and SaM is written primarily in Rust. For that reason the libinject crate contains:
- Rust bindings for the DynamoRIO tooling (auto-generated from the C bindings using rust-bindgen).
- Logic that uses these bindings to achieve the configurable custom instrumentation required by SaM (interposing on network calls to simulate a specific network environment, recording requested endpoints and packets).
- Definitions for external symbols that can be called from dr_client_main.
- A Cargo.toml manifest that builds the crate as a static library so that it can be linked with the inject cmake project.
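To illustrate what an external symbol callable from a C dr_client_main might look like, here is a minimal sketch. The name `example_libinject_init`, its signature, and its return convention are all assumptions made for illustration; they do not describe the real libinject API.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Hypothetical C-callable entry point (name and signature are assumptions).
/// Returns the number of arguments accepted, or -1 on invalid input.
///
/// # Safety
/// `argv` must point to `argc` valid, NUL-terminated C strings.
// Toolchains older than Rust 1.82 spell this attribute `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn example_libinject_init(argc: c_int, argv: *const *const c_char) -> c_int {
    if argc < 0 || argv.is_null() {
        return -1;
    }
    for i in 0..argc as usize {
        // SAFETY: the caller guarantees `argv` holds `argc` readable pointers.
        let ptr = unsafe { *argv.add(i) };
        if ptr.is_null() {
            return -1;
        }
        // SAFETY: the caller guarantees each pointer is a NUL-terminated string.
        let arg = unsafe { CStr::from_ptr(ptr) };
        // Report bad input through the return code instead of panicking,
        // because unwinding across the C boundary must be avoided.
        if arg.to_str().is_err() {
            return -1;
        }
    }
    argc
}
```

The C side would declare a matching prototype and call it like any other C function once the static library is linked in.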
In summary, the call stack is:
- navigator.exe binary starts drrun.exe as a subprocess
- which in turn calls into the dr_client_main function in inject.dll
- inject.dll has been linked with the static library libinject.lib
- and libinject.lib has been linked with the tooling exported by DynamoRIO.

navigator.exe -> drrun.exe -> inject.dll -> libinject.lib
§Components
Below is a brief summary of each component. See the API Reference for crate-level Rust documentation.
§navigator
A Rust crate containing both a library and a binary target. The binary is the main SaM entry point. The library code exposes the following functionality:
- Build and maintain Network-Event Trees (NETs), by aggregating sequences of network events.
- Traverse NETs to find unexplored branches.
- Start and manage subprocesses for:
- Instrumented malware executions
- Fuzzing sessions
- Watch the stream of events from a fuzzing session to decide when to terminate the fuzzing session.
- Set up and tear down TCP and UDP servers that are used to pass traffic between navigator and Fakenet (when SaM is in Network-mode, see Network-mode vs. Buffer-mode for more details).
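The first two items in the list above (aggregating event sequences into a tree, then traversing it) can be sketched with a simple prefix tree. This is an illustrative sketch only: the type `NetNode`, the use of plain strings as event labels, and the choice to treat leaves as the frontier for further exploration are all assumptions, not the navigator crate's actual design.

```rust
use std::collections::BTreeMap;

/// Hypothetical Network-Event Tree node. Children are keyed by an event
/// label (a plain string here, standing in for the real event type).
#[derive(Debug, Default)]
struct NetNode {
    children: BTreeMap<String, NetNode>,
}

impl NetNode {
    /// Merge one observed sequence of events into the tree. Sequences that
    /// share a prefix share the corresponding nodes.
    fn insert(&mut self, events: &[&str]) {
        let mut node = self;
        for event in events {
            node = node.children.entry(event.to_string()).or_default();
        }
    }

    /// Return the event path leading to every leaf, in label order.
    /// Treating leaves as candidate branches is an assumption of this sketch.
    fn leaf_paths(&self) -> Vec<Vec<String>> {
        let mut out = Vec::new();
        let mut prefix = Vec::new();
        self.collect(&mut prefix, &mut out);
        out
    }

    fn collect(&self, prefix: &mut Vec<String>, out: &mut Vec<Vec<String>>) {
        if self.children.is_empty() {
            out.push(prefix.clone());
            return;
        }
        for (event, child) in &self.children {
            prefix.push(event.clone());
            child.collect(prefix, out);
            prefix.pop();
        }
    }
}
```

Inserting two runs that both begin with the same DNS lookup yields a single root child with two diverging branches beneath it, which is the aggregation behaviour described above.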
§inject
A cmake project that compiles to a Windows DLL. The DLL exports the dr_client_main symbol, which is called by DynamoRIO, and it is otherwise a light C wrapper around the core instrumentation logic defined in libinject.
§libinject
A Rust static library which is linked with inject.dll and winafl.dll. Note that libinject can only be compiled for a Windows target. Contains:
- Rust bindings for the DynamoRIO C library (auto-generated from the C bindings using rust-bindgen).
- A CLI that parses serialised proposed trajectories (see How SaM works) that are provided by the navigator NET search logic.
- Instrumentation to interpose on network API calls, using the parsed trajectory to decide how to respond to each network event (e.g. whether to accept or reject a connect call, and what responses to provide to a given request).
- Instrumentation to collect additional, fine-grained coverage information when fuzzing.
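The idea of a parsed trajectory driving the response to each intercepted network event can be sketched as a queue of decisions consumed by the hooks. The types `Decision` and `Trajectory`, the method names, and the fallback of rejecting when the trajectory has nothing to say are hypothetical; the serialised format and the real hook signatures are not shown here.

```rust
use std::collections::VecDeque;

/// Hypothetical response to a single intercepted network event.
#[derive(Debug, Clone, PartialEq)]
enum Decision {
    AcceptConnect,
    RejectConnect,
    Respond(Vec<u8>),
}

/// Hypothetical parsed trajectory: decisions are consumed in order.
struct Trajectory {
    pending: VecDeque<Decision>,
}

impl Trajectory {
    fn new(decisions: Vec<Decision>) -> Self {
        Trajectory { pending: decisions.into() }
    }

    /// Called from a `connect` hook. Returns true to let the call succeed.
    /// If the next decision is not about a connect (or none is left), the
    /// queue is left untouched and the call is rejected (assumed fallback).
    fn on_connect(&mut self) -> bool {
        match self.pending.front() {
            Some(Decision::AcceptConnect) => {
                self.pending.pop_front();
                true
            }
            Some(Decision::RejectConnect) => {
                self.pending.pop_front();
                false
            }
            _ => false,
        }
    }

    /// Called from a receive hook. Returns the scripted response bytes, if
    /// the next decision provides any.
    fn on_recv(&mut self) -> Option<Vec<u8>> {
        if matches!(self.pending.front(), Some(Decision::Respond(_))) {
            if let Some(Decision::Respond(bytes)) = self.pending.pop_front() {
                return Some(bytes);
            }
        }
        None
    }
}
```

Keeping the trajectory as an ordered queue means each hook only needs to look at the front element, which keeps the per-call work small inside instrumented code.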
§winafl-netspoof (git submodule)
A fork of WinAFL, included as a git submodule. It is a cmake project that compiles to two Windows DLLs (winafl.dll and mutator.dll) and a binary (afl-fuzz.exe). afl-fuzz.exe is the main entry point for a fuzzing session: it runs many DynamoRIO subprocesses, using winafl.dll as the tool/client each time.
afl-fuzz.exe also links with mutator.dll, which provides a custom mutator for fuzzing candidates.
§seeding
Generates fuzzing seed data in a bespoke binary format from an ExchangeDataset msgpack file. Uses the provided protocol to sort request-response pairs by function code & length of response bytes required.
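The ordering step can be sketched as a two-part sort key. This is a minimal sketch under stated assumptions: the `Exchange` struct and its field names are hypothetical, the function code is assumed to fit in one byte, and "length of response bytes required" is read here as the length of the recorded response.

```rust
/// Hypothetical request-response pair (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct Exchange {
    function_code: u8,
    request: Vec<u8>,
    response: Vec<u8>,
}

/// Sort in place by function code first, then by response length.
/// `sort_by_key` is stable, so pairs with equal keys keep their input order.
fn sort_exchanges(exchanges: &mut [Exchange]) {
    exchanges.sort_by_key(|e| (e.function_code, e.response.len()));
}
```

Sorting on a tuple key keeps the comparison logic in one place and makes the priority of the two criteria explicit.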
§Fuzzing Sessions
The role that fuzzing sessions play in the main SaM exploration loop is described in the section on How SaM works. It is implemented as follows:
- Rather than commencing an instrumented run of the sample (by starting a drrun.exe subprocess), the navigator binary starts afl-fuzz.exe as a subprocess.
- afl-fuzz.exe is a binary compiled by winafl-netspoof (our fork of WinAFL), and it is the entry point for a customised WinAFL fuzzing session.
- Internally, afl-fuzz.exe makes a call to a DLL called winafl.dll, which is also compiled by the winafl-netspoof project.
- winafl.dll describes how the sample must be instrumented during the fuzzing session. This includes both instrumentation for interposing on network calls and instrumentation for coverage-guided fuzzing (which covers collecting coverage information and performing soft rollbacks, described as persistent-mode fuzzing in the WinAFL docs).
- winafl.dll links with, and calls into, libinject.lib, where our custom instrumentation is written in Rust (note that the out-of-the-box WinAFL instrumentation remains in winafl.dll, written in C).
In summary, during fuzzing sessions the call stack is:
- navigator.exe binary starts afl-fuzz.exe as a subprocess
- which, alongside running AFL fuzzing processes, calls into the dr_client_main function in winafl.dll
- winafl.dll has been linked with the static library libinject.lib
- and libinject.lib has been linked with the tooling exported by DynamoRIO.

navigator.exe -> afl-fuzz.exe -> winafl.dll -> libinject.lib

The symmetry with the setup described above in the Overview section exists because WinAFL (the project we build upon for coverage-guided fuzzing) itself uses DynamoRIO for its own dynamic instrumentation.
§How our fork (winafl-netspoof) builds on WinAFL
winafl-netspoof is only used during fuzzing sessions. The entry point is the binary afl-fuzz.exe, which is part of WinAFL and unchanged in our fork.
§The most important file: winafl.c
winafl-netspoof/winafl.c compiles into the DLL winafl.dll, and it is essentially a DynamoRIO tool that describes how the target binary must be instrumented during a fuzzing session.
In our fork, winafl.c retains all of the existing instrumentation for conducting a persistent-mode fuzzing session (the mechanism WinAFL uses is discussed in the docs section covering limitations when working with Asynchronous runtimes).
The main additions to winafl.c in our fork are:
- The WinAFL CLI is augmented to accept additional CLI arguments that we forward on to our libinject Rust static library.
- libinject_init is called (with the forwarded CLI args) in the main dr_client_main function exported by winafl.c.
- Our main network API hooks are registered with a call to wrap_network_symbols_extern.
- The AFL coverage bitmap used in the standard WinAFL project is partitioned into two bitmaps - one for the existing WinAFL coverage information, and the other for our additional fine-grained coverage.
- The instrumentation to collect our fine-grained coverage is registered alongside the calls to register the original WinAFL instrumentation.
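The bitmap partitioning mentioned in the list above can be illustrated in Rust, even though the real change lives in C inside winafl.c. This is a sketch under stated assumptions: `MAP_SIZE` uses the default AFL map size of 64 KiB, while the split point `BASE_REGION` and the helper names are hypothetical, since the actual partition sizes are not given here.

```rust
/// Default AFL coverage map size (64 KiB).
const MAP_SIZE: usize = 1 << 16;
/// Hypothetical split point between the two regions (an assumption).
const BASE_REGION: usize = MAP_SIZE / 2;

/// Borrow one backing map as two disjoint regions: the first for the
/// original coverage, the second for the additional fine-grained coverage.
/// Panics if `map` is shorter than `BASE_REGION`.
fn partition(map: &mut [u8]) -> (&mut [u8], &mut [u8]) {
    map.split_at_mut(BASE_REGION)
}

/// Record one hit in a region. The location wraps to the region length and
/// the counter wraps on overflow, mirroring a plain `u8` increment in C.
/// The region must be non-empty.
fn record_hit(region: &mut [u8], location: usize) {
    let idx = location % region.len();
    region[idx] = region[idx].wrapping_add(1);
}
```

Because both regions live in one contiguous buffer, the fuzzer can keep treating it as a single map while each kind of instrumentation writes only to its own half.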
The only other significant addition is the definition of our libinject static library in winafl-netspoof/CMakeLists.txt. This ensures that WinAFL links with our libinject library at compile time (winafl.c calls into several symbols defined in libinject).
§Generating Rust Bindings for DynamoRIO
libinject/build.rs_generate_dynamorio_bindings is a build file that can be un-commented and run to autogenerate Rust bindings for DynamoRIO using rust-bindgen.
The build file writes the bindings to an output file so that they can be committed as source code for the project. The benefit is twofold: rust-analyzer can see the files when you are writing code that calls into the bindings (which helps with auto-complete and reading docs on hover), and the bindings do not have to be re-generated on every build.
The bindings are saved at libinject/src/ffi/ffi_64.rs and libinject/src/ffi/ffi_32.rs, for 64-bit bindings and 32-bit bindings, respectively.
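One plausible way to choose between two committed binding files is conditional compilation on the target pointer width. The sketch below uses inline stand-in modules so that it compiles on its own; in a real crate these would be `mod` declarations pointing at the committed files. The constant `POINTER_BITS` and the function `active_binding_width` are hypothetical, and this does not claim to show how libinject actually selects its bindings.

```rust
// Stand-in for the 64-bit bindings file (inline module for illustration).
#[cfg(target_pointer_width = "64")]
mod ffi {
    pub const POINTER_BITS: u32 = 64;
}

// Stand-in for the 32-bit bindings file (inline module for illustration).
#[cfg(target_pointer_width = "32")]
mod ffi {
    pub const POINTER_BITS: u32 = 32;
}

/// Report which set of bindings was compiled in for the current target.
fn active_binding_width() -> u32 {
    ffi::POINTER_BITS
}
```

Only one of the two modules exists in any given build, so the rest of the crate can refer to `ffi` without caring which width is active.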
§When to re-generate the bindings
Re-generation is likely only required when updating to a newer version of DynamoRIO where the API has changed.