
Module developer_docs


§📚 Developer Documentation

§Overview

The Smoke and Mirrors codebase consists of the following components:

  • navigator
  • inject
  • libinject
  • winafl-netspoof
  • seeding

The role each component plays in the project is as follows:

The main entry point to the project is the binary in the navigator crate. This kicks off the explorative loop: the malware sample is repeatedly instrumented and executed, and the instrumentation for the next execution is dynamically configured in an attempt to elicit previously unseen behaviour from the sample.

From this main loop, each instrumented execution of the sample is run as a subprocess. The subprocess entry point is drrun.exe, a pre-built binary provided by DynamoRIO (the dynamic binary instrumentation (DBI) framework used by SaM). Two important things are passed to drrun.exe:

  1. The system path to the malware sample (which will be run under the DynamoRIO DBI engine).
  2. The system path to a Dynamic-Link Library (DLL) describing how to instrument the sample. inject is a cmake project that compiles to this DLL.
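As a rough illustration of those two inputs, the sketch below assembles (without spawning) a drrun.exe invocation using `std::process::Command`. It follows DynamoRIO's documented `drrun -c <client.dll> [client options] -- <app> [app args]` form. The `drrun_command` helper, and the `--trajectory` client option used in the usage check, are hypothetical and not SaM's actual interface.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not spawn) a drrun.exe command that
/// runs `sample` under DynamoRIO with `client_dll` as the instrumentation client.
pub fn drrun_command(
    drrun: &Path,
    client_dll: &Path,
    client_args: &[&str],
    sample: &Path,
) -> Command {
    let mut cmd = Command::new(drrun);
    // `-c <client> [client options]` selects the client DLL and its options.
    cmd.arg("-c").arg(client_dll);
    cmd.args(client_args);
    // Everything after `--` is the application command line.
    cmd.arg("--").arg(sample);
    cmd
}
```

Calling `.spawn()` or `.status()` on the returned `Command` would then start the instrumented run.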

In practice, the DLL exports a special symbol called dr_client_main, which is called by the DynamoRIO engine at runtime.

DynamoRIO provides a lot of tooling that can be used within dr_client_main to achieve the desired instrumentation, such as APIs for querying the loaded module space, hooking function calls and inspecting memory addresses.

However, DynamoRIO exposes C bindings for this tooling, and SaM is written primarily in Rust. For that reason the libinject crate contains:

  1. Rust bindings for the DynamoRIO tooling (auto-generated from the C bindings using rust-bindgen).
  2. Logic that uses these bindings to achieve the configurable custom instrumentation required by SaM (interposing on network calls to simulate a specific network environment, recording requested endpoints and packets).
  3. Definitions for external symbols that can be called from dr_client_main.
  4. A Cargo.toml manifest that builds the crate as a static library so that it can be linked with the inject cmake project.
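A minimal sketch of item 3, assuming a hypothetical C signature `int libinject_init(int argc, const char **argv)`: an `extern "C"` function with an unmangled symbol name that C code such as dr_client_main could call once the static library is linked in. The argument handling and the `client_args` accessor are illustrative only. For item 4, the relevant Cargo setting is `crate-type = ["staticlib"]` in the `[lib]` section. `#[unsafe(no_mangle)]` is the Rust 1.82+ spelling; editions before 2024 also accept `#[no_mangle]`.

```rust
use std::ffi::{c_char, c_int, CStr};
use std::sync::OnceLock;

/// Arguments forwarded from the C side, stored once for later use by the hooks.
static CLIENT_ARGS: OnceLock<Vec<String>> = OnceLock::new();

/// Hypothetical exported initialiser. Returns 0 on success, or -1 if the
/// pointers are invalid or the function has already been called.
///
/// # Safety
/// `argv` must point to `argc` valid, NUL-terminated C strings.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn libinject_init(argc: c_int, argv: *const *const c_char) -> c_int {
    if argv.is_null() || argc < 0 {
        return -1;
    }
    let mut args = Vec::with_capacity(argc as usize);
    for i in 0..argc as usize {
        // SAFETY: the caller guarantees argv[0..argc] are valid pointers.
        let ptr = unsafe { *argv.add(i) };
        if ptr.is_null() {
            return -1;
        }
        // SAFETY: ptr is non-null and NUL-terminated per the contract above.
        let arg = unsafe { CStr::from_ptr(ptr) };
        args.push(arg.to_string_lossy().into_owned());
    }
    match CLIENT_ARGS.set(args) {
        Ok(()) => 0,
        Err(_) => -1, // already initialised
    }
}

/// Read-only access to the forwarded arguments from the rest of the Rust code.
pub fn client_args() -> &'static [String] {
    CLIENT_ARGS.get().map(Vec::as_slice).unwrap_or(&[])
}
```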

In summary, the call stack is:

  1. navigator.exe binary starts drrun.exe as a subprocess
  2. which in turn calls into the dr_client_main function in inject.dll
  3. inject.dll has been linked with the static library libinject.lib
  4. and libinject.lib has been linked with the tooling exported by DynamoRIO.
navigator.exe -> drrun.exe -> inject.dll -> libinject.lib

§Components

Below is a brief summary of each component. See the API Reference for the crate-level Rust documentation.

§navigator

A Rust package containing both a library crate and a binary crate. The binary is the main SaM entry point. The library code exposes the following functionality:

  • Build and maintain Network-Event Trees (NETs), by aggregating sequences of network events.
  • Traverse NETs to find unexplored branches.
  • Start and manage subprocesses for:
    1. Instrumented malware executions
    2. Fuzzing sessions
  • Watch the stream of events from a fuzzing session to decide when to terminate the fuzzing session.
  • Set up and tear down TCP and UDP servers that are used to pass traffic between navigator and Fakenet (when SaM is in Network-mode, see Network-mode vs. Buffer-mode for more details).
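One way to picture a NET is as a trie of network events. Each recorded execution is inserted as a path from the root, shared prefixes are aggregated, and an "unexplored branch" is an alternative event at some node that no execution has produced yet. The sketch below assumes a hypothetical `Event` type and treats the unseen outcome of a `Connect` (accepted vs. rejected) as the only kind of alternative. The real navigator types and search criteria are not described in this section.

```rust
use std::collections::BTreeMap;

/// Hypothetical network event: the API call observed and how it was answered.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Event {
    Connect { host: String, accepted: bool },
    Send { len: usize },
    Recv { len: usize },
}

/// One node of a Network-Event Tree; children are keyed by the next event.
#[derive(Default, Debug)]
pub struct NetNode {
    pub children: BTreeMap<Event, NetNode>,
    /// Number of recorded executions that passed through this node.
    pub visits: u32,
}

impl NetNode {
    /// Aggregates one execution's event sequence into the tree.
    pub fn insert(&mut self, trace: &[Event]) {
        self.visits += 1;
        if let Some((first, rest)) = trace.split_first() {
            self.children.entry(first.clone()).or_default().insert(rest);
        }
    }

    /// Returns event prefixes ending in a `Connect` whose opposite outcome
    /// has never been observed at that point in the tree.
    pub fn unexplored(&self) -> Vec<Vec<Event>> {
        let mut out = Vec::new();
        self.walk(&mut Vec::new(), &mut out);
        out
    }

    fn walk(&self, prefix: &mut Vec<Event>, out: &mut Vec<Vec<Event>>) {
        for (event, child) in &self.children {
            if let Event::Connect { host, accepted } = event {
                let flipped = Event::Connect { host: host.clone(), accepted: !accepted };
                if !self.children.contains_key(&flipped) {
                    let mut branch = prefix.clone();
                    branch.push(flipped);
                    out.push(branch);
                }
            }
            // Depth-first descent, restoring the prefix on the way back up.
            prefix.push(event.clone());
            child.walk(prefix, out);
            prefix.pop();
        }
    }
}
```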

§inject

A cmake project that compiles to a Windows DLL. The DLL exports the dr_client_main symbol, which is called by DynamoRIO, and it is otherwise a light C wrapper around the core instrumentation logic defined in libinject.

§libinject

A Rust static library which is linked with inject.dll and winafl.dll. Note that libinject can only be compiled for a Windows target. Contains:

  • Rust bindings for the DynamoRIO C library (auto-generated from the C bindings using rust-bindgen).
  • A CLI interface that parses serialised _proposed trajectories_ (see How SaM works) that are provided by the navigator NET search logic.
  • Instrumentation to interpose on network API calls, using the parsed trajectory to decide how to respond to each network event (e.g. whether to accept or reject a connect call, and what responses to provide to a given request).
  • Instrumentation to collect additional, fine-grained coverage information when fuzzing.
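To make trajectory-driven responses concrete, here is a sketch that parses a trajectory and answers hooked calls from it in order. The line-based text format, the `Step` and `Trajectory` types, and the reject-by-default fallback are all assumptions for illustration; the real serialisation passed from navigator to libinject is not specified here.

```rust
use std::collections::VecDeque;

/// One planned response (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    Connect { accept: bool },
    Recv { payload: Vec<u8> },
}

#[derive(Debug, Default)]
pub struct Trajectory {
    steps: VecDeque<Step>,
}

impl Trajectory {
    /// Parses one step per line: `connect accept`, `connect reject`,
    /// or `recv <payload>` (payload taken verbatim as bytes).
    pub fn parse(src: &str) -> Result<Self, String> {
        let mut steps = VecDeque::new();
        for (n, line) in src.lines().enumerate() {
            let line = line.trim();
            if line.is_empty() {
                continue;
            }
            let (kind, rest) = line.split_once(' ').unwrap_or((line, ""));
            let step = match (kind, rest) {
                ("connect", "accept") => Step::Connect { accept: true },
                ("connect", "reject") => Step::Connect { accept: false },
                ("recv", payload) => Step::Recv { payload: payload.as_bytes().to_vec() },
                _ => return Err(format!("line {}: unrecognised step {:?}", n + 1, line)),
            };
            steps.push_back(step);
        }
        Ok(Trajectory { steps })
    }

    /// Called from a connect hook: accept only if the next planned step says so.
    pub fn on_connect(&mut self) -> bool {
        if let Some(&Step::Connect { accept }) = self.steps.front() {
            self.steps.pop_front();
            return accept;
        }
        false // off-trajectory: reject (assumed default)
    }

    /// Called from a recv hook: the payload to hand back, if one is planned next.
    pub fn on_recv(&mut self) -> Option<Vec<u8>> {
        if matches!(self.steps.front(), Some(Step::Recv { .. })) {
            if let Some(Step::Recv { payload }) = self.steps.pop_front() {
                return Some(payload);
            }
        }
        None
    }
}
```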

§winafl-netspoof (git submodule)

A fork of WinAFL: a CMake project (included as a git submodule) that compiles two Windows DLLs (winafl.dll and mutator.dll) and a binary (afl-fuzz.exe). afl-fuzz.exe is the main entry point for a fuzzing session; it runs many DynamoRIO subprocesses, using winafl.dll as the tool/client each time.

afl-fuzz.exe also links with mutator.dll, which provides a custom mutator for fuzzing candidates.

§seeding

Generates fuzzing seed data in a bespoke binary format from an ExchangeDataset msgpack file. It uses the provided protocol to sort request-response pairs by function code and by the number of response bytes required.
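The sort described above amounts to ordering on a composite key. In this sketch the `Exchange` type is a hypothetical stand-in for an ExchangeDataset entry (msgpack decoding is omitted), and `encode_seed` uses an invented length-prefixed layout purely to illustrate writing a binary seed; it is not SaM's bespoke format.

```rust
/// Hypothetical stand-in for one request-response pair from an ExchangeDataset.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Exchange {
    pub function_code: u8,
    pub request: Vec<u8>,
    pub response: Vec<u8>,
}

/// Orders pairs by function code, then by response length (stable sort,
/// so ties keep their original relative order).
pub fn sort_for_seeding(pairs: &mut [Exchange]) {
    pairs.sort_by_key(|e| (e.function_code, e.response.len()));
}

/// Invented layout for illustration only: per pair, 1 byte function code,
/// a 4-byte little-endian response length, then the response bytes.
pub fn encode_seed(pairs: &[Exchange]) -> Vec<u8> {
    let mut out = Vec::new();
    for e in pairs {
        out.push(e.function_code);
        out.extend_from_slice(&(e.response.len() as u32).to_le_bytes());
        out.extend_from_slice(&e.response);
    }
    out
}
```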

§Fuzzing Sessions

The role that fuzzing sessions play in the main SaM exploration loop is described in the section on How SaM works. It is implemented as follows:

  1. Rather than commencing an instrumented run of the sample (by starting a drrun.exe subprocess), the navigator binary starts afl-fuzz.exe as a subprocess.
  2. afl-fuzz.exe is a binary compiled by winafl-netspoof (our fork of WinAFL), and it is the entry point for a customised WinAFL fuzzing session.
  3. Internally, afl-fuzz.exe runs the sample under DynamoRIO with winafl.dll, which is also compiled by the winafl-netspoof project, as the instrumentation client.
  4. winafl.dll describes how the sample must be instrumented during the fuzzing session. This includes both instrumentation for interposing on network calls and instrumentation for coverage-guided fuzzing (collecting coverage information, and performing soft rollbacks, which the WinAFL docs describe as persistent-mode fuzzing).
  5. winafl.dll links with, and calls into, libinject.lib, where our custom instrumentation is written in Rust (the out-of-the-box WinAFL instrumentation remains in winafl.dll, written in C).
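For reference, a WinAFL session is typically launched as `afl-fuzz.exe -i <in> -o <out> -D <DynamoRIO bin dir> -t <timeout> -- <instrumentation options> -- <target command line>`, with `@@` standing in for the path of the current test case. The sketch below builds such a command with stock WinAFL options only. `FuzzConfig` and `afl_fuzz_command` are hypothetical helpers, and the extra libinject arguments that winafl-netspoof forwards are omitted because their names are not documented here.

```rust
use std::process::Command;

/// Hypothetical bundle of stock WinAFL settings.
pub struct FuzzConfig<'a> {
    pub afl_fuzz: &'a str,
    pub in_dir: &'a str,
    pub out_dir: &'a str,
    pub dynamorio_bin: &'a str,
    pub timeout_ms: u32,
    pub coverage_module: &'a str,
    pub target_module: &'a str,
    pub target_offset: u64,
    pub nargs: u32,
    pub target_cmdline: &'a [&'a str],
}

/// Builds (but does not spawn) the afl-fuzz.exe command line.
pub fn afl_fuzz_command(cfg: &FuzzConfig) -> Command {
    let mut cmd = Command::new(cfg.afl_fuzz);
    // AFL options: seed corpus, findings directory, DynamoRIO bin directory.
    cmd.args(&["-i", cfg.in_dir, "-o", cfg.out_dir, "-D", cfg.dynamorio_bin]);
    cmd.arg("-t").arg(cfg.timeout_ms.to_string());
    // Instrumentation options (passed on to winafl.dll) sit between the two `--`.
    cmd.arg("--");
    cmd.arg("-coverage_module").arg(cfg.coverage_module);
    cmd.arg("-target_module").arg(cfg.target_module);
    cmd.arg("-target_offset").arg(format!("0x{:x}", cfg.target_offset));
    cmd.arg("-nargs").arg(cfg.nargs.to_string());
    // Target command line; `@@` is replaced by the current test-case path.
    cmd.arg("--").args(cfg.target_cmdline);
    cmd
}
```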

In summary, during fuzzing sessions the call stack is:

  1. navigator.exe binary starts afl-fuzz.exe as a subprocess
  2. which, alongside running AFL fuzzing processes, calls into the dr_client_main function in winafl.dll
  3. winafl.dll has been linked with the static library libinject.lib
  4. and libinject.lib has been linked with the tooling exported by DynamoRIO.
navigator.exe -> afl-fuzz.exe -> winafl.dll -> libinject.lib

This mirrors the setup described in the Overview section because WinAFL (the project we build upon for coverage-guided fuzzing) itself uses DynamoRIO for dynamic instrumentation.

§How our fork (winafl-netspoof) builds on WinAFL

winafl-netspoof is only used during fuzzing sessions. The entry point is the binary afl-fuzz.exe, which is part of WinAFL and unchanged in our fork.

§The most important file: winafl.c

winafl-netspoof/winafl.c compiles into the DLL winafl.dll, and it is essentially a DynamoRIO tool that describes how the target binary must be instrumented during a fuzzing session.

In our fork, winafl.c retains all of the existing instrumentation for conducting a persistent-mode fuzzing session (the mechanism WinAFL uses is discussed in the docs section covering limitations when working with asynchronous runtimes).

The main additions to winafl.c in our fork are:

  • The WinAFL CLI is augmented to accept additional CLI arguments that we forward on to our libinject Rust static library.
  • libinject_init is called (with the forwarded CLI args) in the main dr_client_main function exported by winafl.c.
  • Our main network API hooks are registered with a call to wrap_network_symbols_extern.
  • The AFL coverage bitmap used in the standard WinAFL project is partitioned into two bitmaps - one for the existing WinAFL coverage information, and the other for our additional fine-grained coverage.
  • The instrumentation to collect our fine-grained coverage is registered alongside the calls to register the original WinAFL instrumentation.
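The partitioning can be sketched as splitting one shared buffer into two disjoint slices, so each kind of coverage can only write to its own region. The edge update mirrors AFL's classic `map[cur ^ prev]++; prev = cur >> 1` scheme over a `1 << 16` byte map. The 50/50 split, the `Coverage` type and the `record_extra` event ids are assumptions, since this section does not state how the fork sizes the two bitmaps.

```rust
/// AFL's default bitmap size (MAP_SIZE = 1 << 16).
pub const MAP_SIZE: usize = 1 << 16;
/// Assumed split point: WinAFL coverage below, libinject coverage above.
pub const SPLIT: usize = MAP_SIZE / 2;

/// Two non-overlapping views of one coverage bitmap (hypothetical type).
pub struct Coverage<'a> {
    winafl: &'a mut [u8],
    extra: &'a mut [u8],
    prev_offset: usize,
}

impl<'a> Coverage<'a> {
    pub fn new(map: &'a mut [u8]) -> Self {
        assert_eq!(map.len(), MAP_SIZE, "bitmap must be MAP_SIZE bytes");
        let (winafl, extra) = map.split_at_mut(SPLIT);
        Coverage { winafl, extra, prev_offset: 0 }
    }

    /// AFL-style edge coverage, confined to the WinAFL partition.
    pub fn record_edge(&mut self, block_offset: usize) {
        let idx = (self.prev_offset ^ block_offset) % self.winafl.len();
        self.winafl[idx] = self.winafl[idx].wrapping_add(1);
        self.prev_offset = block_offset >> 1;
    }

    /// Hypothetical fine-grained event (e.g. a hashed network-event id),
    /// confined to the extra partition.
    pub fn record_extra(&mut self, event_id: usize) {
        let idx = event_id % self.extra.len();
        self.extra[idx] = self.extra[idx].wrapping_add(1);
    }
}
```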

The only other significant addition is the definition of our libinject static library in winafl-netspoof/CMakeLists.txt. This ensures that WinAFL links with our libinject library at compile time (winafl.c calls into several symbols defined in libinject).

§Generating Rust Bindings for DynamoRIO

libinject/build.rs_generate_dynamorio_bindings is a build file that can be un-commented and run to autogenerate Rust bindings for DynamoRIO using rust-bindgen.

The build file writes the bindings to an output file so that they can be committed as source code for the project. The benefit is twofold: rust-analyzer can see the files when you are writing code that calls into the bindings (which helps with auto-completion and reading docs on hover), and the bindings do not have to be regenerated on every build.

The bindings are saved at libinject/src/ffi/ffi_64.rs and libinject/src/ffi/ffi_32.rs, for 64-bit bindings and 32-bit bindings, respectively.
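Because there is one bindings file per pointer width, a regeneration script has to pick its output path from the target. Cargo exposes the target's pointer width to build scripts through the `CARGO_CFG_TARGET_POINTER_WIDTH` environment variable. The helper below is a hypothetical sketch of that choice, with paths relative to the libinject crate root. The bindgen calls appear only as comments (they need the `bindgen` build dependency), and the header name and defines in them are assumptions about how the DynamoRIO headers would be wrapped.

```rust
use std::path::PathBuf;

/// Hypothetical helper: maps a target pointer width to the committed bindings file.
/// In build.rs, `width` would come from
/// `std::env::var("CARGO_CFG_TARGET_POINTER_WIDTH")`.
pub fn bindings_output(width: &str) -> Option<PathBuf> {
    let file = match width {
        "64" => "ffi_64.rs",
        "32" => "ffi_32.rs",
        _ => return None, // no committed bindings for other widths
    };
    Some(PathBuf::from("src").join("ffi").join(file))
}

// Sketch of the generation step itself (assumed header and defines):
// let bindings = bindgen::Builder::default()
//     .header("wrapper.h")      // hypothetical header that includes dr_api.h
//     .clang_arg("-DWINDOWS")
//     .clang_arg("-DX86_64")    // or -DX86_32 for the 32-bit bindings
//     .generate()
//     .expect("bindgen failed");
// bindings.write_to_file(path).expect("could not write bindings");
```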

§When to re-generate the bindings

This should only be necessary when updating to a newer version of DynamoRIO whose API has changed.