Improving your fuzzer

When you first begin fuzzing a new target, the fuzzer may crash very quickly. Fuzzing typically produces an initial spike of defects found followed by a long tail. The frequency of defects being found can drop for several reasons, including: * There are fewer defects in the code being tested. * There are sections of the code that aren't being tested by the fuzzer.

To distinguish between these, you need to be able to assess and improve your fuzzer.

Improve code coverage

The first step in improving a fuzzer is to understand how well it is performing currently. An obvious key metric for coverage-guided fuzzers is code coverage. You can collect this information using fx fuzz.

For example:

fx fuzz analyze package/fuzzer

This will run the fuzzer for 60 seconds and report the code coverage of the corpus on the device. If you specify a --staging option, files in that directory will first be added to the corpus.

Add or improve the seed corpus

If notice gaps in the code coverage, you can add individual inputs to a seed corpus:

  1. Add a directory to the source tree near your fuzzer.
  2. Add one or more files to this directory, each containing the raw bytes of a test input that causes the fuzzer to reach previously uncovered code.
  3. Add a resource GN target for this directory and add it to the fuzzer's deps.
  4. Add an argument with the package-relative path to the fuzzer's component manifest source.

For example:

cd $FUCHSIA_DIR
mkdir path-to-library/my-fuzzer-corpus
cp handcrafted-input path-to-library/corpus

And in //path/to/library/BUILD.gn:


import("//build/fuzz.gni")
import("//build/dist/resource.gni")

resource("my-library-corpus") {
  sources = [
    ...
    "corpus/handcrafted-input",
    ...
  ]
  outputs = [ "data/my-library-corpus/{{source_file_part}}" ]
}

cpp_fuzzer("my_fuzzer"){
  sources = [ "my-fuzzer.cc" ]
  deps = [
    ":my-library",
    ":my-library-corpus"
  ]
}

And in the fuzzer's component manifest source:

{
  ...
  program: {
    args: [
      ...
      "data/my-library-corpus",
      ...
    ]
  }
}

Make code friendlier to fuzzing

Generally, libFuzzer is fairly effective at finding inputs that explore new conditional branches when the decision is based on bytes of the input. For example, it can use instrumentation on comparison instructions, such as CMP, to determine what value is needed to match a check on some portion of the input.

But this approach can fail when the fuzzer encounters "fuzzer-hostile" conditions. These include:

C/C++

  • Conditions that use data from external sources. For example:
zx_cprng_draw(&val, sizeof(val));
if (val == 0) { ... }
  • Conditions checking values that are possible to construct, but hard to guess. For example:
uint32_t actual = header.checksum;
header.checksum = 0;
uint32_t expected =  crc32(0, reinterpret_cast<const uint8_t*>(&header), sizeof(header));
if (actual == expected) { ... }
int result = ECDSA_verify(0, data, data_len, signature, signature_len, ec_key);
if (result == 0) { ... }

Rust

  • Conditions that use data from external sources. For example:
let mut randbuf = [0; 8];
zx::cprng_draw(&mut randbuf)?;
let val = u64::from_le_bytes(randbuf);
if val == 0 { ... }
  • Conditions checking values that are possible to construct, but hard to guess. For example:
let mut c = Checksum::new();
c.add_bytes(&buf);
c.checksum()
if c == expected { ... }
let digest = H::hash(message);
if boringssl::ecdsa_verify(digest.as_ref(), self.bytes(), &key.inner.key) { ... }

Go

  • Conditions that use data from external sources. For example: golang if rand.Intn(100) == 0 { ... }

  • Conditions checking values that are possible to construct, but hard to guess. For example:

iCksum := ipv4.CalculateChecksum()
if iCksum != want { ... }
ecdsaKey, ok := key.(*ecdsa.PublicKey)
h := e.hash.New()
h.Write(msg)
if ecdsa.Verify(ecdsaKey, h.Sum(nil), ecdsaSignature.R, ecdsaSignature.S) { ... }

As a code maintainer, you can use conditional compilation to add workarounds to these conditions in the code being tested. libFuzzer refers to this as using a fuzzer-friendly build mode.

C/C++

Use the common build macro, FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION.

For example:

#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  // Use hard-coded value when fuzzing.
  memset(&val, 0, sizeof(val));
#else
  zx_cprng_draw(&val, sizeof(val));
#endif

In this example, we have set all the bytes of val to always be zero. Depending on the code, it may be more useful to the fuzzer if val is some other deterministic value, or possibly even directly depends on the fuzzer input.

Rust

Use the fuzz cfg attribute.

For example:

#[cfg(not(fuzz))]
fn is_valid(&self, key: &EcPubKey<C>, message: &[u8]) -> bool {
    let digest = H::hash(message);
    boringssl::ecdsa_verify(digest.as_ref(), self.bytes(), &key.inner.key)
}

#[cfg(fuzz)]
fn is_valid(&self, key: &EcPubKey<C>, message: &[u8]) -> bool {
    // Skip validation when fuzzing.
    return true;
}

Go

Use the fuzz package. Since Go only performs conditional compilation at the file level, this package include two files that define an const Enabled <bool>. Which file is included, and therefore the value of Enabled is determined by whether the code is being built in a fuzzer variant or not.

For example:

import "fuzz"

func (b IPv4) CalculateChecksum() uint16 {
  if fuzz.Enabled {
    // Return hard-coded value when fuzzing.
    return uint16(0xffff)
  }
  return Checksum(b[:b.HeaderLength()], 0)
}

Add custom mutators

In some case, the inputs being provided by the fuzzer may be transformed before being acted on. This can greatly reduce the fuzzer's ability to associate inputs with the behaviors they produce. For example, a library being fuzzed my first decompress its inputs before processing them. The most effective way to fuzz this library is to perform mutations on uncompressed inputs, then compress them before invoking the library.

One way to achieve this is by adding custom mutators. These are user-provided implementations of an optional function defined by LLVM's fuzzing interface:

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed);

If provided, libFuzzer will call this function with a buffer Data of size MaxSize initially filled with Size bytes of a valid input from the corpus. This function can transform this data before calling another LLVM fuzzing interface function:

size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

This function performs the actual mutation to create a new input.

In the example of the library that requires compressed inputs, a custom mutator could decompress the input from the corpus, call LLVMFuzzerMutate on it to create a new input, and compress the result.

Google's public fuzzing documentation has detailed examples for this case and others. See Structure-Aware Fuzzing with libFuzzer.

Provide a dictionary

A dictionary is a set of tokens commonly found in an interface's valid inputs. When provided, a fuzzer will use a dictionary to construct inputs that are more likely to be valid and therefore provide deeper coverage.

If you know what sort of tokens your code expects, you can add them to a dictionary file, one per line. Then provide the file to the fuzzer as a resource in the same manner as a seed corpus, and add them to the fuzzer's component manifest source.

For example:

{
  ...
  program: {
    args: [
      ...
      "-dict=data/my-dictionary.txt",
      ...
    ]
  }
}

Improve fuzzer performance

Once a fuzzer is achieving good coverage, then another key metric is simply how many iterations it can perform for a given finite set of compute resources. Fuzzers and the code they test can have a considerable range of complexity, so there isn't a single number of iterations per second a fuzzer should perform or a single memory limit a fuzzer should stay below. But if everything else is the same, a faster, leaner fuzzer will have a greater likelihood of finding defects over a slower, more memory-intensive one.

Startup initialization

Often, code under test requires some amount of expensive set up before it can be tested. Performing this initialization on every iteration can drag down the fuzzer's speed. In these situations it can be better to lazily initialize a variable with a lifetime equal to that of the process.

For example:

bool SetUp() {
   DoSomeExpensiveWork();
   return true;
}

extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  static bool ready = SetUp();
  ...
}

However, be very careful about program state that is carried over from one iteration to the next. If a defect depends not only on the current test input, but also some subset of all previous inputs, it can become very difficult to reproduce without replaying the entire fuzzer run.

Pre-allocate storage

Some code operates on large amounts of memory, such as compression algorithms. A fuzzer that tries to allocate and free many megabytes of memory on each iteration will see degraded performance. An alternate approach is to use pre-allocate a maximally-sized buffer.

For example:

static const size_t kMaxOutSize = 0x10000000; // 256 MB

extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  static uint8_t out_buf[kMaxOutSize];
  size_t max = Decompressor::GetMaxUncompressedSize(size);
  if (sizeof(out_buf) < max) {
    return 0;
  }
  Decompressor::Decompress(data, size, out_buf, max);
  return 0;
}

As written, this introduces a trade-off between performance and sanitizer accuracy. If the code above instead allocated a memory region of length max, AddressSanitizer would be able to detect if Decompress overflowed by any amount. With a pre-allocated region, it may silently succeed. Fortunately, AddressSanitizer provides a way to manually poison memory.

For example:

#include <sanitizer/asan_interface.h>

static const size_t kMaxOutSize = 0x10000000; // 256 MB

extern "C" LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  static uint8_t out_buf[kMaxOutSize];
  size_t max = Decompressor::GetMaxUncompressedSize(size);
  if (sizeof(out_buf) < max) {
    return 0;
  }
  ASAN_POISON_MEMORY_REGION(&data[max], sizeof(out_buf) - max);
  Decompressor::Decompress(data, size, out_buf, max);
  ASAN_UNPOISON_MEMORY_REGION(&data[max], sizeof(out_buf) - max);
  return 0;
}