Invest in Sanitizers

Project leads: phosek@google.com, shayba@google.com
Area: Toolchain

Problem statement

Memory safety bugs continue to be a leading root cause of high-severity security bugs. Sanitizers also improve engineering productivity by quickly exposing and root-causing difficult bugs. Despite Fuchsia’s relative strengths in this domain, such as stronger isolation between software components and a growing investment in memory-safe systems programming languages, memory safety remains a concern on Fuchsia as it is on other platforms.

Currently Fuchsia uses several sanitizers to detect memory safety bugs:

AddressSanitizer (ASan) detects out-of-bounds accesses, use after free / return / scope, and double frees. Relatedly, KASan extends this to kernel code.
LeakSanitizer (LSan) detects memory leaks.
UndefinedBehaviorSanitizer (UBSan) detects specific issues of relying on undefined program behavior.
Relatedly, libFuzzer is supported on Fuchsia to run coverage-directed fuzz testing and detect crashes or issues that are detectable by the above sanitizers. There is ongoing work to improve kernel syscall fuzzing by making it coverage-directed.
Lastly, GWP-ASan, a sampling version of ASan, is supported on Fuchsia. Work is underway to demonstrate its use in production to detect bugs in the field.

These sanitizers cover C/C++ code where memory safety isn’t guaranteed. They also detect memory bugs in Rust unsafe blocks and can find memory leaks in Rust code.

These tools have proven to be effective at finding bugs. They require no effort from the developer when bugs aren’t detected. When bugs are detected, troubleshooting is relatively easy since sanitizers provide stack traces for easy root cause analysis and since they exhibit reproducible behavior.

Efforts in 2020-2021 to roll out all three sanitizers broadly and to fix pre-existing bugs were successful. These efforts leaned on prior dedicated work to bring up runtime instrumentation support on Fuchsia. They were staffed temporarily by volunteers and 20%ers, and have since concluded.

However, we continue to see room for improvement. In particular:

Some hardware-dependent code isn’t covered by sanitizers, mostly due to runtime performance issues and automation gaps.
Sanitizers for kernel code lag behind those for user space code.
There exist additional classes of severe bugs that have sanitizer support elsewhere but not yet on Fuchsia, particularly for uninitialized reads and a variety of thread-safety bugs.
Bugs that sanitizers detect in the system outside the boundaries of a particular test are not root-caused automatically, requiring manual triage to assign them to an owner.

The first two issues are particularly concerning at a time when the Fuchsia team is investing more in device driver development and out-of-tree development, as well as both of those combined. The other issues are ongoing deficiencies that lower engineering productivity.
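The manual triage gap described above could be narrowed by parsing sanitizer reports and attributing them by source path. The following is a minimal sketch under stated assumptions, not Fuchsia's tooling: it assumes reports in the common symbolized LLVM layout (an `ERROR: <Sanitizer>: <kind>` header followed by `#N 0xADDR in function file:line:col` frames), and the `OWNERS` table, the `triage` function, the paths, and the sample report are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from source path prefix to owning team.
OWNERS = {
    "src/devices/": "drivers-team",
    "src/ui/": "ui-team",
}

# Assumed header layout: "==PID==ERROR: AddressSanitizer: heap-use-after-free ..."
HEADER_RE = re.compile(r"ERROR: (\w+): ([\w-]+)")
# Assumed frame layout: "#0 0x4008f0 in Widget::Poke() path/to/file.cc:42:7"
FRAME_RE = re.compile(
    r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+(.+)\s+(\S+?):(\d+)(?::\d+)?\s*$"
)


@dataclass
class Finding:
    sanitizer: str
    kind: str
    function: str
    path: str
    line: int
    owner: str


def triage(report: str) -> Optional[Finding]:
    """Attribute a report to the owner of the first frame with a known path."""
    header = HEADER_RE.search(report)
    if header is None:
        return None
    for text in report.splitlines():
        frame = FRAME_RE.match(text)
        if frame is None:
            continue  # not a frame, or a frame without file:line information
        function, path, line = frame.groups()
        for prefix, owner in OWNERS.items():
            if path.startswith(prefix):
                return Finding(header.group(1), header.group(2),
                               function, path, int(line), owner)
    return None  # no frame falls under a known owner


# Made-up report for illustration only.
SAMPLE_REPORT = """\
==1234==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x0000004008f1
READ of size 4 at 0x602000000010 thread T0
    #0 0x4008f0 in Buffer::At(unsigned long) third_party/example/buffer.h:17:12
    #1 0x400a10 in Widget::Poke() src/devices/example/widget.cc:42:7
    #2 0x400b20 in main src/devices/example/main.cc:10:3
"""

if __name__ == "__main__":
    finding = triage(SAMPLE_REPORT)
    if finding is not None:
        print(f"{finding.kind} in {finding.path}:{finding.line} -> {finding.owner}")
```

Attributing by the first owned frame is only a heuristic: the actual root cause may sit at the allocation or free site, or in a caller, so a real system would likely need additional signals.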

Solution statement

Increase our investment in sanitizers, building on previous investments in LLVM compiler runtime instrumentation support. Specifically:

Bring up support for more sanitizers, such as:

Hardware-assisted AddressSanitizer (HWASan), which significantly reduces the memory overhead of ASan and makes instrumentation viable on RAM-constrained devices.
ThreadSanitizer (TSan), which detects data races, building on prior work.
Kernel support for concurrency sanitization (KCSAN).
MemorySanitizer (MSan), which detects reads from uninitialized memory.
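To illustrate why the tag-based approach in the first item is cheaper in memory, here is a toy model of pointer tagging. It is a sketch, not the LLVM runtime: the `TaggedHeap` class and `strip_tag` helper are hypothetical names, tags are handed out from a counter so the example is deterministic (real implementations pick tags randomly, which makes detection probabilistic), and sub-granule precision is ignored. The model keeps one tag byte per 16-byte granule, whereas ASan keeps one shadow byte per 8 bytes and additionally relies on redzones and a quarantine.

```python
TAG_SHIFT = 56                      # the tag occupies the top byte of a 64-bit pointer
ADDR_MASK = (1 << TAG_SHIFT) - 1
GRANULE = 16                        # bytes of memory covered by one shadow tag


def strip_tag(ptr: int) -> int:
    """Return the plain address, as code that ignores the top byte would see it."""
    return ptr & ADDR_MASK


class TaggedHeap:
    """Toy bump allocator that tags both pointers and memory granules."""

    def __init__(self):
        self.shadow = {}            # granule index -> memory tag
        self.sizes = {}             # base address -> size rounded to granules
        self.next_addr = 0x1000
        self.last_tag = 0

    def _fresh_tag(self) -> int:
        # Deterministic stand-in for a random tag: cycles through 1..255.
        self.last_tag = self.last_tag % 255 + 1
        return self.last_tag

    def _set_tags(self, addr: int, size: int, tag: int) -> None:
        for granule in range(addr // GRANULE, (addr + size) // GRANULE):
            self.shadow[granule] = tag

    def malloc(self, size: int) -> int:
        rounded = -(-size // GRANULE) * GRANULE   # round up to whole granules
        addr = self.next_addr
        self.next_addr += rounded
        self.sizes[addr] = rounded
        tag = self._fresh_tag()
        self._set_tags(addr, rounded, tag)
        return (tag << TAG_SHIFT) | addr          # the pointer carries the tag

    def free(self, ptr: int) -> None:
        addr = strip_tag(ptr)
        # Retag the memory so that stale pointers no longer match.
        self._set_tags(addr, self.sizes.pop(addr), self._fresh_tag())

    def check(self, ptr: int, offset: int = 0) -> bool:
        """True if an access at ptr + offset passes the tag comparison."""
        addr = strip_tag(ptr) + offset
        return self.shadow.get(addr // GRANULE, 0) == ptr >> TAG_SHIFT


if __name__ == "__main__":
    heap = TaggedHeap()
    p = heap.malloc(20)             # occupies two granules
    q = heap.malloc(16)             # the neighbouring allocation, different tag
    print(heap.check(p))            # in-bounds access
    print(heap.check(p, 32))        # overflow into q's granule
    heap.free(p)
    print(heap.check(p))            # use after free
```

Because the tag travels in the pointer's top byte, any code that treats the pointer as a plain address has to ignore or strip that byte, which is what `strip_tag` stands in for here.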

Fix long-standing issues with existing sanitizers, such as:

Known UBSan bugs.
Long-standing LSan issues such as testing gaps and races.
Engage with engineering teams to build a culture of fixing sanitizer bugs. Pay down the tech debt of pre-existing sanitizer bugs.

Investigate opportunities and present a roadmap for future work, such as:

Enabling sanitizers on more configurations (beyond qemu) in a way that tracks Fuchsia’s priorities.
Identifying exactly what code is not exercised under instrumentation (such as drivers and other hardware-dependent code), and closing these gaps in priority order.
Measuring and quantifying the impact of sanitizers on Fuchsia: the magnitude and distribution of issues found, their severity, the time from detection to repair, an inventory of tech debt, and a plan to pay it down.
Researching new opportunities, for instance leveraging hardware support for sanitizers, which is already seeing adoption on other platforms, or making additional optimizations such as inlining viable in instrumented builds.
Considering further investment in sanitizers for Rust, for instance overcoming FFI boundaries when checking for memory safety issues or introducing a UBSan equivalent for Rust.
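The measurement item above could start from a very small summary over bug tracker records. The sketch below is illustrative only: the `Bug` record, the `summarize` function, and the sample entries are hypothetical and do not reflect real Fuchsia data or any particular tracker's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Bug:
    sanitizer: str
    detected: date
    fixed: Optional[date]   # None while the bug is still open


def summarize(bugs):
    """Count bugs per sanitizer, open bugs, and the median days from detection to fix."""
    days_to_fix = [(b.fixed - b.detected).days for b in bugs if b.fixed is not None]
    return {
        "found_by_sanitizer": dict(Counter(b.sanitizer for b in bugs)),
        "open": sum(1 for b in bugs if b.fixed is None),
        "median_days_to_fix": median(days_to_fix) if days_to_fix else None,
    }


# Made-up records for illustration only; not real data.
SAMPLE = [
    Bug("asan", date(2021, 1, 4), date(2021, 1, 6)),    # fixed after 2 days
    Bug("asan", date(2021, 2, 1), date(2021, 2, 11)),   # fixed after 10 days
    Bug("ubsan", date(2021, 3, 1), date(2021, 3, 5)),   # fixed after 4 days
    Bug("lsan", date(2021, 3, 9), None),                # still open
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Severity and ownership could be added as further fields on the record; the median is used here rather than the mean so that a few very old bugs do not dominate the figure.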

Dependencies

Sanitizer bringup work will rely on LLVM expertise that is present in the Fuchsia Toolchain team and should be extended to more individuals to promote team health.
Hardware-accelerated features such as top-byte ignore (TBI) require kernel support across all syscalls that pass pointers.
Exercising sanitizers on hardware will require lab device provisioning and changes to build & test automation capacity and configuration by the EngProd team.
Sanitizer variants and various toggles need to be supported by the Fuchsia Build team.

Note that all of the above teams are already committed to sanitizer work in principle, and meet regularly.

Risks and mitigations

Some efforts, particularly bringup, can take a couple of years to demonstrate and validate. This requires long-term commitment and dedication to the work, as well as patience.
Some expectations about the viability of instrumentation on hardware are speculative: they assume positive results that track prior art on other platforms, and they depend on specific, undisclosed target hardware choices.