Debugging lock dependency cycles

Lock dependency cycles are a common source of deadlocks. This guide provides instructions for detecting, debugging, and resolving lock dependency cycles.

Rust

Rust programs on Fuchsia can use fuchsia_sync for their locks to benefit from additional runtime checks that detect access patterns that can deadlock.

These checks rely on the tracing_mutex crate to detect cycles between lock acquisitions across different threads.

Adopting fuchsia_sync

To start using fuchsia_sync in your code, follow these steps:

  1. Add //src/lib/fuchsia-sync to your deps.
  2. Replace std::sync::Mutex in your code with fuchsia_sync::Mutex.
  3. Replace std::sync::RwLock with fuchsia_sync::RwLock.
  4. Remove any error handling for poisoned locks, as fuchsia_sync does not support lock poisoning.

Enabling cycle checks

These checks are enabled in fuchsia_sync by default in debug builds.

You can manually enable them in balanced or release builds by setting a GN arg:

fx set ... --args=fuchsia_sync_detect_lock_cycles=true

If a lock cycle is detected you will see a panic message like this:

thread 'main' (1) panicked at ../../third_party/rust_crates/forks/tracing-mutex-0.3.2/src/reporting.rs:
Found cycle in mutex dependency graph:
disabled backtrace

stack backtrace:
...

See the next section for instructions on how to enable backtraces.

Printing cycle backtraces

tracing-mutex will always print a backtrace for the panicking thread that would actually trigger a deadlock, but it is often useful to also know what other lock acquisitions are a part of a cycle.

The instrumentation will collect and print these additional backtraces when the RUST_BACKTRACE environment variable is set to 1. Note that this comes with a large performance overhead on top of the instrumentation's baseline overhead.

For an ELF component, include this shard in your component manifest to collect backtraces for all lock acquisitions and print relevant ones when a deadlock is detected:

{
  include: [ "//src/lib/fuchsia-sync/meta/enable_rust_backtrace.shard.cml" ],
  // ...
}

Suppressing panics

You can suppress panics from lock cycles by calling

fuchsia_sync::suppress_lock_cycle_panics();

Ensuring consistent lock access order

This section lists some strategies that you can use to prevent deadlocks once this instrumentation has identified a cycle.

Example

Consider the following code:

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_thing();
    bar.do_thing();
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    let mut bar = bar.lock();
    let mut foo = foo.lock();
    foo.do_other_thing();
    bar.do_other_thing();
}

fn main() {
    let foo = Mutex::new(...);
    let bar = Mutex::new(...);

    let first = std::thread::spawn(|| do_thing_to_both(foo, bar));
    let second = std::thread::spawn(|| do_other_thing_to_both(foo, bar));

    first.join().unwrap();
    second.join().unwrap();
}

This code will deadlock in scenarios where events occur in the following order:

  1. first acquires foo
  2. second acquires bar
  3. first attempts to acquire bar but it is held by second
  4. second attempts to acquire foo but it is held by first

Steps (3) and (4) will block without any thread able to wake them, leading to a deadlock. tracing-mutex will panic with a message indicating that a cycle has been detected.

Depending on the synchronization requirements of the locks in your use case, you may be able to avoid this cycle in a couple of ways.

Removing overlapping lock acquisitions

The simplest way to prevent a lock acquisition from participating in a cycle is to release the lock before acquiring the next one. This is useful if the values guarded by the two locks don't actually require their modifications to be synchronized.

The above example can be fixed by updating the code as follows:

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    {
        let mut foo = foo.lock();
        foo.do_thing();
    }
    {
        let mut bar = bar.lock();
        bar.do_thing();
    }
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    {
        let mut bar = bar.lock();
        bar.do_other_thing();
    }
    {
        let mut foo = foo.lock();
        foo.do_other_thing();
    }
}

// ...

By releasing each lock before acquiring the next one, we ensure that no thread can starve any other thread indefinitely.

This will allow modifications to the two variables to be interleaved but that is acceptable in many situations.

Aligning lock access order

In cases where it's important for accesses to two or more locks to be synchronized, you must ensure that all threads acquire the locks in the exact same order every time.

In the simplified example you could achieve this by swapping the order the locks are acquired in do_other_thing_to_both():

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    // This order is the same as the original example.
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_thing();
    bar.do_thing();
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    // Now the code acquires the locks in the same order as do_thing_to_both().
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_other_thing();
    bar.do_other_thing();
}

// ...

By always locking foo before locking bar, you ensure that all threads acquire the locks in the same order and prevent them from forming a cycle and deadlocking.

Asserting the correct acquisition order

When possible, acquire locks in their intended order early in their lifecycle. This informs future readers and the cycle instrumentation of the correct acquisition order, ensuring that panic messages have source locations pointing to the callsite with incorrect usage.

Limit these extra lock acquisitions to builds with debug_assertions enabled to avoid any performance penalty in release builds.

In the simple case of two locks, this means acquiring both locks in the desired order shortly after they are created. For example:

fn main() {
    let foo = Mutex::new(...);
    let bar = Mutex::new(...);

    // foo should always be acquired before bar if they need to overlap.
    #[cfg(debug_assertions)]
    {
        let _foo = foo.lock();
        let _bar = bar.lock();
    }

    // ...
}

This will ensure that panics will come from code where bar is acquired before foo, regardless of the exact order of the logic under test.