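Debug lock dependency cycles

Lock dependency cycles are a common source of deadlocks. This guide explains how to detect, debug, and resolve them.

Rust

Rust programs on Fuchsia can use fuchsia_sync for their locks to benefit from additional runtime checks that detect access patterns that could lead to deadlocks.

These checks rely on the tracing_mutex crate to detect cycles between lock acquisitions made on different threads.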
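
To make the idea of a lock dependency cycle concrete, here is a minimal sketch of how lock-order cycle detection can work in principle. It is not the tracing_mutex implementation: the `LockGraph` type, its `record` and `reaches` methods, and the integer lock IDs are hypothetical names invented for this illustration. The sketch records an edge from a lock that is already held to the lock being acquired, and refuses any edge that would close a loop.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical lock-order graph: an edge `a -> b` means that some thread
/// acquired lock `b` while it was already holding lock `a`.
#[derive(Default)]
struct LockGraph {
    edges: HashMap<u32, HashSet<u32>>,
}

impl LockGraph {
    /// Records that `acquired` was taken while `held` was held.
    /// Returns `false` (and records nothing) if the edge would close a cycle.
    fn record(&mut self, held: u32, acquired: u32) -> bool {
        // An existing path acquired -> ... -> held, plus the new edge
        // held -> acquired, would form a cycle.
        if self.reaches(acquired, held) {
            return false;
        }
        self.edges.entry(held).or_default().insert(acquired);
        true
    }

    /// Depth-first search: is `to` reachable from `from`?
    fn reaches(&self, from: u32, to: u32) -> bool {
        let mut stack = vec![from];
        let mut seen = HashSet::new();
        while let Some(node) = stack.pop() {
            if node == to {
                return true;
            }
            if seen.insert(node) {
                if let Some(next) = self.edges.get(&node) {
                    stack.extend(next.iter().copied());
                }
            }
        }
        false
    }
}

fn main() {
    const FOO: u32 = 1;
    const BAR: u32 = 2;
    let mut graph = LockGraph::default();
    // One thread locks foo, then bar.
    assert!(graph.record(FOO, BAR));
    // Another thread locks bar, then foo: this closes the cycle.
    assert!(!graph.record(BAR, FOO));
}
```

A useful property of this style of check is that it flags an inconsistent ordering even on runs where the threads happen not to interleave badly, so the problem does not have to reproduce as a real deadlock to be noticed.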

Adopt fuchsia_sync

To start using fuchsia_sync in your code:

Add //src/lib/fuchsia-sync to deps.
Replace std::sync::Mutex in your code with fuchsia_sync::Mutex.
Replace std::sync::RwLock with fuchsia_sync::RwLock.
Remove any error handling for poisoned locks, because fuchsia_sync does not support lock poisoning.
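
The last step is easiest to see in a before-and-after comparison. The following sketch is illustrative only: the `increment` function is a hypothetical example, and the runnable half uses std::sync::Mutex so that it compiles outside a Fuchsia checkout. The commented-out half assumes that fuchsia_sync::Mutex::lock() returns the guard directly, which matches how the examples later in this guide call lock() without unwrapping a result.

```rust
use std::sync::Mutex;

// Before migration: std::sync::Mutex::lock() returns a Result because the
// lock may be poisoned, so every caller needs some form of error handling.
fn increment(counter: &Mutex<u32>) -> u32 {
    let mut guard = counter
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    *guard += 1;
    *guard
}

// After migration (assumed shape; requires a Fuchsia checkout to build):
//
//     use fuchsia_sync::Mutex;
//
//     fn increment(counter: &Mutex<u32>) -> u32 {
//         let mut guard = counter.lock(); // no poisoning, so no Result
//         *guard += 1;
//         *guard
//     }

fn main() {
    let counter = Mutex::new(0);
    assert_eq!(increment(&counter), 1);
    assert_eq!(increment(&counter), 2);
}
```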

Enable cycle checks

In debug builds, these checks are enabled by default in fuchsia_sync.

In balanced or release builds, you can enable them manually by setting a GN argument:

fx set ... --args=fuchsia_sync_detect_lock_cycles=true

If a lock cycle is detected, you will see a panic message similar to the following:

thread 'main' (1) panicked at ../../third_party/rust_crates/forks/tracing-mutex-0.3.2/src/reporting.rs:
Found cycle in mutex dependency graph:
disabled backtrace

stack backtrace:
...

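See the next section for how to enable backtraces.

Print cycle backtraces

tracing-mutex always prints a backtrace for the panicking thread, which is the thread that would actually have triggered the deadlock. It is often useful to also know which other lock acquisitions are part of the cycle.

When the RUST_BACKTRACE environment variable is set to 1, the instrumentation collects and prints these additional backtraces. Note that this adds significant performance overhead on top of the baseline overhead of the instrumentation.

For ELF components, add this shard to the component manifest to collect backtraces for all lock acquisitions and print the relevant ones when a deadlock is detected: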

{
  include: [ "//src/lib/fuchsia-sync/meta/enable_rust_backtrace.shard.cml" ],
  // ...
}

Suppress panics

You can suppress the panics caused by lock cycles by calling the following function:

fuchsia_sync::suppress_lock_cycle_panics();

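Ensure consistent lock access ordering

This section lists strategies for preventing deadlocks once a cycle has been detected.

Example

Consider the following code: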

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_thing();
    bar.do_thing();
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    let mut bar = bar.lock();
    let mut foo = foo.lock();
    foo.do_other_thing();
    bar.do_other_thing();
}

fn main() {
    let foo = Mutex::new(...);
    let bar = Mutex::new(...);

    let first = std::thread::spawn(|| do_thing_to_both(foo, bar));
    let second = std::thread::spawn(|| do_other_thing_to_both(foo, bar));

    first.join().unwrap();
    second.join().unwrap();
}

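This code deadlocks if events occur in the following order:

1. first acquires foo
2. second acquires bar
3. first tries to acquire bar, but bar is held by second
4. second tries to acquire foo, but foo is held by first

Steps (3) and (4) block with no thread able to wake them, which results in a deadlock. tracing-mutex panics with a message indicating that a cycle was detected.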
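
The four-step interleaving can be reproduced deterministically without actually hanging. The sketch below is an illustration under stated assumptions, not code from fuchsia_sync: it uses std::sync::Mutex so that it runs outside a Fuchsia checkout, a std::sync::Barrier to force the ordering, and try_lock() in place of lock() so that steps 3 and 4 report that they would block instead of blocking forever. The `both_blocked` function name is hypothetical.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Forces the interleaving from the steps above and reports whether each
/// thread's second acquisition would have blocked.
fn both_blocked() -> (bool, bool) {
    let foo = Arc::new(Mutex::new(()));
    let bar = Arc::new(Mutex::new(()));
    let barrier = Arc::new(Barrier::new(2));

    let second = {
        let (foo, bar, barrier) =
            (Arc::clone(&foo), Arc::clone(&bar), Arc::clone(&barrier));
        thread::spawn(move || {
            let _bar = bar.lock().unwrap(); // step 2: second acquires bar
            barrier.wait(); // both threads now hold their first lock
            let blocked = foo.try_lock().is_err(); // step 4
            barrier.wait(); // keep bar held until first has tried it
            blocked
        })
    };

    let _foo = foo.lock().unwrap(); // step 1: first acquires foo
    barrier.wait();
    let first_blocked = bar.try_lock().is_err(); // step 3
    barrier.wait(); // keep foo held until second has tried it
    let second_blocked = second.join().unwrap();
    (first_blocked, second_blocked)
}

fn main() {
    // With lock() instead of try_lock(), both threads would wait forever.
    assert_eq!(both_blocked(), (true, true));
}
```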

Depending on the synchronization requirements of the locks in your use case, there are several ways to avoid this cycle.

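Remove overlapping lock acquisitions

The simplest way to keep lock acquisitions from participating in a cycle is to release each lock before acquiring the next one. This works well when the values protected by the two locks do not actually need to be modified together.

The example above can be fixed by updating the code as follows: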

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    {
        let mut foo = foo.lock();
        foo.do_thing();
    }
    {
        let mut bar = bar.lock();
        bar.do_thing();
    }
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    {
        let mut bar = bar.lock();
        bar.do_other_thing();
    }
    {
        let mut foo = foo.lock();
        foo.do_other_thing();
    }
}

// ...

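By releasing each lock before acquiring the next one, we ensure that no thread can indefinitely starve any other thread.

This does allow modifications to the two variables to interleave, but in many cases that is acceptable.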
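
For readers who want to run this pattern, the following is a self-contained sketch under a few assumptions: it substitutes std::sync::Mutex for fuchsia_sync::Mutex so that it builds outside a Fuchsia checkout (hence the extra unwrap() calls), it shares the locks through Arc, and it uses plain u32 counters as stand-ins for the elided values. The `run` helper is a hypothetical name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn do_thing_to_both(foo: &Mutex<u32>, bar: &Mutex<u32>) {
    {
        let mut foo = foo.lock().unwrap();
        *foo += 1;
    } // foo is released here, before bar is acquired
    {
        let mut bar = bar.lock().unwrap();
        *bar += 1;
    }
}

fn do_other_thing_to_both(foo: &Mutex<u32>, bar: &Mutex<u32>) {
    {
        let mut bar = bar.lock().unwrap();
        *bar += 10;
    } // bar is released here, before foo is acquired
    {
        let mut foo = foo.lock().unwrap();
        *foo += 10;
    }
}

/// Runs both functions on separate threads and returns the final values.
fn run() -> (u32, u32) {
    let foo = Arc::new(Mutex::new(0u32));
    let bar = Arc::new(Mutex::new(0u32));

    let first = {
        let (foo, bar) = (Arc::clone(&foo), Arc::clone(&bar));
        thread::spawn(move || do_thing_to_both(&foo, &bar))
    };
    let second = {
        let (foo, bar) = (Arc::clone(&foo), Arc::clone(&bar));
        thread::spawn(move || do_other_thing_to_both(&foo, &bar))
    };

    first.join().unwrap();
    second.join().unwrap();

    let totals = (*foo.lock().unwrap(), *bar.lock().unwrap());
    totals
}

fn main() {
    // No thread ever holds two locks at once, so the order does not matter.
    assert_eq!(run(), (11, 11));
}
```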

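Adjust lock access ordering

If accesses to two or more locks need to be synchronized, you must ensure that all threads acquire the locks in exactly the same order every time.

In the simplified example, you can do this by swapping the order in which do_other_thing_to_both() acquires the locks: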

fn do_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    // This order is the same as the original example.
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_thing();
    bar.do_thing();
}

fn do_other_thing_to_both(foo: Mutex<...>, bar: Mutex<...>) {
    // Now the code acquires the locks in the same order as do_thing_to_both().
    let mut foo = foo.lock();
    let mut bar = bar.lock();
    foo.do_other_thing();
    bar.do_other_thing();
}

// ...

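By always locking foo before locking bar, you ensure that all threads acquire the locks in the same order, which prevents them from forming a cycle and deadlocking.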
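
The following runnable sketch shows a case where both locks genuinely need to be held together. It rests on a few assumptions: std::sync::Mutex stands in for fuchsia_sync::Mutex so that it builds outside a Fuchsia checkout, the protected values are i64 balances chosen for illustration, and the `run` helper and its `iterations` parameter are hypothetical. Each function moves an amount between foo and bar, so the sum stays constant only if both updates happen under both locks.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn do_thing_to_both(foo: &Mutex<i64>, bar: &Mutex<i64>) {
    // Always foo first, then bar.
    let mut foo = foo.lock().unwrap();
    let mut bar = bar.lock().unwrap();
    *foo -= 1;
    *bar += 1;
}

fn do_other_thing_to_both(foo: &Mutex<i64>, bar: &Mutex<i64>) {
    // Same order as do_thing_to_both(): foo first, then bar.
    let mut foo = foo.lock().unwrap();
    let mut bar = bar.lock().unwrap();
    *bar -= 2;
    *foo += 2;
}

/// Runs each function `iterations` times on its own thread and returns the
/// final (foo, bar) values.
fn run(iterations: u32) -> (i64, i64) {
    let foo = Arc::new(Mutex::new(100i64));
    let bar = Arc::new(Mutex::new(100i64));

    let first = {
        let (foo, bar) = (Arc::clone(&foo), Arc::clone(&bar));
        thread::spawn(move || {
            for _ in 0..iterations {
                do_thing_to_both(&foo, &bar);
            }
        })
    };
    let second = {
        let (foo, bar) = (Arc::clone(&foo), Arc::clone(&bar));
        thread::spawn(move || {
            for _ in 0..iterations {
                do_other_thing_to_both(&foo, &bar);
            }
        })
    };

    first.join().unwrap();
    second.join().unwrap();

    let totals = (*foo.lock().unwrap(), *bar.lock().unwrap());
    totals
}

fn main() {
    let (foo, bar) = run(1000);
    // The transfers never deadlock, and the combined total is preserved.
    assert_eq!(foo + bar, 200);
}
```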

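Assert the correct acquisition order

Where possible, acquire locks in their expected order early in their lifetime. This tells both future readers and the cycle detection what the correct acquisition order is, and it ensures that the source location in the panic message points to the call site that uses the locks incorrectly.

Restrict these extra lock acquisitions to builds with debug_assertions enabled to avoid any performance penalty in release builds.

In the simple case of only two locks, this means acquiring both locks in the desired order shortly after creating them. For example: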

fn main() {
    let foo = Mutex::new(...);
    let bar = Mutex::new(...);

    // foo should always be acquired before bar if they need to overlap.
    #[cfg(debug_assertions)]
    {
        let _foo = foo.lock();
        let _bar = bar.lock();
    }

    // ...
}

This ensures that the panic comes from the code that acquires bar before foo, regardless of the exact order in which the logic under test runs.