Runtime Lock Validation in Zircon and Fuchsia

Introduction

Lock validation is a technique for checking the consistency of locking behavior in a program to find potential deadlock hazards. This document discusses relevant aspects of the static and dynamic approaches to lock validation and presents the foundation for the runtime lock validation library available in Zircon and Fuchsia.

Background

Lock validation may be performed either statically or dynamically. The following summarizes the important differences between static and dynamic approaches to lock validation:

When the validation is performed: compile time vs. run time.
How effective the validator is at finding potential problems.
What level of involvement is required by the programmer.
The overhead cost of the validation itself.

Static Validation

Static validation is typically performed at compile time by analyzing the call graphs produced by the compiler or other source-level processor. With this approach it is necessary to instrument the code and locking primitives with annotations to inform the validator about which types represent locks and which rules to apply (or not) to the code that uses the lock types.
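As a rough illustration of the annotations involved, the sketch below wraps Clang's thread safety attributes in macros. The macro names and the Mutex, Counter, and Increment definitions are assumptions made for this example, not the definitions used in Zircon; on compilers other than Clang the macros expand to nothing.

```cpp
#include <cassert>
#include <mutex>

// Illustrative macro names wrapping Clang's thread safety attributes.
#if defined(__clang__)
#define CAPABILITY(x) __attribute__((capability(x)))
#define GUARDED_BY(x) __attribute__((guarded_by(x)))
#define ACQUIRE(...) __attribute__((acquire_capability(__VA_ARGS__)))
#define RELEASE(...) __attribute__((release_capability(__VA_ARGS__)))
#define NO_THREAD_SAFETY_ANALYSIS __attribute__((no_thread_safety_analysis))
#else
#define CAPABILITY(x)
#define GUARDED_BY(x)
#define ACQUIRE(...)
#define RELEASE(...)
#define NO_THREAD_SAFETY_ANALYSIS
#endif

// Annotating the type tells the analyzer that Mutex represents a lock.
class CAPABILITY("mutex") Mutex {
 public:
  // The wrapper bodies are excluded from analysis; callers are still checked.
  void Acquire() ACQUIRE() NO_THREAD_SAFETY_ANALYSIS { impl_.lock(); }
  void Release() RELEASE() NO_THREAD_SAFETY_ANALYSIS { impl_.unlock(); }

 private:
  std::mutex impl_;
};

struct Counter {
  Mutex lock;
  int value GUARDED_BY(lock) = 0;  // Access requires holding lock.
};

// The analyzer can verify that value is only touched while lock is held.
int Increment(Counter* counter) {
  assert(counter != nullptr);
  counter->lock.Acquire();
  int result = ++counter->value;
  counter->lock.Release();
  return result;
}
```

With Clang, building with -Wthread-safety reports an access to value made outside the Acquire/Release pair at compile time.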

The benefits of static validation include early detection of issues at build time, deterministic validation results, and zero runtime overhead. This combination of properties makes it attractive to always enable static validation, ensuring that locking issues are often caught before code makes it into the build, without impacting the performance of the build artifacts.

Static validation also has some downsides. One problem is that static validation requires correct, consistent application of a variety of annotations to both locks and code to provide useful results. This can cause maintenance issues unless diligent code review standards are implemented. Another issue is that static validation has limited visibility and can be fooled by conditional paths, dynamic dispatch, move semantics, and lock dependencies that span compilation units.

Dynamic Validation

Dynamic validation is performed online at runtime by observing the relationships between locks as the code executes. With this approach it is generally sufficient to instrument just the locking primitives and acquire/release operations to provide the information required for validation.

The benefits of dynamic validation include simpler instrumentation (from the user's perspective) and potentially greater visibility into the actual runtime behavior of the program. This makes dynamic validation useful in large code bases, where it may not be possible for static validation to see the full set of possible lock interactions.

The main downsides of dynamic validation are runtime overhead and execution coverage requirements. Because dynamic validation must track lock interactions at runtime, each acquire and release incurs a non-zero execution cost to update tracking data, in addition to the memory overhead of the tracking data itself. Runtime tracking also has the consequence that code paths that are not executed cannot be analyzed by the validator. This may increase the burden on the developer and QA to ensure sufficient execution coverage if that is not already a project requirement.

Locking Ordering Invariant

The job of the lock validator is to determine whether or not the lock invariants of the program hold. The primary invariant is the order between two or more locks: all paths in a program that acquire two or more locks must do so in an order consistent with every other path involving two or more of the same locks to avoid the potential for deadlock. Environments that deal with hardware interrupts, such as embedded systems and kernels, have an additional ordering invariant to avoid interrupt-induced deadlocks. These invariants are illustrated in the following subsections.

Basic Inversion

The simplest form of inversion occurs when a program has two locks that are both acquired sequentially with inconsistent orders in different paths.

For example, a program with the locks A and B and code paths P₁ and P₂ and the following behavior has the potential for deadlock:

Path P₁ acquires and releases the locks in the sequence:

Acquire(A)
Acquire(B)
Release(B)
Release(A)

Path P₂ acquires and releases the locks in the inverted sequence:

Acquire(B)
Acquire(A)
Release(A)
Release(B)

With the right interleaving, perhaps due to both paths executing concurrently on different threads, a deadlock occurs when path P₁ holds lock A and blocks waiting for lock B, while path P₂ holds lock B and blocks waiting for lock A.

Circular Dependency

Inversion may also occur between more than two locks and paths. This kind of inversion is much harder to recognize through manual inspection because each pair of locks involved may appear to be correctly ordered in every path involving just that pair, and yet a potential deadlock may still exist given the overall ordering of the locks.

For example, a program with the locks A, B, and C; paths P₁, P₂, and P₃; with the following behavior has the potential for deadlock:

Path P₁ acquires and releases the locks in the sequence:

Acquire(A)
Acquire(B)
Release(B)
Release(A)

Path P₂ acquires and releases the locks in the sequence:

Acquire(B)
Acquire(C)
Release(C)
Release(B)

Path P₃ acquires and releases the locks in the sequence:

Acquire(C)
Acquire(A)
Release(A)
Release(C)

With the right interleaving of paths P₁, P₂, and P₃ a deadlock occurs as each path acquires the lock at the first step and waits for the lock at the second step. In practice this situation may be compounded by the existence of many different paths that produce the same pairwise lock sequences.

IRQ-Safe Ordering

In systems that deal with hardware interrupts the ordering between irq-safe and non-irq-safe locks is critical: a non-irq-safe lock must never be acquired while holding an irq-safe lock to prevent indirect lock inversions. Irq-safe locks preserve ordering between irq and non-irq context; a consistent order of two or more irq-safe locks is guaranteed to be safe for paths running in both irq and non-irq context. The same is not true for non-irq-safe locks. The reason for this is that non-irq-safe locks permit irq handlers to effectively insert the locks acquired by the handler at arbitrary points in the interrupted task's lock sequences.

For example, a system with non-irq-safe lock A and irq-safe lock B_irq; paths P₁, P₂, and irq path P_irq; with the following behavior has the potential for deadlock:

Path P₁ on CPU1 acquires and releases the lock in sequence:

Acquire(A)
P_irq interrupts here on CPU1
Release(A)

Path P_irq on CPU1 acquires and releases the lock in sequence:

Acquire(B_irq)
Release(B_irq)

Path P₂ on CPU2 acquires and releases the locks in sequence:

Acquire(B_irq)
Acquire(A)
Release(A)
Release(B_irq)

With the right interleaving of paths P₁, P₂, and P_irq a deadlock occurs as P_irq attempts to acquire B_irq while P₂ holds B_irq and blocks waiting for A. This is an indirect lock inversion: P_irq effectively inserts an acquire/release sequence of B_irq in the middle of the acquire/release sequence of A in path P₁, which is inconsistent with the lock sequence for the same locks in path P₂.

Performing Validation

The invariants discussed in the previous section can be validated using a finite directed graph. The directed graph tracks the identity and order of locks as the analysis traverses the code paths. Such a graph can be built either by traversing the call graphs generated by a compiler or source-level processor (static analysis) or by observing the ordering of locks during program execution (dynamic analysis). This section introduces the process in abstract terms that apply to either approach, in preparation for developing a concrete dynamic analysis strategy later on.

In the most general terms, building a directed graph from a code path requires maintaining a list of actively held locks as the path is traversed: a node representing a lock is added to the list whenever the lock is acquired and removed from the list whenever the lock is released. In addition to maintaining the active list, a directed edge is added to the graph from a vertex representing the newly acquired lock to each vertex representing a lock already in the list.
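The bookkeeping described above can be sketched in a few lines. This is a minimal, single-threaded illustration: the Graph and PathTracker names are hypothetical, locks are identified by name, and standard containers are used for brevity even though a real validator would avoid dynamic allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Lock names stand in for lock identities in this sketch.
using Lock = std::string;
using Edge = std::pair<Lock, Lock>;  // (newly acquired lock, lock already held)

struct Graph {
  std::set<Lock> vertices;
  std::set<Edge> edges;
};

// Tracks the locks held along one code path and records ordering edges in a
// graph shared by all paths.
class PathTracker {
 public:
  explicit PathTracker(Graph* graph) : graph_(graph) { assert(graph != nullptr); }

  void Acquire(const Lock& lock) {
    graph_->vertices.insert(lock);
    // Add an edge from the new lock to every lock already in the active list.
    for (const Lock& held : active_) {
      graph_->edges.insert({lock, held});
    }
    active_.push_back(lock);
  }

  void Release(const Lock& lock) {
    auto it = std::find(active_.begin(), active_.end(), lock);
    assert(it != active_.end());  // Releasing a lock that is not held is a bug.
    active_.erase(it);            // Releases never change the graph.
  }

  const std::vector<Lock>& active() const { return active_; }

 private:
  Graph* graph_;
  std::vector<Lock> active_;
};
```

Each code path gets its own PathTracker, while all trackers share one Graph so that edges accumulate across paths.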

Basic Inversion Example

This section illustrates a directed graph approach to detect a basic two-lock inversion.

Recall from the earlier example a program with the locks A and B; code paths P₁ and P₂; and the following behavior:

Path P₁ acquires and releases the locks in the sequence:

Acquire(A)
Acquire(B)
Release(B)
Release(A)

Path P₂ acquires and releases the locks in the inverted sequence:

Acquire(B)
Acquire(A)
Release(A)
Release(B)

Analysis of Path P₁

Starting with path P₁ we define and update the directed graph.

Let L₁ be the ordered active list of locks held by path P₁.

Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.

Initial state:

L₁	V	E
()	{}	{}

After P₁ step 1:

L₁	V	E
(A)	{A}	{}

This step adds lock A to the active list and introduces a vertex for the same lock to the directed graph. Since there are no other locks in the active list no edges are added.

After P₁ step 2:

L₁	V	E
(A, B)	{A, B}	{(B, A)}

This step adds lock B to the active list and also introduces a corresponding vertex to the graph. This time the active list does contain a lock, so an edge from the new lock to the existing lock is added to the graph. This edge represents that lock B now depends on lock A preceding it in any other path that involves both locks.

After P₁ step 3:

L₁	V	E
(A)	{A, B}	{(B, A)}

Lock B is removed from the active list. No updates to the graph.

After P₁ step 4:

L₁	V	E
()	{A, B}	{(B, A)}

Lock A is removed from the active list. No updates to the graph.

Analysis of Path P₂

Let L₂ be the active list of locks held by P₂.

Initial state:

L₂	V	E
()	{A, B}	{(B, A)}

In this case the initial state is the final state from path P₁.

After P₂ step 1:

L₂	V	E
(B)	{A, B}	{(B, A)}

This step adds lock B to the active list. As there are no other locks in the active list no edges are added to the graph. Since B already has a vertex in the graph there is also no change to V.

After P₂ step 2:

L₂	V	E
(B, A)	{A, B}	{(B, A), (A, B)}

This step adds lock A to the active list. Since this lock already has a vertex there is no change to V. However, because there is a lock in the active list an edge from the new lock to the existing lock is added to the graph. With this new edge the graph now forms a cycle between vertices A and B, indicating that ordering between these locks is not consistent between the two paths considered thus far and that a potential deadlock exists.
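For the two-lock case, the cycle check reduces to asking whether the reverse of the edge being added is already in E. The following sketch assumes a hypothetical helper named AddEdgeAndCheckInversion and identifies locks by a single character.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Lock = char;
using Edge = std::pair<Lock, Lock>;  // (newly acquired lock, lock already held)

// Records the edge (acquired, held) and returns true if the reverse edge is
// already present, meaning the two locks form a two-vertex cycle.
bool AddEdgeAndCheckInversion(std::set<Edge>& edges, Lock acquired, Lock held) {
  assert(acquired != held);
  edges.insert({acquired, held});
  return edges.count({held, acquired}) > 0;
}
```

Replaying the example, adding (B, A) for path P₁ returns false, and adding (A, B) for path P₂ returns true.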

Circular Dependency Example

This section illustrates a directed graph approach to detect a circular dependency inversion using the previously discussed example from the invariants section. This illustration is somewhat abbreviated due to its similarity to the previous illustration.

Consider a program with the locks A, B, and C and paths P₁, P₂, and P₃ and the following behavior:

Path P₁ acquires and releases the locks in the sequence:

Acquire(A)
Acquire(B)
Release(B)
Release(A)

Path P₂ acquires and releases the locks in the sequence:

Acquire(B)
Acquire(C)
Release(C)
Release(B)

Path P₃ acquires and releases the locks in the sequence:

Acquire(C)
Acquire(A)
Release(A)
Release(C)

Analysis of Path P₁

Let L₁ be the ordered active list of locks held by path P₁.

Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.

Initial state:

L₁	V	E
()	{}	{}

After P₁ step 1:

L₁	V	E
(A)	{A}	{}

After P₁ step 2:

L₁	V	E
(A, B)	{A, B}	{(B, A)}

After P₁ step 3:

L₁	V	E
(A)	{A, B}	{(B, A)}

After P₁ step 4:

L₁	V	E
()	{A, B}	{(B, A)}

Analysis of Path P₂

Let L₂ be the ordered active list of locks held by path P₂.

Initial state:

L₂	V	E
()	{A, B}	{(B, A)}

After P₂ step 1:

L₂	V	E
(B)	{A, B}	{(B, A)}

After P₂ step 2:

L₂	V	E
(B, C)	{A, B, C}	{(B, A), (C, B)}

This step adds lock C to the active list and also introduces a corresponding vertex to the graph. The active list contains the lock B, so an edge is added from C to B.

After P₂ step 3:

L₂	V	E
(B)	{A, B, C}	{(B, A), (C, B)}

After P₂ step 4:

L₂	V	E
()	{A, B, C}	{(B, A), (C, B)}

Analysis of Path P₃

Let L₃ be the ordered active list of locks held by path P₃.

Initial state:

L₃	V	E
()	{A, B, C}	{(B, A), (C, B)}

After P₃ step 1:

L₃	V	E
(C)	{A, B, C}	{(B, A), (C, B)}

After P₃ step 2:

L₃	V	E
(C, A)	{A, B, C}	{(B, A), (C, B), (A, C)}

This step adds lock A to the active list. The active list contains the lock C, so an edge is added from A to C. With this new edge the graph now forms a cycle in the vertices (A, B, C), indicating a circular dependency and the potential for deadlock if paths P₁, P₂, and P₃ are interleaved in the right way.
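Cycles of any length can be found by checking, each time an edge is added, whether the source of the new edge is reachable from its destination. The sketch below does this with an iterative depth-first search; the names Reaches and AddEdgeClosesCycle are hypothetical and the code is single-threaded.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Lock = char;
// Adjacency sets: graph[x] holds every lock y with an edge (x, y).
using Graph = std::map<Lock, std::set<Lock>>;

// Returns true if target can be reached from start by following edges.
bool Reaches(const Graph& graph, Lock start, Lock target) {
  std::set<Lock> visited;
  std::vector<Lock> stack{start};
  while (!stack.empty()) {
    Lock current = stack.back();
    stack.pop_back();
    if (current == target) return true;
    if (!visited.insert(current).second) continue;  // Already explored.
    auto it = graph.find(current);
    if (it == graph.end()) continue;  // No outgoing edges.
    for (Lock next : it->second) stack.push_back(next);
  }
  return false;
}

// Adds the edge (from, to) and reports whether it closes a cycle.
bool AddEdgeClosesCycle(Graph& graph, Lock from, Lock to) {
  assert(from != to);
  graph[from].insert(to);
  return Reaches(graph, to, from);
}
```

In the example, the edges (B, A) and (C, B) do not close a cycle, while (A, C) does because A is reachable from C through B.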

IRQ-Safe Ordering Example

This section illustrates a directed graph approach to detect irq-safe order violations using the previously discussed example from the invariants section.

Recall the example system with non-irq-safe lock A and irq-safe lock B_irq; paths P₁, P₂, and irq path P_irq; with the following behavior:

Path P₁ acquires and releases the lock in sequence:

Acquire(A)
Release(A)

Path P_irq acquires and releases the lock in sequence:

Acquire(B_irq)
Release(B_irq)

Path P₂ acquires and releases the locks in sequence:

Acquire(B_irq)
Acquire(A)
Release(A)
Release(B_irq)

Analysis of Path P₁

Let L₁ be the ordered active list of locks held by path P₁.

Let G = (V, E) be the directed graph, having the set of vertices V representing observed locks and the set of directed edges between vertices E.

Initial state:

L₁	V	E
()	{}	{}

After P₁ step 1:

L₁	V	E
(A)	{A}	{}

After P₁ step 2:

L₁	V	E
()	{A}	{}

Analysis of Path P_irq

Let L_irq be the ordered active list of locks held by path P_irq.

Initial state:

L_irq	V	E
()	{A}	{}

After P_irq step 1:

L_irq	V	E
(B_irq)	{A, B_irq}	{}

After P_irq step 2:

L_irq	V	E
()	{A, B_irq}	{}

Analysis of Path P₂

Let L₂ be the ordered active list of locks held by path P₂.

Initial state:

L₂	V	E
()	{A, B_irq}	{}

After P₂ step 1:

L₂	V	E
(B_irq)	{A, B_irq}	{}

After P₂ step 2:

L₂	V	E
(B_irq, A)	{A, B_irq}	{(A, B_irq)}

This step adds lock A to the active list. The active list contains lock B_irq, so an edge is added from A to B_irq. Because this is an edge from a non-irq-safe lock to an irq-safe lock the irq-safe ordering invariant is violated and a potential deadlock exists.
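Unlike the cycle checks, this rule can be evaluated at the moment of acquisition using only the active list. A minimal sketch, assuming each lock carries a flag that says whether it is irq-safe (the LockInfo type and function names are hypothetical):

```cpp
#include <cassert>
#include <vector>

struct LockInfo {
  const char* name;
  bool irq_safe;
};

// Returns true if acquiring `acquired` while holding the locks in `held` would
// add an edge from a non-irq-safe lock to an irq-safe lock.
bool ViolatesIrqSafeOrder(const std::vector<LockInfo>& held, const LockInfo& acquired) {
  if (acquired.irq_safe) return false;  // The rule only restricts non-irq-safe locks.
  for (const LockInfo& lock : held) {
    if (lock.irq_safe) return true;
  }
  return false;
}

// Usage: replays step 2 of path P₂, where A is acquired while B_irq is held.
void ExampleIrqViolation() {
  LockInfo a{"A", false};
  LockInfo b_irq{"B_irq", true};
  std::vector<LockInfo> held;
  held.push_back(b_irq);
  assert(ViolatesIrqSafeOrder(held, a));
}
```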

From Theory to Implementation

This section develops a concrete strategy to implement a directed graph validator, based on the analysis techniques of the previous section.

The implementation strategy has the following goals:

Avoid dynamic allocation if possible.
Minimize the overhead of validation.
Support environments that manage hardware interrupts.

Removing Redundancy with Lock Classes

In the analysis earlier in this document, locks are considered abstractly with the implication that the tracked objects are individual instances of locks. While tracking individual instances produces correct results, it has several consequences that might be avoided:

Tracking structures must be dynamically adjusted as lock instances come into and out of existence, possibly requiring dynamic allocation or other per-instance data storage.
The graph contains redundant information when multiple instances of locks are used identically by the same code paths.
Relatedly, it may take longer to identify violations by locks that serve identical functions, but have not yet individually propagated through all of the necessary code paths.

A key observation is that locks that serve identical functions should follow the same ordering rules, regardless of the number of instances.

Consider the following types with lock members and an operation that mutates both types:

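struct Foo {
    Mutex lock;
    int data GUARDED_BY(lock);
};

struct Bar {
    Mutex lock;
    int data GUARDED_BY(lock);
};

// Exchanges the data of foo and bar while holding both locks.
void Swap(Foo* foo, Bar* bar) {
    // Lock order established here: Foo::lock, then Bar::lock.
    foo->lock.Acquire();
    bar->lock.Acquire();

    int temp = foo->data;
    foo->data = bar->data;
    bar->data = temp;

    // Release in reverse order of acquisition.
    bar->lock.Release();
    foo->lock.Release();
}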

Since operation Swap may operate on any instance of Foo and any instance of Bar it follows that Swap establishes an order between the locks of all instances of Foo and Bar; failure to apply this order consistently in other parts of a program could result in a deadlock when the same instances of Foo and Bar are locked concurrently in different orders.

Note that it is possible to intentionally or unintentionally segregate different collections of Foo and Bar such that instances locked in different orders never overlap. This is still dangerous however, because seemingly innocuous changes to the inputs, structure, or timing of the program could defeat the segregation and introduce a potential deadlock. This problem can be avoided entirely by treating all instances of Foo and Bar equivalently and applying the same ordering rules throughout the program.

Ensuring universal ordering throughout the program can be achieved by tracking classes of locks instead of lock instances: each lock member in each type represents a unique lock class. The relationships between each lock class can be tracked and analyzed using the same directed graph techniques as with individual locks.

Tracking lock classes has the following benefits:

Statically allocated memory: because all lock classes are known at compile time, tracking structures can be allocated up front as static global data.
Elimination of redundant graph nodes: locks in the same class use the same tracking structures.
Faster detection of invariant violations: violations are detected when lock class orders are inconsistent, even if the individual instances involved have never been used together.
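One way to obtain a statically allocated tracking structure per lock class is to key a template on the containing type and give it a static data member. The sketch below is an assumption about how this could look, not the library's actual interface; ClassLock, LockClassState, and the acquisition counter are placeholders for real graph data.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Tracking state shared by every lock instance of one class. A real validator
// would keep graph edges here; a counter keeps the sketch small.
struct LockClassState {
  std::atomic<int> acquisitions{0};
};

// A lock whose class is identified by its containing type and member index.
// Each distinct (Container, kIndex) pair gets one statically allocated state.
template <typename Container, int kIndex = 0>
class ClassLock {
 public:
  void Acquire() {
    mutex_.lock();
    state_.acquisitions.fetch_add(1, std::memory_order_relaxed);
  }
  void Release() { mutex_.unlock(); }

  static LockClassState& class_state() { return state_; }

 private:
  static inline LockClassState state_;
  std::mutex mutex_;
};

struct Foo {
  ClassLock<Foo> lock;
  int data = 0;
};

struct Bar {
  ClassLock<Bar> lock;
  int data = 0;
};

// Usage: two Foo instances update the same class state; Bar has its own.
void ExampleLockClasses() {
  Foo foo1, foo2;
  Bar bar;
  foo1.lock.Acquire();
  foo1.lock.Release();
  foo2.lock.Acquire();
  foo2.lock.Release();
  bar.lock.Acquire();
  bar.lock.Release();
  assert(&ClassLock<Foo>::class_state() != &ClassLock<Bar>::class_state());
}
```

Because the state lives in a static member, no allocation happens when Foo or Bar instances are created or destroyed.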

Additional Ordering Rules

Tracking lock classes introduces additional ordering considerations when locking multiple locks of the same class. Because individual instances are not tracked it is necessary to take additional steps to ensure consistency when multiple locks of the same class must be acquired at the same time.

Externally Ordered Locks

Nesting locks of the same class is necessary when a hierarchical or other ordered data structure has locks in each node and more than one per-node lock must be held at a time. In this situation the data structure or access pattern must provide a stable ordering that is used to guarantee ordering of the locks.

Validation of nestable lock classes requires only that the external order is recorded in the active locks list for each nestable lock and compared when new locks of the same class are added to the list. A consequence of this design is that other lock classes may not be interspersed between nested locks of the same class, only wholly before or after a collection of nested locks.

For example, non-nestable lock classes A and B, and nestable lock class N may be interspersed like this:

A, N₀, N₁, ... N_n, B

But not like this:

A, N₀, B, N₁, ... N_n or A, N₀, N₁, B, ... N_n or ... etc

In most situations this is a reasonable constraint, as interspersing other locks within a nested structure with arbitrary depth is likely to result in inversions as the structure is updated at runtime. On the other hand, in situations where nesting is bounded to a few levels it may be more effective to define separate lock classes for each level instead of using a nested class -- in this case other locks may be allowed at a specific level following normal lock ordering rules.
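The check for nestable classes can be sketched as follows. The sketch assumes that the external order is an integer that must strictly increase along a run of nested locks, and that lock classes are identified by small integers; both are choices made for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct HeldLock {
  int class_id;
  std::uint64_t order;  // External order; only meaningful for nestable classes.
};

// Returns true if a nestable lock of class_id with the given external order may
// be acquired while holding the locks in `held` (oldest first).
bool CanAcquireNested(const std::vector<HeldLock>& held, int class_id, std::uint64_t order) {
  bool class_already_held = false;
  for (const HeldLock& lock : held) {
    if (lock.class_id == class_id) class_already_held = true;
  }
  if (!class_already_held) return true;  // First lock of this class in the list.
  // The class is already held: the most recent lock must belong to the same
  // class (nothing interspersed) and must be ordered before the new lock.
  const HeldLock& last = held.back();
  return last.class_id == class_id && last.order < order;
}

// Usage: with A (class 0) and N₀ (class 1, order 10) held, N₁ must have a
// greater order than N₀.
void ExampleNestedOrder() {
  std::vector<HeldLock> held;
  held.push_back(HeldLock{0, 0});
  held.push_back(HeldLock{1, 10});
  assert(CanAcquireNested(held, 1, 20));
  assert(!CanAcquireNested(held, 1, 5));
}
```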

Address Ordering

It is difficult to generalize lock ordering between locks of the same class without an externally provided order when the locks are acquired at different times. It is possible, however, to provide an ordering guarantee when acquiring multiple locks at the same time, without temporal separation. In this situation the locks may be ordered by address, guaranteeing that any path that acquires the same set of locks produces a consistent locking order.

For example, consider an operation F(S_a, S_b) that operates on two instances of structure S, each with a lock of class L, and that must hold both locks as part of the operation.

If instance S₀ is ordered in memory before instance S₁ then the locks have the same relative ordering as their containing instances. We can consider the locks to have the subclasses L₀ and L₁ respectively.

A lock ordering problem arises if we perform the operation with different orders:

F(S₀, S₁) and F(S₁, S₀)

Without intervention these produce the inverted lock sequences:

L₀, L₁ and L₁, L₀

Since F has access to both locks at the same time, it is possible to order the locks by address, resulting in a consistent lock sequence regardless of the original order of the arguments.

Now suppose we add two more lock classes to the sequence: class A acquired before operation F and class B acquired after operation F. The resulting lock sequence is:

A, L₀, L₁, B

Note that this looks similar to the nested lock class sequence in the previous section. It is in fact the same situation, only the ordering of locks is provided by address rather than an external order. This means that the same bookkeeping in the active locks list can be used for both situations.
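A minimal sketch of address ordering for operation F follows. The body of F (swapping a value) and the use of std::mutex are assumptions for illustration; only the ordering step reflects the technique described above.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

struct S {
  std::mutex lock;
  int value = 0;
};

// Returns the two locks ordered by address, lowest first. std::less is used
// because it provides a total order over pointers.
std::pair<std::mutex*, std::mutex*> OrderByAddress(std::mutex* a, std::mutex* b) {
  assert(a != b);  // Locking the same instance twice would self-deadlock.
  if (std::less<std::mutex*>{}(b, a)) std::swap(a, b);
  return {a, b};
}

// Operation F: locks both instances in address order, whatever the argument
// order, so F(s0, s1) and F(s1, s0) produce the same lock sequence.
void F(S& s_a, S& s_b) {
  auto [first, second] = OrderByAddress(&s_a.lock, &s_b.lock);
  first->lock();
  second->lock();
  std::swap(s_a.value, s_b.value);
  second->unlock();
  first->unlock();
}
```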

Lock Class Tracking Data Structure

This section discusses implementation details for tracking lock classes and concrete processing techniques to detect potential deadlocks.

Each lock class has a statically allocated node in the directed graph representing all locks belonging to that class. Each node has the following data structures:

Lock-Free, Wait-Free Hash Set

Each lock class node has a hash set that tracks the edges from the lock class to the lock classes ordered before it.
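As one possible shape for such a set, the sketch below uses a fixed-size table of atomics with open addressing and linear probing. It is an assumption-laden illustration rather than the actual implementation: the EdgeSet name, the use of nonzero 32-bit lock class ids, and the capacity parameter are all hypothetical. Because entries are only ever added, a slot changes at most once and every operation completes within a bounded number of probes.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity, insert-only set of nonzero lock class ids.
template <std::size_t kCapacity>
class EdgeSet {
 public:
  enum class Result { kAdded, kAlreadyPresent, kFull };

  Result Add(std::uint32_t id) {
    assert(id != 0);  // 0 marks an empty slot.
    for (std::size_t i = 0; i < kCapacity; ++i) {
      std::atomic<std::uint32_t>& slot = slots_[(id + i) % kCapacity];
      std::uint32_t expected = 0;
      // Claim the slot if it is empty; on failure expected holds its value.
      if (slot.compare_exchange_strong(expected, id, std::memory_order_acq_rel,
                                       std::memory_order_acquire)) {
        return Result::kAdded;
      }
      if (expected == id) return Result::kAlreadyPresent;
    }
    return Result::kFull;
  }

  bool Contains(std::uint32_t id) const {
    for (std::size_t i = 0; i < kCapacity; ++i) {
      std::uint32_t value = slots_[(id + i) % kCapacity].load(std::memory_order_acquire);
      if (value == id) return true;
      if (value == 0) return false;  // An empty slot ends the probe sequence.
    }
    return false;
  }

 private:
  std::array<std::atomic<std::uint32_t>, kCapacity> slots_{};
};
```

The table never needs resizing or allocation, which fits the goal of statically allocated tracking data, at the cost of a fixed upper bound on the number of edges per lock class.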

TODO: Add implementation details of the hash set.

Lock-Free, Wait-Free Disjoint Set Structures

Each lock class node has a parent pointer used to track nodes that are connected in cycles in the directed graph. This permits reporting cycles that have been previously detected by the loop detection algorithm without fully re-traversing the graph.
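A parent-pointer disjoint set over statically allocated nodes might be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the Node, FindRoot, and Union names are hypothetical, and linking roots by address is a choice made here to rule out parent cycles. The sketch is lock-free, but because Union retries when it loses a race it is not wait-free.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

struct Node {
  std::atomic<Node*> parent{nullptr};  // nullptr means this node is a root.
};

// Follows parent pointers to the representative of the node's set.
Node* FindRoot(Node* node) {
  assert(node != nullptr);
  while (Node* parent = node->parent.load(std::memory_order_acquire)) {
    node = parent;
  }
  return node;
}

// Merges the sets containing a and b. Roots are always linked from the higher
// address to the lower address, so parent pointers can never form a cycle.
void Union(Node* a, Node* b) {
  for (;;) {
    Node* root_a = FindRoot(a);
    Node* root_b = FindRoot(b);
    if (root_a == root_b) return;  // Already in the same set.
    if (std::less<Node*>{}(root_a, root_b)) std::swap(root_a, root_b);
    // root_a now has the higher address; attach it beneath root_b.
    Node* expected = nullptr;
    if (root_a->parent.compare_exchange_strong(expected, root_b, std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
      return;
    }
    // Another thread re-parented root_a first; retry with fresh roots.
  }
}
```

Two nodes belong to the same previously detected cycle exactly when FindRoot returns the same representative for both.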

TODO: Add implementation details of the disjoint set structure.

Thread-Local Lock List

Each thread maintains a thread-local list of the locks it currently holds.
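A fixed-capacity list stored in thread-local storage is one way to keep this bookkeeping free of dynamic allocation and synchronization. The sketch below is hypothetical: the HeldLockList name, the capacity of 16, and the use of a raw pointer as the lock identity are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity list of held locks, oldest first.
class HeldLockList {
 public:
  static constexpr std::size_t kMaxHeld = 16;  // Arbitrary bound for the sketch.

  void Push(const void* lock) {
    assert(size_ < kMaxHeld);
    locks_[size_++] = lock;
  }

  // Removes lock, which need not be the most recently acquired entry.
  void Remove(const void* lock) {
    for (std::size_t i = 0; i < size_; ++i) {
      if (locks_[i] != lock) continue;
      // Shift later entries down to preserve acquisition order.
      for (std::size_t j = i + 1; j < size_; ++j) locks_[j - 1] = locks_[j];
      --size_;
      return;
    }
    assert(false && "released a lock that was not held");
  }

  std::size_t size() const { return size_; }

  const void* at(std::size_t i) const {
    assert(i < size_);
    return locks_[i];
  }

 private:
  std::array<const void*, kMaxHeld> locks_{};
  std::size_t size_ = 0;
};

// Each thread gets its own list, so updating it requires no synchronization.
inline thread_local HeldLockList g_held_locks;
```

An instrumented lock would call Push after a successful acquire and Remove before release, walking the list on each acquire to add graph edges.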

TODO: Add implementation details of the thread-local lock list.

Loop Detection Thread

Whenever a new edge is added to the directed graph, the loop detection thread is triggered to traverse the graph to find circular dependencies involving more than two locks. Tarjan's strongly connected components algorithm is an efficient choice, with worst-case complexity of O(|E| + |V|). This algorithm is stable even when traversing a graph that is updated concurrently by other threads.
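For reference, a compact single-threaded sketch of Tarjan's algorithm is shown here. It assumes lock classes are numbered from 0 and the graph is given as adjacency lists; the TarjanScc name is hypothetical, the traversal is recursive, and the sketch does not attempt to handle concurrent updates to the graph.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// graph[v] lists the vertices that v has edges to.
class TarjanScc {
 public:
  explicit TarjanScc(const std::vector<std::vector<int>>& graph)
      : graph_(graph),
        index_(graph.size(), -1),
        lowlink_(graph.size(), 0),
        on_stack_(graph.size(), false) {}

  // Returns every component; components with more than one vertex are cycles.
  std::vector<std::vector<int>> Run() {
    for (int v = 0; v < static_cast<int>(graph_.size()); ++v) {
      if (index_[v] == -1) Visit(v);
    }
    return components_;
  }

 private:
  void Visit(int v) {
    index_[v] = lowlink_[v] = next_index_++;
    stack_.push_back(v);
    on_stack_[v] = true;
    for (int w : graph_[v]) {
      assert(w >= 0 && w < static_cast<int>(graph_.size()));
      if (index_[w] == -1) {
        Visit(w);  // Tree edge: recurse, then inherit the lowest reachable index.
        lowlink_[v] = std::min(lowlink_[v], lowlink_[w]);
      } else if (on_stack_[w]) {
        lowlink_[v] = std::min(lowlink_[v], index_[w]);  // Edge back into the stack.
      }
    }
    if (lowlink_[v] != index_[v]) return;
    // v is the root of a component: pop the stack down to and including v.
    std::vector<int> component;
    int w;
    do {
      w = stack_.back();
      stack_.pop_back();
      on_stack_[w] = false;
      component.push_back(w);
    } while (w != v);
    components_.push_back(component);
  }

  const std::vector<std::vector<int>>& graph_;
  std::vector<int> index_;
  std::vector<int> lowlink_;
  std::vector<bool> on_stack_;
  std::vector<int> stack_;
  std::vector<std::vector<int>> components_;
  int next_index_ = 0;
};
```

Applied to the circular dependency example with A, B, and C numbered 0, 1, and 2, the edges (B, A), (C, B), and (A, C) yield a single component containing all three vertices.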

TODO: Add implementation details of the loop detection thread.

References

Clang static thread safety analysis.
LLVM runtime thread sanitizer.
Linux Kernel lockdep subsystem.
