Build graph convergence is that property that a single build invocation will perform all the actions necessary and in the right order so that every action's outputs are newer than its inputs.
Fuchsia uses the Ninja build system, which is timestamp-driven. Ninja expresses the build as a graph of input/output files and actions that take inputs and produce outputs.
When you run a build, e.g. with
fx build, Ninja will traverse the build graph
and perform any actions whose outputs are not present or whose inputs have
changed since they last run, all in topological order (dependencies before
However, build graph actions are not verified to fulfill the promise that outputs are newer than inputs, which can lead to convergence issues.
Common root causes
There are infinitely many ways to create Ninja convergence issues. That said, prior experience taught us that there are common root causes for these problems.
An output isn't generated
If a build action is declared to produce an output but doesn't actually produce that output (in some circumstances, or ever) then this will cause convergence issues. For instance, an action might declare that it generates a stamp file on success but fail to generate or touch this stamp file, or save it to the wrong location.
An output is stale (not newer than all inputs)
Ninja knows that an output is fresh if it's newer than all inputs. If one or more inputs have changed since the output was saved, then Ninja will repeat the step(s) necessary to generate the output.
However if the action that generates the output doesn't update the output when inputs have changed, this creates the appearance of a perpetual state of staleness.
A common mistake that causes this is when actions review their inputs, decide they have nothing to do/change with the contents of their outputs, but fail to update the modification timestamp on their outputs (i.e. "touch" or "stamp" their outputs).
It is possible for an action to modify its inputs. Typically inputs to an action should be opened with read access only, however it's not out of the question to write to them. That said, if your action needs to modify an input, it should do so before writing any outputs. Or if you must modify inputs after writing outputs, be sure to update the timestamp on your outputs before exiting the action. Otherwise you will have updated one or more of your inputs to be newer than one or more of your outputs, and thus confused Ninja into thinking that your outputs are stale.
Modifying inputs in actions can also introduce race conditions that make reproducing the problem non-deterministic. If multiple actions depend on the same input and one of them modifies the input, then the build will fail to converge if one of those actions results in an input timestamp that is newer than any of the actions' outputs. In depedency-ordered execution, the relative ordering of independent actions cannot be guaranteed.
Avoid modifying inputs.
Symbolic links and hard links
The Ninja build system follows symbolic links to determine timestamps. This can
have surprising consequences when soft symlinks participate in Ninja rules as
input dependencies or outputs. The timestamps of symlinks themselves (as opposed
to their destinations) are not considered for staleness and freshness. See
ninja#1186 for an explanation in terms of
lstat() and a
demonstration. Hard links (
-s), have the problem where multiple
references point to the same filesystem object, and hence, have the same
Even a simple link action can cause issues. Consider a simple action with the
src, the output
$target_out_dir/dst, and the invoked action being
src $target_out_dir/dst. At face value this action converges correctly. But the
action() may be overridden elsewhere in the build system, such as
to wrap actions with other actions. As a result your inner action may not
converge to no-op when
src has an older timestamp than the wrapper action's
script, which will then in turn be considered older than
dst (its output,
which carries the timestamp of the input). copy() doesn't suffer from the same
problem because it's never wrapped.
Avoid symlinks and hard-links in action inputs and outputs.
For making copies, prefer the built-in copy() target.
Modern filesystems store timestamps on files (such as the time of last modification) in nanosecond resolution. Some older runtimes, such as Python 2.7, persist file timestamps in lower resolution, for instance milliseconds. It is therefore possible for an action to read an input and write an output with a timestamp that it considers to be "now" but is actually older than the timestamp of the input, if for instance the input and output were both written at the same millisecond and the output's timestamp is truncated after the millisecond digits.
At the time of this writing we have mechanisms in place to ensure that all Python actions in the build run with Python 3.x, in part to avoid this problem.
Build convergence diagnostics
We have the following tools to diagnose build convergence issues:
- Ninja no-op check in the Commit Queue
- Filesystem access action tracing
Ninja no-op check
Fuchsia's Commit Queue (CQ) verifies that changes not only build successfully, but also keep the build system in a state that it converges to no-op in a single build invocation.
Example of a build convergence error from CQ:
fuchsia confirm no-op ninja build does not converge to a no-op
The same build is run in CQ before changes can be merged into the source tree,
to ensure that changes don't break the build. After completing a build
successfully, CQ will invoke Ninja again and expect Ninja to report
"no work to
do". This serves as a soundness check, since a correct build graph is expected
to "converge" to no-op.
If this soundness check fails then CQ will report a failure on a step named
fuchsia confirm no-op.
Reproducing Ninja convergence issues
With a source tree synced to your change, simply try the following:
This command should print:
ninja: no work to do.
If this is not the case, and actual build actions are being performed, run the same command again. If the second invocation still didn't produce "no work", then you've reproduced the issue. If you've arrived at "no work" still, try the following:
# Clean your build cache rm -rf out # Set up the build specification again fx set ... # Build fx build # Build again, expecting no-op fx build
Troubleshooting Ninja convergence issues
In the CQ results page, under the failed step
confirm no-op, you will see
- execution details
- ninja -d explain -n -v
- dirty paths
The link to
ninja -d explain -n -v shows information that you should be able
to reproduce locally with the following command:
fx ninja -C $(fx get-build-dir) -d explain -n -v
This link to "dirty paths" shows the most relevant subset of the same information. You will see a text file that will most likely begin as follows:
ninja explain: output <...> doesn't exist ...
Every line in this file is like a domino brick. You should begin troubleshooting the problem by looking at the first domino brick that started the chain reaction of extra work being done. For instance in the example above a particular output file doesn't exist, which causes Ninja to re-run the build action that's supposed to produce this output, and then subsequently rerun dependent actions.
Filesystem access tracing
There are also builders that trace actions' file system accesses. Diagnostics for stale or missing outputs look like:
Not all outputs of //your:label were written or touched, which can cause subsequent build invocations to re-execute actions due to a missing file or old timestamp. Required writes: ... Missing outputs: ... Stale outputs: ...
Diagnostics for unallowed writes to inputs look like:
Unexpected file accesses building //your/target:label, following the order they are accessed: (FileAccessType.WRITE /path/to/input-that-should-not-be-touched.txt)
Compared to the Ninja no-op check, this check is done on every individual action
and diagnoses one of the many causes of convergence issues immediately as the
action happens, rather than later through a full
fx build command. This
approach can catch some issues that would otherwise be hard to reproduce due to
Troubleshooting traced action failures
To locally enable action tracing, do one of the following:
fx set ... --args=build_should_trace_actions=true
fx args, add
build_should_trace_actions=truein the editor, save-and-exit
fx build //your/failing:target.
Examine the files in the message in the context of the action's script or command, and see if they fall in one of the categories of common issues.