|RFC-0163: Test Output Format|
Defines, but does not yet use, an initial stable version of the test output format produced by ffx test.
|Date submitted (year-month-day)||2022-02-22|
|Date reviewed (year-month-day)||2022-05-18|
Former API Design Document
This RFC was previously submitted as an API design document and converted to an RFC afterwards when the API design doc template was deprecated.
This document proposes a stable, directory based format output by testing tools provided through the fuchsia SDK.
Goals and use cases
The Test Architecture allows clients to schedule
and execute multiple tests at once, and will also collect a large set of
diagnostic artifacts for each test.
ffx test serves as the host tool through
which clients may interact with Test Architecture.
Automated tools that invoke
ffx test, such as tools that run in CI
infrastructure, need a way to reliably obtain the complete set of results and
artifacts produced during test execution. Today, these tools obtain test
results by parsing stdout, which is brittle and cannot express the full set of
artifacts collected by Test Architecture. This design is intended to provide a
stable output format optimized for parsing by machines.
A stable format on disk also enables a developer to inspect results after a test has run, or share results with other developers.
ffx test is intended for use by SDK consumers out-of-tree. To ensure these
users' tools are not broken by updates to the tool, the format must have
This output format is not intended to replace all signals of test success
or failure. For example,
ffx test will still support human readable output as
well as return codes that indicate success or failure.
A single execution of
ffx test produces a test run. A test run is
comprised of suites, and each suite is comprised of test cases.
A suite run is an execution of a test component. The test component is most
commonly identified by its URL, for example,
The same suite may be run multiple times in a test run. In this case there
are multiple separate suite runs.
A test case is an execution of a single test case contained in a suite. The same test case may be run multiple times within a suite run. In this case there are multiple test cases.
An artifact is diagnostic output collected by the Test Architecture. An artifact is scoped to either the test run, a test suite run, or a test case. For example, Test Architecture today collects stdout and stderr per test case, but collects syslog per test suite run.
Test output is stored as a directory. The root of the directory contains a single JSON file and any number of subdirectories. A subdirectory will contain artifacts scoped to a single test case, test suite, or the test run. The JSON file contains test results, details of test execution, and lists the artifacts contained in the directory.
The JSON file is always called
run_summary.json and is always located in the
top level of the directory.
run_summary.json contains the complete set of
results for the overall test run, suite runs within the test run, and test
cases within each suite run. It also contains the name of each artifact
subdirectory and a list of artifacts within it.
The names of subdirectories and artifacts within the subdirectories is
unspecified. The actual names of subdirectories and artifacts are defined in
run_summary.json. Artifacts are always located in the top level of an artifact
JSON schemata will be placed under
exported through the SDK. Schema evolution will rely on the schema versioning
mechanisms introduced in RFC-0100.
The initial version of the schema is defined in this commit.
Consumption By Tools
Tools that need to interpret test results should begin by parsing
run_summary.json, which is the only file in the output format with a defined
run_summary.json contains the complete set of results and
references to all artifacts. Tools should not assume that any other locations
in the output are stable.
The output format might be updated based on user needs. For example, we may update the format to simplify parsing in common use cases we discover.
Extensibility and Evolution
The primary extensions we anticipate are addition of new test statuses and artifact types. In general, additional enum variants and JSON fields to the schemata are not breaking changes, and tools consuming the output can safely ignore fields and enum variants they do not understand. Breaking changes include modifications to required fields, and changes to the structure of the directory. In these cases, a new version of the JSON schema will be produced and published in the SDK following the strategy in RFC-0100.
Outside of Fuchsia, there are a number of individual test frameworks which support a number of machine readable test result formats. For example, googletest supports both a JSON and XML output format.
Testing will primarily rely on unit and integration tests that verify that the
output produced by
ffx test conforms to the output format.
Generation of this file format is not expected to take a significant amount of
time. A prototype implementation to save a suite with 100,000 test cases, each
with a stdout and stderr artifacts, takes under a second to generate and
run_summary.json, which is around 26 MB on disk. Given 1,000,000
cases, the same prototype takes around 15 seconds to generate
run_summary.json, which is around 256 MB on disk. In all cases the
prototype's maximum memory usage is roughly double the size of the summary on
For comparison, a large known test suite in chromium contains around 100,000
cases. Since saving
run_summary.json for this number of cases takes under a
second, this should not pose an issue.
In fuchsia.git, the largest test suites today contain around 300 - 400 test
cases, and a single infra shard may contain around 300 suites. For
fuchsia.git, we intend to use ffx test's multitest feature, which will save
results for all test cases in a single output directory. So for this case, we
should expect up to around 120,000 cases in
run_summary.json, which we
estimate to take less than 50 MB on disk and around a second to save.
There are a number of common local development flows where a developer runs a test repeatedly. For example, a developer may run a test repeatedly to reproduce a flaky failure. In such cases, we can easily run batches of several thousand test cases without exceedimg 10 MB of memory usage.
This output is used only to store results and artifacts produced by tests and are already made available by the test framework, and does not introduce additional security considerations.
This output is used only to store results and artifacts produced by tests and are already made available by the test framework, and does not introduce additional privacy considerations.
Drawbacks and alternatives
The output format supports multiple test suite runs by default. This complicates parsing for clients that will only ever run one test at a time. One alternative is to support an output format that contains results for a single test suite run, and a second output format which contains results for multiple test suite runs. While this simplifies immediate parsing, having multiple formats complicates sharing tools that analyze the output.
Another alternative is to use multiple JSON files to store test results. One
disadvantage of using a single file to store all test results is that tools
that interact with it must hold the entire contents in memory at once. From the
prototype, we would need around 4 million test cases before
exceeds 1GB, at which point serializing would take around a minute. As our
current use cases are an order of magnitude less than this, multiple files are
not yet necessary. Since storing results in multiple files complicates parsing
we will use a single file.
A third alternative is to store all artifacts in
run_summary.json to further
simplify parsing. This has the disadvantage that the artifacts also now need to
be held in memory. The largest artifacts collected today are profile data used
for coverage. In a single infra shard, this profile data is around 500 MB split
between multiple files. While this profile does not exceed 1GB, a combination
of changes to how tests are sharded and how profile data is represented could
quickly bring the size closer to 1GB.