Desirable properties of tests
The properties below are generally good to have, provided that they don't conflict with other goals of the test.
- Isolated: tests should be isolated from code, systems, and details that are outside the scope of the tests. Different test cases should be isolated from each other. On Fuchsia we use the isolation guarantees given by the Component Framework to isolate tests. A useful outcome of isolated tests is that tests can be run in parallel or in a different order and their result is the same.
- Hermetic: the result of a test is defined by the contents of the test. On Fuchsia, a unit test or an integration test is hermetic if its result is defined by the contents of the test’s package. If a test’s package hasn’t changed then it can be assumed that its behavior hasn’t changed, and it’s still passing or still failing. This property can be used to select which tests to run in order to validate a given change. In the absence of hermeticity guarantees, the next best alternative is to run all the tests, which is costly.
- Reproducible: re-running the same test should produce the same result. Isolation and hermeticity improve reproducibility. The larger the scope of the test, the more difficult it is for the test to be reproducible.
- Proximity to the code under test: tests should focus on a particular unit under test, and control for what’s not under test.
- Resilient: tests shouldn’t need to change when the code under test changes, unless the change is important to the code’s purpose. Tests that continue to work after benign changes to the code are more resilient. Tests that exercise the code’s public APIs or other forms of contracts tend to be more resilient. This also happens when you focus on testing behavior, not implementation.
- Easy to troubleshoot: when tests fail, they should produce clear actionable errors that identify the defect. Tests that isolate errors, or otherwise have a smaller scope, are usually easier to troubleshoot when they fail.
- Fast: tests are often part of the developer feedback loop. A faster feedback loop makes developers more productive.
- Reliable: test failure should indicate a real defect with the code under test. Reliable tests give more confidence when they pass and produce fewer false failures that are costly to maintain.
- Flexible: the fewer constraints there are on running tests, the easier it is to run them. On Fuchsia we particularly appreciate if tests can run on emulators, when possible.
Undesirable properties of tests
The properties below are generally bad to have. Depending on circumstances they may be the downsides of a tradeoff that was made for the purpose of the test, which as a whole is a net positive.
- Flaky: tests that produce false failures, then pass when they’re retried, are flaky. Flaky tests are costlier to maintain, slower to run due to retries, and provide lower confidence results.
- Slow: tests that take longer to run create less efficient feedback loops and make developers less productive. The bigger the scope of the test, the slower it is usually to run.
- Difficult to troubleshoot: tests that fail with errors that are not immediately actionable or otherwise don’t indicate the root cause of the failure are more difficult to troubleshoot. Developers have to look beyond the test failure itself, such as at system logs or internal system state, to troubleshoot the test failure.
- Change detectors: tests that are coupled too closely with implementation details that aren’t important for functionality will often fail when the code under test changes in ways that are benign to the external observer. Change detector tests are more costly to maintain.
Test against interfaces and contracts
Recommended: Test using public APIs and other interfaces and contracts offered to the client of the code under test. These tests are more resilient to benign changes.
Not recommended: Don’t test implementation details that are not important to the client. Such tests often break when the code under test changes in benign ways.
- Change-Detector Tests Considered Harmful
- Prefer Testing Public APIs Over Implementation-Detail Classes
- Test Behavior, Not Implementation
- Testing State vs. Testing Interactions
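As an illustration of the recommendation in this section, the sketch below contrasts a test that exercises a public contract with a change-detector test. `ShoppingCart` and its methods are hypothetical names invented for this example, not part of any Fuchsia API.

```python
class ShoppingCart:
    """Hypothetical class under test, used only for illustration."""

    def __init__(self):
        # Implementation detail: items are stored in a dict keyed by name.
        self._items = {}

    def add(self, name, price, quantity=1):
        _, old_quantity = self._items.get(name, (price, 0))
        self._items[name] = (price, old_quantity + quantity)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


def test_total_reflects_added_items():
    # Resilient: only the public contract (add, total) is exercised.
    cart = ShoppingCart()
    cart.add("apple", 2, quantity=3)
    cart.add("pear", 5)
    assert cart.total() == 11


def test_items_are_stored_in_dict():
    # Change detector: asserts on the private storage layout, so it would
    # break if _items became a list even though total() stayed correct.
    cart = ShoppingCart()
    cart.add("apple", 2, quantity=3)
    assert cart._items == {"apple": (2, 3)}
```

Both tests pass today, but only the first one would keep passing after a benign refactoring of how the cart stores its items.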
Write readable test code
Consider readability as you write tests, the same as you do when you write the code under test.
- A test is complete if the body of the test contains all the information you need to know in order to understand it.
- A test is concise if the test doesn’t contain any other distracting information.
Recommended: Write test cases that are complete and concise. Prefer writing more individual test cases, each with a narrow focus on specific circumstances and concerns.
Not recommended: Don’t combine multiple scenarios into a single test case just to reduce the number of test cases or the amount of test code.
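A minimal sketch of what narrowly focused, complete, and concise test cases might look like. The `parse_port` function is a hypothetical example invented here, and the tests use plain assertions rather than any particular test framework.

```python
def parse_port(text):
    """Hypothetical function under test: parses a TCP port number."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Each test states its input and its expectation in its own body (complete)
# and covers exactly one scenario (concise).
def test_parse_port_accepts_valid_port():
    assert parse_port("8080") == 8080


def test_parse_port_rejects_zero():
    try:
        parse_port("0")
    except ValueError:
        return
    raise AssertionError("expected ValueError for port 0")


def test_parse_port_rejects_non_numeric_text():
    try:
        parse_port("http")
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-numeric text")
```

When one of these fails, its name alone indicates which scenario regressed, which would not be the case if all three checks shared one test body.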
Write reproducible, deterministic tests
Tests should be deterministic, meaning every run of the test against the same revision of code produces the same result. If not, the test may become costly to maintain.
Threaded or time-dependent code, random number generators (RNGs), and cross-component communication are common sources of nondeterminism.
Recommended: Use these tips to write deterministic tests:
- For time-dependent tests, use fake or mocked clocks to provide determinism.
- Threaded code must always use the proper synchronization primitives to avoid flakes. Whenever possible, prefer single-threaded tests.
- Always provide a mechanism to inject seeds for RNGs and use them in tests.
- Use mocks in component integration tests. See Realm Builder.
- When working with tests that are sensitive to flaky behavior, consider running tests multiple times to ensure that they consistently pass. You can use repeat flags, such as --gtest_repeat in GoogleTest and --test.count in Go, to do this. If your test is prone to flakes, aim for at least 100-1000 local runs before merging.
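The tips about fake clocks and injectable RNG seeds can be sketched as follows. `FakeClock`, `Session`, and `pick_backoff_ms` are hypothetical names made up for this example. The point is only that the code under test receives its clock and its random number generator from the caller, so the test controls both.

```python
import random


class FakeClock:
    """Hypothetical test clock: time only moves when the test advances it."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class Session:
    """Hypothetical code under test: expires after a timeout."""

    def __init__(self, clock, timeout_seconds):
        self._clock = clock
        self._deadline = clock.now() + timeout_seconds

    def expired(self):
        return self._clock.now() >= self._deadline


def pick_backoff_ms(rng):
    """Hypothetical code under test: the RNG is injected by the caller."""
    return rng.randint(100, 200)


def test_session_expires_after_timeout():
    clock = FakeClock()
    session = Session(clock, timeout_seconds=30)
    clock.advance(29)
    assert not session.expired()
    clock.advance(1)
    assert session.expired()


def test_backoff_is_reproducible_for_a_given_seed():
    # The same seed yields the same sequence, so the result is repeatable.
    first = pick_backoff_ms(random.Random(1234))
    second = pick_backoff_ms(random.Random(1234))
    assert first == second
    assert 100 <= first <= 200
```

Neither test depends on wall-clock time or on an unseeded generator, so repeated runs against the same code should give the same result.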
Not recommended: Never use sleep in tests as a means of weak synchronization. You may use short sleeps when polling in a loop, between loop iterations.
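To make the distinction concrete, here is a rough sketch of the two acceptable patterns: blocking on an explicit synchronization primitive, and polling for an observable condition with a short sleep between iterations. The `wait_until` helper and its default timeout values are assumptions made for this example.

```python
import threading
import time


def wait_until(condition, timeout_seconds=5.0, poll_interval_seconds=0.01):
    """Hypothetical helper: polls condition() until true or until timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if condition():
            return True
        # Short sleep between iterations, only to avoid busy-waiting.
        time.sleep(poll_interval_seconds)
    return condition()


def test_worker_signals_completion():
    done = threading.Event()
    results = []

    def worker():
        results.append(42)
        done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    # Preferred: block on an explicit synchronization primitive.
    assert done.wait(timeout=5.0)
    thread.join()
    assert results == [42]


def test_worker_result_observed_by_polling():
    results = []
    thread = threading.Thread(target=lambda: results.append(42))
    thread.start()
    # Acceptable when no signal is available: poll for the observable state.
    assert wait_until(lambda: len(results) == 1)
    thread.join()
```

In both tests the timeout is only an upper bound for failure. Correctness never depends on a fixed sleep being long enough for the worker to finish.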
Test doubles: stubs, mocks, fakes
Test doubles stand in for a real dependency of the code under test during a test.
- A stub is a test double that returns a given value and contains no logic.
- A mock is a test double that has expectations about how it should be called. Mocks are useful for testing interactions.
- A fake is a lightweight implementation of the real object.
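The three kinds of test doubles defined above might look like this in practice. All class and function names are hypothetical and exist only for illustration. The mock uses Python's standard `unittest.mock` module.

```python
from unittest import mock


class StubRateProvider:
    """Stub: returns a given value and contains no logic."""

    def rate(self, currency):
        return 2.0


class FakeKeyValueStore:
    """Fake: a lightweight in-memory stand-in for a hypothetical real store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


# Hypothetical code under test.
def convert(amount, currency, rates):
    return amount * rates.rate(currency)


def save_greeting(store, name):
    store.put("greeting", f"hello {name}")


def notify(sender, user):
    sender.send(user, "welcome")


def test_convert_with_stub():
    assert convert(10, "EUR", StubRateProvider()) == 20.0


def test_save_greeting_with_fake():
    store = FakeKeyValueStore()
    save_greeting(store, "fuchsia")
    # The fake behaves like the real store, so the test checks state.
    assert store.get("greeting") == "hello fuchsia"


def test_notify_with_mock():
    sender = mock.Mock()
    notify(sender, "alice")
    # Mock: the assertion is about how the dependency was called.
    sender.send.assert_called_once_with("alice", "welcome")
```

The stub and fake tests check results or state, while the mock test checks an interaction, which ties it more closely to how `notify` is implemented.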
Recommended: Create fakes for code that you own so that your clients can use that as a test double in their own tests. For integration testing, consider making it possible to run an instance of your real component in a test realm in isolation from the rest of the system, and document this behavior.
Not recommended: Don’t overuse mocks in your tests, as you might create lower-quality tests that are less readable and more costly to maintain while providing less confidence when they pass. Avoid mocking dependencies that you don’t own.
Use end-to-end tests appropriately
Recommended: Use end-to-end tests to test critical user journeys. Such tests should exercise the journey as a user, for instance by automating user interactions and examining user interface state changes.
Not recommended: Don’t use end-to-end tests to cover for missing tests at other layers or smaller scopes, since when those end-to-end tests catch errors they will be very difficult to troubleshoot.
Recommended: Use end-to-end tests sparingly, as part of a balanced testing strategy that leans more heavily on smaller-scoped tests that run quickly and produce precise and actionable results.
Not recommended: Don’t rely on end-to-end tests in your development feedback cycle, because they typically take a long time to run and often produce more flaky results than smaller-scoped tests.
- Testing UI Logic? Follow the User!
- Just Say No to More End-to-End Tests
- Test Flakiness - One of the main challenges of automated testing
- Test Flakiness - One of the main challenges of automated testing (Part II)
- Avoiding Flakey Tests
- Where do our flaky tests come from?