This document explains how to write a stress test using the Rust stress test
library. The library can be found at //src/sys/lib/stress-test. It implements
the test loop and the concurrency and synchronization primitives required for
running these tests.
Writing a stress test
Define the GN build targets
Define rustc_binary, fuchsia_component and fuchsia_test_package GN build targets
for your test:
rustc_binary("filesystem-stressor-bin") {
  deps = [
    ...
    "//src/sys/lib/stress-test",
    ...
  ]
  sources = [
    ...
  ]
}

fuchsia_component("filesystem-stressor") {
  deps = [ ":filesystem-stressor-bin" ]
  manifest = "meta/filesystem-stressor.cml"
  testonly = true
}

fuchsia_test_package("filesystem-stress-tests") {
  test_components = [
    ":filesystem-stressor"
  ]
}
Write an actor
Every actor must implement the Actor trait. The Actor trait has a single method, perform(), that is
invoked by an ActorRunner. When invoked, the actor must perform exactly one operation and return
its result to the runner. An actor must store all the connections necessary to perform operations.
pub trait Actor: Sync + Send + 'static {
    // ActorRunner invokes this function, instructing the actor
    // to perform exactly one operation and return the result.
    async fn perform(&mut self) -> Result<(), ActorError>;
}
An actor can indicate the following with the return result:
Ok(()): The operation succeeded and is added to the global operation count.
Err(ActorError::DoNotCount): The operation must not be counted towards the global operation count.
Err(ActorError::ResetEnvironment): The environment must be reset and the operation must not be counted towards the global operation count.
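The three outcomes above can be sketched as a small bookkeeping routine. This is only an illustration of the counting rules, not the library's runner: the ActorError enum below is a local stand-in that models just the two variants named in this document, and Tally and record are hypothetical names.

```rust
// Local stand-in for the library's error type. Only the two variants
// named in this document are modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

// Hypothetical summary of what has been observed so far.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    // Operations added to the global operation count.
    pub counted: u64,
    // Number of times an actor asked for the environment to be reset.
    pub resets_requested: u64,
}

// Hypothetical helper: fold the result of one perform() call into the tally.
pub fn record(tally: &mut Tally, result: Result<(), ActorError>) {
    match result {
        // Success: the operation is counted.
        Ok(()) => tally.counted += 1,
        // The operation ran but must not be counted.
        Err(ActorError::DoNotCount) => {}
        // Not counted either, and a reset is requested.
        Err(ActorError::ResetEnvironment) => tally.resets_requested += 1,
    }
}
```

Feeding this helper two successes, one DoNotCount and one ResetEnvironment leaves a count of two operations and one reset request.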
When an actor encounters an unexpected error, it should panic, thus stopping the test.
Since actors operate on the same environment, it is possible that their operations will collide. For example, in a filesystem stress test, actors may operate on the same set of files. If such collisions are desirable, you must set up the actors to handle them gracefully. If not, the actor should panic, causing the test to stop.
An actor can intentionally break the system-under-test, requiring the environment to be reset. For
example, for a filesystem stress test, an actor can randomly sever the connection between the
filesystem and the underlying block device. In this example, other actors should request a new
environment with ActorError::ResetEnvironment, and the environment will re-establish connections
for all of the actors.
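One way to picture the three cases discussed above (expected collision, broken system-under-test, unexpected error) is a single classification function. The sketch below makes several assumptions: std::io::Error stands in for whatever error type the real connections return, the mapping of error kinds to outcomes is an illustrative policy rather than the library's, and ActorError is again a local stand-in. Reporting a collision as DoNotCount is one possible choice; an actor could equally treat it as a success.

```rust
use std::io::{Error, ErrorKind};

// Local stand-in for the library's error type.
#[derive(Debug, PartialEq, Eq)]
pub enum ActorError {
    DoNotCount,
    ResetEnvironment,
}

// Hypothetical helper: turn the outcome of one operation into the
// result that perform() would return.
pub fn classify(result: Result<(), Error>) -> Result<(), ActorError> {
    match result {
        Ok(()) => Ok(()),
        Err(e) => match e.kind() {
            // Another actor touched the same file first. This is an
            // expected collision, so handle it without stopping the test.
            ErrorKind::NotFound | ErrorKind::AlreadyExists => Err(ActorError::DoNotCount),
            // The connection was severed, for example by an actor that
            // deliberately breaks the system-under-test.
            ErrorKind::BrokenPipe | ErrorKind::ConnectionReset => {
                Err(ActorError::ResetEnvironment)
            }
            // Anything else is unexpected: panic and stop the test.
            _ => panic!("unexpected error: {}", e),
        },
    }
}
```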
pub struct FilesystemActor {
    /// Store a connection to the root of the filesystem here
    pub root_directory: Directory,
    ...
}

impl FilesystemActor {
    pub fn new(root_directory: Directory) -> Self {
        ...
    }
}

#[async_trait]
impl Actor for FilesystemActor {
    async fn perform(&mut self) -> Result<(), ActorError> {
        // Choose exactly one operation to do on the filesystem
        // using the root_directory
        self.root_directory.delete_all_files();
        // Report success so that the operation is counted
        Ok(())
    }
}
Write an Environment
The environment provides the basic configuration for the stress test - the exit criteria, the actors and a reset method.
pub trait Environment: Send + Sync + Debug {
    /// Returns the target number of operations to complete before exiting
    fn target_operations(&self) -> Option<u64>;

    /// Returns the number of seconds to wait before exiting
    fn timeout_seconds(&self) -> Option<u64>;

    /// Returns the runners for all the actors
    async fn actor_runners(&mut self) -> Vec<ActorRunner>;

    /// Resets the environment when an actor requests it
    async fn reset(&mut self);
}
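The two exit criteria can be read as independent, optional limits. The sketch below shows one plausible way to combine them, assuming the test stops as soon as either limit is reached and never stops on a limit that is None. The function should_exit and its parameters are hypothetical and are not part of the library.

```rust
// Hypothetical helper: decide whether the test has met an exit criterion.
// `completed` is the global operation count so far and `elapsed_seconds`
// is the time since the test started.
pub fn should_exit(
    target_operations: Option<u64>,
    completed: u64,
    timeout_seconds: Option<u64>,
    elapsed_seconds: u64,
) -> bool {
    // A limit of None never triggers.
    let hit_target = target_operations.map_or(false, |target| completed >= target);
    let timed_out = timeout_seconds.map_or(false, |limit| elapsed_seconds >= limit);
    hit_target || timed_out
}
```

With both limits set to None this function never returns true, which matches a test that runs until it is stopped externally or panics.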
An environment can store additional configuration for the test. You can provide this configuration
through the command line with the argh crate.
An actor is shared between a runner and the environment and hence it must be wrapped as an
Arc<Mutex<dyn Actor>>. Runners hold the lock while an actor is performing an operation.
This means that the environment can only acquire an actor's lock between operations.
An environment is instructed to reset when an actor determines that the current instance of the system-under-test has been broken. The environment is expected to create a new instance of the system-under-test and acquire the lock on each actor to update its connections to the new instance.
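The locking discipline described above can be illustrated with standard-library primitives. This sketch substitutes std::sync::Mutex and an OS thread for the asynchronous mutex and runner tasks that the stress test library uses, so it only demonstrates the ownership pattern. SketchActor, run_one_operation and reset are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical actor: `instance_id` stands in for the connections to the
// current instance of the system-under-test.
pub struct SketchActor {
    pub instance_id: u32,
    pub operations: u64,
}

impl SketchActor {
    // Perform exactly one operation.
    pub fn perform(&mut self) {
        self.operations += 1;
    }
}

// Runner side: the lock is held for the whole operation and released
// when `guard` goes out of scope.
pub fn run_one_operation(actor: &Arc<Mutex<SketchActor>>) {
    let mut guard = actor.lock().unwrap();
    guard.perform();
}

// Environment side: lock() blocks while an operation is in progress, so
// the update can only happen between operations.
pub fn reset(actor: &Arc<Mutex<SketchActor>>, new_instance_id: u32) {
    let mut guard = actor.lock().unwrap();
    guard.instance_id = new_instance_id;
}
```

A runner thread can call run_one_operation in a loop on a clone of the Arc while the environment calls reset on its own handle; the mutex guarantees that the actor is never updated in the middle of an operation.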
The environment must also implement the Debug trait. Stress tests log the environment
when the test starts and if the test panics. It is common practice to print out parameters that are
valuable for reproducing the test, such as the random seed used.
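When some fields are not useful for reproducing a run, a hand-written Debug implementation can print only the relevant ones. The sketch below assumes a hypothetical SketchEnvironment whose `connection` field stands in for a handle that should be left out of the log; deriving Debug, as in the example that follows, is equally valid when every field is worth printing.

```rust
use std::fmt;

// Hypothetical environment used only for this illustration.
pub struct SketchEnvironment {
    pub seed: u64,
    pub target_operations: Option<u64>,
    // Stands in for a connection that does not help reproduce the test.
    pub connection: Vec<u8>,
}

impl fmt::Debug for SketchEnvironment {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the parameters needed to reproduce the run.
        f.debug_struct("SketchEnvironment")
            .field("seed", &self.seed)
            .field("target_operations", &self.target_operations)
            .finish()
    }
}
```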
#[derive(Debug)]
pub struct FilesystemEnvironment {
    fs_actor: Arc<Mutex<FilesystemActor>>,
    seed: u64,
    ...
}

impl FilesystemEnvironment {
    pub fn new() -> Self {
        ...
    }
}

#[async_trait]
impl Environment for FilesystemEnvironment {
    fn target_operations(&self) -> Option<u64> {
        // By specifying None here, the test will run without an operation limit
        None
    }

    fn timeout_seconds(&self) -> Option<u64> {
        // By specifying None here, the test will run without a time limit
        None
    }

    async fn actor_runners(&mut self) -> Vec<ActorRunner> {
        vec![
            ActorRunner::new(
                "filesystem_actor", // debug name
                60, // delay (in seconds) between each operation (0 means no delay)
                self.fs_actor.clone(), // actor
            )
        ]
    }

    async fn reset(&mut self) {
        // If the actor is performing an operation, this will remain
        // locked until the operation is complete.
        let mut actor = self.fs_actor.lock().await;
        // Now the environment can update the actor before it is run again.
        actor.root_directory = ...;
        // Releasing the lock will resume the runner.
    }
}
Write the main function
The main function of a stress test is straightforward, since most of the logic is implemented in the Environment and Actors. Use the main function to collect command-line arguments (if any), initialize logging and set log severity.
#[fuchsia::main]
async fn main() {
    // Create the environment
    let env = FilesystemEnvironment::new();
    // Run the test.
    // Depending on the exit criteria, this may never return.
    stress_test::run_test(env).await;
}
Running stress tests locally
Since a stress test is part of a fuchsia_test_package, one of the easiest ways to run it
is with the fx test command:
fx test filesystem-stress-tests
To run the test with custom command line arguments, use fx shell run:
fx shell run fuchsia-pkg://fuchsia.com/filesystem-stress-tests#meta/filesystem-stressor.cm <args>
Running stress tests on infrastructure
A stress test is identified by infrastructure through the stress-tests tag that is attached to
the fuchsia_test_package or fuchsia_unittest_package GN build target.
fuchsia_test_package("filesystem-stress-tests") {
  test_components = [
    ":filesystem-stressor"
  ]
  test_specs = {
    environments = [
      {
        dimensions = {
          device_type = "QEMU"
        }
        tags = [ "stress-tests" ]
      },
    ]
  }
}
A dedicated core.x64-stress builder identifies these tests and runs each test component in the
package for a maximum of 22 hours.
Debugging a stress test
The framework uses the Rust log crate to log messages. The test logs the environment object when
it starts and again if it panics.
--------------------- stressor is starting -----------------------
Environment {
seed: 268479717856254664270968796173957499835,
filesystem_actor: { ... }
...
}
------------------------------------------------------------------
If debug logging is enabled, individual actor operations and operation counts are also logged.
DEBUG: [0][filesystem_actor][389] Sleeping for 2 seconds
DEBUG: [0][filesystem_actor][389] Performing...
DEBUG: [0][filesystem_actor][389] Done!
DEBUG: Counters -> [total:403] {"filesystem_actor": 403}