Google is committed to advancing racial equity for Black communities. See how.

Triage codelab

Contributors: cphoenix@

This codelab explains the Triage utility:

  • What it's for.
  • How to run it, including command line options.
  • How to add and test configuration rules to detect problems in Fuchsia bugreports.

The source files and examples to which this document refers are available at:

What is Triage?

Triage allows you to scan bug dump files (bugreport.zip, fuchsia_feedback_data.zip) for predefined conditions.

The Triage system makes it easy to configure new conditions, increasing the usefulness of Triage for everyone.

Prerequisites

Before you start on this codelab, make sure you have completed the following:

Running Triage from the command line

  • To run Triage:
fx triage

This command downloads a fresh bugreport.zip file using the fx bugreport command. This command runs the default rules which are located in //src/diagnostics/config/triage/*.triage.

To analyze a specific bugreport.zip or fuchsia_feedback_data.zip file, use --data.

  • You can specify at most one --data argument.
  • The argument to --data can also be a path to a directory containing an unzipped bugreport.
  • If you run fx triage without specifying a --data option, it runs a fresh fx bugreport and analyzes its inspect.json file.
$ fx triage --data my/foo/bugreport.zip

To use a specific configuration file or all *.triage files in a specific directory, use --config.

  • You can use multiple --config arguments.
  • If a --config argument is used, the default rules will not be automatically loaded.
fx triage --config my/directory --config my/file.triage

Adding Triage rules

The rest of this codelab explains how to configure new behavior in Triage.

Overview

Triage condenses the mass of Inspect data into useful information, through the following steps:

  1. Select values from the inspect.json file using selector strings in the select section of the config file.
  2. Perform computations and comparisons to generate new values, as specified in the eval section of the config file.
  3. Take actions depending on Boolean values, according to entries in the act section of the config file.
  4. Support unit-testing the actions via entries in the test section.

Find the codelab's sample files

Navigate to the src/diagnostics/examples/triage directory in your source tree. The following command is intended to run from that directory:

fx triage --config . --data bugreport

Running this command in the sample directory with the unmodified codelab files prints a line indicating that Triage is working properly:

Warning: 'always_triggered' in 'rules' detected 'Triage is running': 'always_true' was true

inspect.json

This codelab includes an inspect.json file with Inspect data to make the exercises work predictably. This file is in the sample directory under bugreport/inspect.json.

rules.triage

The Triage program uses configuration loaded from one or more .triage files. This codelab uses the rules.triage file located in the sample directory.

Add selectors for the Inspect values

The inspect.json file in the sample directory indicates a couple of problems with the system. You're going to configure the triage system to detect those problems.

This step configures Triage to extract values from the data in the inspect.json file.

The rules.triage file contains a key-value section called select. Each key (name) can be used in the body of other config entries. Each value is a selector string. In effect, each entry in the select section (and the eval section, described below) defines a variable.

The selector string is a colon-separated string that tells where in the Inspect data to find the value you need.

select: {
    // "global_dat" is an intentional typo to fix later in the codelab.
    disk_used: "INSPECT:global_dat/storage:root/stats:used_bytes",
}

Inspect data published by a component is organized as a tree of nodes with values (properties) at the leaves. The inspect.json file is an array of these trees, each with a moniker that identifies the source component.

The portion of the selector string between the INSPECT: and the second colon should match one of the moniker strings in the inspect.json file.

The portion between the second and third colons is a /-separated list of node names.

The portion after the last colon is the property name.

The above selector string indicates a component whose moniker contains the string global_dat/storage. It also indicates the used_bytes property from the stats subnode of the root node of that component's Inspect Tree.

  1. Copy the above "disk_used" selector, and add it to the "select" section of the rules.triage file.

  2. Write and add another selector named "disk_total" to select the "total_bytes" property at the same node in the Inspect data.

.triage files use JSON5 which is easier to write and read than JSON:

  • It's good style to put a comma after the last list item.
  • Most keys (including all valid Triage names) don't need to be wrapped in quotation marks.
  • /* Multiline */ and // single comments can be used.

Add a computation

In addition to selecting values from the inspect.json file, you need to do some logic, and probably some arithmetic, to see whether those values indicate a condition worth flagging.

Copy and add the following line to the eval section of the rules.triage file to calculate how full the disk is:

eval: {
    ....
    disk_percentage: "disk_used / disk_total",
}

eval entries use ordinary infix math expressions. See the Details section for more information.

Add an action

In the "act" part of the config file, add an action which prints a warning when the disk is 98% full. Use the following lines:

act: {
    ...
    "disk_full": {
        "trigger": "disk_percentage > 0.98",
        "print": "Disk is over 98% full",
    },
}

Note the following:

  • The "trigger" is an expression that evaluates to a Boolean value. This may be the name of a Boolean-type selector or computation, or any suitable math expression.
  • See the Details section for more information about comparisons.
  • Currently, print is the only available action.

Try it out

The following command will run Triage against the local config file.

fx triage --config . --data bugreport

You will get an error that looks like the following:

[ERROR] In config 'rules': No value found matching selector global_dat/storage:root/stats:used_bytes

There was a typo in the selector rules. Triage could not find values needed to evaluate a rule. In fact, the correct selector is "global_data" not "global_dat." Fix it in your selector rules and try again.

fx triage --config . --data bugreport

Now what happened? Nothing, right? So, how do you know whether there was no problem in the inspect.json file, or a bug in your rule?

Test your rule

You can (and should!) add tests for your actions. For each test, write a snippet of Inspect-format content and specify whether it should or should not trigger your rule.

To test the rule you've added, add the following to the test section of the rules.triage file:

test: {
    ....
    is_full: {
        yes: ["disk_full"],
        no: [],
        inspect: [
            {
                moniker: "global_data/storage",
                payload: {root: {stats: {
                    total_bytes: 100, used_bytes: 98}}}
            }
        ]
    }
}

You can also test conditions in which actions should not trigger:

test: {
    ....
    not_full: {
        yes: [],
        no: ["disk_full"],
        inspect: [
            {
                moniker: "global_data/storage",
                payload: {root: {stats: {
                    total_bytes: 100, used_bytes: 97}}}
            }
        ]
    }
}

To run the test, just run Triage. It automatically self-tests each time it's run.

fx triage --config . --data bugreport

Whoops! That should signal an error:

Test is_full failed: trigger disk98 of action disk_full returned Bool(false), expected true

Fix your rule

You want to trigger when the disk is 98% or more full, but that's not quite what you wrote, and your test caught the problem. Modify the > in your action to be a >=:

        "trigger": "disk_percentage >= 0.98",

Run Triage again. The error should disappear, replaced by a warning that your inspect.json file does in fact indicate a full disk.

Warning: 'disk_full' in 'rules' detected 'Disk is 98% full': 'disk98' was true

Use multiple configuration files

You can add any number of Triage configuration files, and even use variables defined in one file in another file. This has lots of applications:

  • One file for disk-related variables and actions, and another for network-related variables and actions.
  • A file to define product-specific numbers.
  • Separate files for particular engineers or teams.

Add a file "product.triage" containing the following:

{
    eval: {
        max_widgets: "4",
    },
}

Note the following:

  • Empty sections may be omitted from .triage files. This file contains no select, act, or test entries.
  • Although numeric values in JSON are not quoted, this is an expression string so it does need to be quoted.

Add the following entries to the rules.triage file:

select: {
    ...
    actual_widgets: "widget_maker.cmx:root:total_widgets",
}

That will extract how many widgets were active in the device.

eval: {
    ...
    too_many_widgets: "actual_widgets > product::max_widgets",

That compares the actual widgets with the theoretical maximum for the product.

Finally, add an action:

act: {
    ...
    widget_overflow: {
        trigger: "too_many_widgets",
        print: "Too many widgets!",
    },
}

Unfortunately, this device tried to use 6 widgets, so this warning should trigger when "fx triage" is run.

In a production environment, several "product.triage" files could be maintained in different directories, and Triage could be directed to use any of them with the "--config" command line argument. (Future versions of Triage may be able to select the correct product file automatically.)

Details

Names

Names (of selectors, expressions, actions, and tests, as well as the basenames of config files) can be any letter or underscore, followed by any number of letters, numbers, or underscores.

Names beginning with underscores may have special meaning in future versions of Triage. They're not forbidden, but it's best to avoid them.

The name of each .triage file establishes its namespace. Loading two .triage files with the same name from different directories is not allowed.

Math expressions

  • Variables can be 64-bit float, signed 64-bit int, or Boolean.
  • Arithmetic expressions use + - * / // operators, with ordinary order and precedence of operations.
  • The division operator / produces a float value.
  • The division operator // produces an int value, truncating the result toward 0, even with float arguments. (Note this is different from Python 3 where // truncates downward.)
  • + - * preserve the type of their operands (mixed promotes to float).
  • Comparison operators are > >= < <= == !=
  • Comparisons have Boolean result type and can be used to trigger actions.
  • You can combine computations and comparisons in a single eval rule.
  • You can use parentheses.
  • You can use the key names of eval and select entries as variables.
  • Spaces are optional everywhere, and allowed everywhere except inside filename::variable namespaced variables.

Predefined functions

Triage provides predefined functions for use in eval expressions:

  • Max(value1, value2, value3...) returns the largest value, with type promotion to float.
  • Min(value1, value2, value3...) returns the smallest value, with type promotion to float.
  • And(value1, value2, value3...) takes Boolean arguments and returns the logical AND of the values.
  • Or(value1, value2, value3...) takes Boolean arguments and returns the logical OR of the values.
  • Not(value) takes one Boolean argument and returns the logical NOT of it.

Further Reading

See fx triage for the latest features and options - Triage will keep improving!