RFC-0158: Structured Config Accessors

Status: Accepted
Areas
  • Component Framework
Description

High-level philosophy and requirements for user-facing config accessors.

Issues
Gerrit change
Authors
Reviewers
Date submitted (year-month-day): 2021-12-20
Date reviewed (year-month-day): 2022-05-04

Summary

Requirements, design philosophy, and high-level implementation details for user-facing generated accessor libraries implementing RFC-0127 for Structured Configuration.

Motivation

This RFC builds on the non-normative examples of user-facing APIs in the RFC for Structured Configuration, outlining design principles and constraints for the generated libraries developers will use to access their configuration.

Stakeholders

Facilitator: hjfreyer@google.com

Reviewers:

  • geb@google.com (Component Framework)
  • jsankey@google.com (RFC-0127 author)
  • yifeit@google.com (FIDL)

Consulted: xbhatnag@google.com, hjfreyer@google.com, jamesr@google.com

Socialization: This RFC is the product of discussions between members of the Component Framework and FIDL teams, answering questions that arose while prototyping the initial phases of RFC-0127.

Design

RFC-0127 sketches a user-facing interface for consuming configuration values in code. It provides a statically-typed generated library specific to a component's configuration, and it has a single entry point which is guaranteed to return a valid configuration.

Objectives

We have several objectives for accessors: they should convey infallibility, be opaque with respect to overrides, be debuggable, and offer type interfaces which are familiar to Fuchsia developers, reusing existing tooling where possible.

Infallible

A component author must be able to assume the values are always available, with all fields always populated. Configuration fields' nullability should match the input declaration (at the time of writing, nullable fields are not supported in config schemas) and the accessor function should be infallible, i.e. it should not return an error or throw an exception. Accessors should fail fast and terminate component execution (e.g. abort()) if they receive a configuration payload which they fail to parse.
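The fail-fast shape described here can be sketched in Rust using only the standard library. The one-byte payload format and the names `parse_payload` and `take_from_bytes` are hypothetical, chosen for illustration; they are not the real encoding or the generated API.

```rust
use std::process;

/// Hypothetical parsed configuration with a single field.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub test_only: bool,
}

/// Fallible inner step: decode a one-byte payload (an assumed toy format,
/// not the real wire encoding).
fn parse_payload(bytes: &[u8]) -> Result<Config, String> {
    match bytes {
        [0] => Ok(Config { test_only: false }),
        [1] => Ok(Config { test_only: true }),
        other => Err(format!("malformed config payload: {:?}", other)),
    }
}

/// Infallible entry point: callers never see an error. A payload that fails
/// to parse terminates the process instead of returning.
pub fn take_from_bytes(bytes: &[u8]) -> Config {
    parse_payload(bytes).unwrap_or_else(|err| {
        eprintln!("{}", err);
        process::abort()
    })
}
```

Keeping the fallible parser private and exposing only the aborting wrapper means the public signature carries no `Result` for component authors to handle.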

Opaque

Per RFC-0127, the Component Framework will eventually provide APIs for overriding a component's configuration values at runtime. From an author's view, the parsed type does not describe any overrides which may have occurred. The generated libraries are not responsible for processing overrides.

Debuggable

Accessors should provide an option to output the component's configuration as an Inspect hierarchy. This will allow config to be included in crash reports with a minimal addition of code, without forcing the Component Framework to store all config values in its memory, duplicating a component's own copies.
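As a rough sketch of what a generated recording helper might do, the following Rust writes each field under a prefix and honors a skip list like the `inspect_skip_fields` argument in the build template. A `BTreeMap` stands in for an Inspect node, and the `record` method and its parameters are assumptions made for this example rather than the generated API.

```rust
use std::collections::BTreeMap;

/// Mirrors the fields of the syntax example; `i64` matches the `int64` schema type.
pub struct Config {
    pub check_interval_ns: i64,
    pub data_path: String,
    pub test_only: bool,
}

impl Config {
    /// Write each field under `prefix`, except those named in `skip`.
    /// The map is a stand-in for an Inspect node (an assumption of this sketch).
    pub fn record(&self, prefix: &str, skip: &[&str], out: &mut BTreeMap<String, String>) {
        let fields = vec![
            ("check_interval_ns", self.check_interval_ns.to_string()),
            ("data_path", self.data_path.clone()),
            ("test_only", self.test_only.to_string()),
        ];
        for (name, value) in fields {
            if !skip.contains(&name) {
                out.insert(format!("{}/{}", prefix, name), value);
            }
        }
    }
}
```

Because the values are recorded by the component itself, any memory used for the debug copy is attributed to that component.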

An alternative to generating Inspect output would be to expose config in Component Manager's Inspect.

Ergonomic

Generated libraries should have a single top-level type containing all of a component's configuration fields. It should be returned by a single top-level accessor function.

Users should only interact with the namespace or library name of a single generated library, which they can control.

Familiar

Types generated for configuration schemas should feel familiar to users of FIDL bindings.

Syntax

Consider this config stanza using syntax from RFC-0146:

config: {
    check_interval_ns: { type: "int64" },
    data_path: {
        type: "string",
        max_size: 256,
    },
    test_only: { type: "bool" },
},

Build templates

In fuchsia.git, accessors must be defined in the same build file as the component which defines the configuration keys. For example, with a Rust binary:

import("//build/components.gni")
import("//build/rust/rustc_binary.gni")

fuchsia_component("my_component") {
  manifest = "meta/my_component.cml"
  deps = [ ":my_bin" ]
}

fuchsia_structured_config_rust_lib("my_component_config") {
  library_name = "my_component"

  # NOTE: This target internally depends on a target which is generated by the
  # fuchsia_component() template, so the graph is not cyclic:
  #
  # :my_component -> :my_bin -> :my_component_config -> :my_component_manifest_compile
  #
  # where :my_component_manifest_compile does not depend on :my_component
  component = ":my_component"

  # By default all fields are included; this omits `data_path` from Inspect output.
  inspect_skip_fields = ["data_path"]
}

rustc_binary("my_bin") {
  sources = [ "src/main.rs" ]
  deps = [ ":my_component_config" ]
}

fuchsia_structured_config_values("my_config_values") {
  component = ":my_component"
  values_source = "config/my_component.json5"
}

fuchsia_package("my_package") {
  deps = [
    ":my_component",
    ":my_config_values",
  ]
}

We will develop comparable build integrations for out-of-tree customers when structured configuration is ready for OOT usage.

C++

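#include <lib/sys/cpp/component_context.h>
#include <lib/sys/inspect/cpp/component.h>

#include <chrono>
#include <fstream>
#include <memory>
#include <thread>

#include "src/my_project/my_component/config.h"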

int main(int argc, char** argv) {
  auto config = my_component::Config::TakeFromStartupHandle();

  auto context = sys::ComponentContext::CreateAndServeOutgoingDirectory();
  auto inspector_ = std::make_unique<sys::ComponentInspector>(context);
  config.RecordInspect(inspector_->GetRoot().CreateChild("config"));

  if (config.test_only()) {
    // ...
  }

  std::ifstream file(config.data_path());
  // ...

  while (true) {
    // ...
    std::this_thread::sleep_for(
      std::chrono::nanoseconds(config.check_interval_ns()));
  }
}

For a driver, the start arguments will be required to avoid process-global dependencies:

#include "src/my_project/my_component/config.h"

zx::status<> Init(fdf::wire::DriverStartArgs& start_args, /*...*/) {
  auto config = my_component::Config::TakeFromStartArgs(start_args);
  // ...
}

Build templates in fuchsia.git will place headers in target_gen_dir and add that to the library's include_dirs so that users can include the generated header as an implementation header for their component.

When building accessors support in the SDK we will ensure that users can configure the include directory layout to match any style guidance they have.

Rust

#[fuchsia::component]
async fn main() {
    let inspector = fuchsia_inspect::component::inspector();
    let config_node = inspector.root().create_child("config");

    let config = my_component::Config::take_from_args();
    config.record_inspect(&config_node);

    if config.test_only {
        // ...
    }

    let contents = std::fs::read(&config.data_path).unwrap();
    let mut interval = Interval::new(Duration::from_nanos(config.check_interval_ns));
    let _checker = Task::local(async move {
        while let Some(()) = interval.next().await {
            // ... use `contents` ...
        }
    });

    // for completeness, set up an /out directory for our inspect and serve it
    let mut fs = ServiceFs::new_local();
    inspect_runtime::serve(inspector, &mut fs).unwrap();
    fs.take_and_serve_directory_handle().unwrap();
    while let Some(()) = fs.next().await {}
}

Note that this example uses async and a VFS to demonstrate serving Inspect, but an executor and VFS implementation will not be required solely to access structured configuration values.

Implementation

Versioning

Accessors will receive configuration values from the Component Framework with a checksum of the configuration schema for which they were encoded (see the RFC for config in CML for background; a future RFC will specify the delivery mechanism). The accessor must check that the received checksum exactly matches the checksum with which the accessor was generated, aborting the component if there is a mismatch. This will prevent misinterpreting the payload, acting as a final guard against mispackaged component binaries and/or manifests that were compiled from different schemas.

An implication of this design is that components will need to be recompiled when their configuration schema changes.
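The checksum comparison described above can be sketched as follows. This assumes, purely for illustration, that the checksum is prepended to the encoded values; the actual delivery format is left to a future RFC. The constant value and the names `verify_checksum` and `values_or_abort` are hypothetical.

```rust
/// Checksum baked into the accessor at generation time (hypothetical value).
pub const EXPECTED_CHECKSUM: [u8; 4] = [0xde, 0xad, 0xbe, 0xef];

/// Strip and verify the checksum prefix, returning the remaining encoded values.
/// Assumes, for illustration only, that the checksum is prepended to the payload.
pub fn verify_checksum<'a>(payload: &'a [u8], expected: &[u8]) -> Result<&'a [u8], String> {
    if payload.len() < expected.len() {
        return Err("payload shorter than checksum".to_string());
    }
    let (received, values) = payload.split_at(expected.len());
    if received != expected {
        return Err(format!(
            "checksum mismatch: got {:x?}, want {:x?}",
            received, expected
        ));
    }
    Ok(values)
}

/// Accessor-side guard: abort rather than misinterpret a mismatched payload.
pub fn values_or_abort(payload: &[u8]) -> &[u8] {
    verify_checksum(payload, &EXPECTED_CHECKSUM).unwrap_or_else(|err| {
        eprintln!("{}", err);
        std::process::abort()
    })
}
```

The comparison is an exact match on the checksum bytes, so any schema change requires regenerating the accessor.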

A rejected alternative would be to have the runner cooperate with Component Manager to verify the checksum before starting the component.

Library internals

Per RFC-0127, structured configuration payloads will be encoded as persistent FIDL messages with a struct as the primary object (further details to be recorded in a future RFC).

We will generate FIDL libraries for parsing the encoded messages, and also generate small runtime-specific wrapper libraries which know how to

  1. retrieve the encoded message from the language- or runner-specific runtime
  2. check the encoded message's checksum against that from the component manifest
  3. invoke the FIDL bindings' decode functionality
  4. convert the decoded FIDL domain object into a generated type
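The four steps above can be sketched end to end in Rust. Everything here is a stand-in: `PayloadSource` models the runtime-specific retrieval, `WireConfig` plays the role of the FIDL domain object, and the ten-byte layout is a toy format invented for this example. Failures are returned as `None` to keep the sketch testable, whereas a generated accessor would terminate the component instead.

```rust
/// Step 1 differs per runtime, so it is modeled as a trait (hypothetical).
pub trait PayloadSource {
    fn retrieve(&self) -> Vec<u8>;
}

/// Stand-in for the type the FIDL bindings would decode into.
struct WireConfig {
    check_interval_ns: i64,
    test_only: bool,
}

/// User-facing type exposed by the wrapper.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub check_interval_ns: i64,
    pub test_only: bool,
}

/// Toy layout assumed here: 1 checksum byte, 8 little-endian bytes, 1 bool byte.
const CHECKSUM: u8 = 0x5a;

fn decode(body: &[u8]) -> Option<WireConfig> {
    if body.len() != 9 {
        return None;
    }
    let mut interval = [0u8; 8];
    interval.copy_from_slice(&body[..8]);
    Some(WireConfig {
        check_interval_ns: i64::from_le_bytes(interval),
        test_only: body[8] != 0,
    })
}

pub fn load(source: &impl PayloadSource) -> Option<Config> {
    // 1. Retrieve the encoded message from the runtime.
    let payload = source.retrieve();
    // 2. Check the checksum before interpreting anything else.
    let (&sum, body) = payload.split_first()?;
    if sum != CHECKSUM {
        return None;
    }
    // 3. Decode into the bindings' type.
    let wire = decode(body)?;
    // 4. Convert into the wrapper's own type.
    Some(Config {
        check_interval_ns: wire.check_interval_ns,
        test_only: wire.test_only,
    })
}

/// In-memory source, standing in for a runner-specific one.
pub struct FixedBytes(pub Vec<u8>);

impl PayloadSource for FixedBytes {
    fn retrieve(&self) -> Vec<u8> {
        self.0.clone()
    }
}
```

Only the `PayloadSource` implementation would vary between runtimes in this sketch; steps 2 through 4 are shared.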

We have considered and rejected alternatives to this approach, described under "Drawbacks, alternatives, and unknowns".

Naming

The generated wrapper library will contain the intended user-facing APIs for accessing configuration. The wrapper's namespace will by default be the same as the target name for the GN rule which generates it and can be overridden with the library_name argument.

The generated FIDL library will receive its name from build templates, by default the GN target name with underscores removed, appended to the platform name (fuchsia. for in-tree components). For example, in the syntax snippets above, the GN template invocation creates a FIDL library name of fuchsia.mycomponent.
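The naming rule can be illustrated with a small Rust helper. The function `fidl_library_name` is hypothetical and not part of the build templates; it assumes the input is the name `my_component` used in the syntax snippets.

```rust
/// Derive a FIDL library name: strip underscores from `name` and prepend the
/// platform name, separated by a dot.
pub fn fidl_library_name(platform: &str, name: &str) -> String {
    let squashed: String = name.chars().filter(|c| *c != '_').collect();
    format!("{}.{}", platform, squashed)
}
```

For example, `fidl_library_name("fuchsia", "my_component")` yields `fuchsia.mycomponent`, matching the library name given above.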

The generated struct will be named Config and we may allow users to override that if it is a commonly requested feature.

The rules for configuration field identifiers mean that all config keys in CML are valid FIDL field identifiers, so no mangling is required.

Generated FIDL library

Each structured configuration schema is compiled to a FIDL struct which will be converted to a type defined within the structured configuration wrapper. Users will not be exposed to types generated by the FIDL toolchain.

For the syntax example above, this would be:

library fuchsia.mycomponent;

type Config = struct {
    check_interval_ns int64;
    data_path string:256;
    test_only bool;
};

Generated wrapper code

Each wrapper will contain a type that corresponds to the generated FIDL domain object's type, but with an additional factory function to retrieve the config from the component's runtime and a method to record the config to inspect.

Generated C++ libraries will provide an interface that feels familiar to users of the natural types in the unified C++ FIDL bindings. For example, the above syntax example would generate this in C++:

namespace my_component {
class Config {
public:
  static Config TakeFromStartupHandle() noexcept;
  void RecordInspect(inspect::Node* node);

  const int64_t& check_interval_ns() const { return check_interval_ns_; }
  int64_t& check_interval_ns() { return check_interval_ns_; }

  const std::string& data_path() const { return data_path_; }
  std::string& data_path() { return data_path_; }

  const bool& test_only() const { return test_only_; }
  bool& test_only() { return test_only_; }

private:
  // ...
};
}  // namespace my_component

In Rust, we will generate a type similar to the single flavor of bindings. The syntax example above would generate this for Rust components:

pub struct Config {
    pub check_interval_ns: i64,
    pub data_path: String,
    pub test_only: bool,
}

impl Config {
    pub fn take_from_args() -> Self { /* ... */ }
    pub fn record_inspect(&self, node: &fuchsia_inspect::Node) { /* ... */ }
}

We will generate one "flavor" of accessor library for each language or runtime environment which uses a different method to deliver encoded configuration (details of runner implementation to be ratified in a future RFC). For example, C++ drivers have a different accessor library than C++ components run directly by the ELF runner, even though they use the same language.

Language support

Initially we will support C++ and Rust. Over time we will cover all target-side supported languages.

Dependencies

Each dependency for generated libraries represents a tax on OOT integrators and should be avoided if possible.

Out-of-tree support

It will eventually be necessary for out-of-tree customers to generate their own configuration accessors, similar to how the FIDL toolchain is integrated with petals' builds.

Performance

Parsing structured configuration payloads may add some additional overhead to a component's start time, with the most significant impact in cases where the component previously compiled configuration values directly into its binary. We do not expect user-visible impact either way because configuration parsing is expected to be a one-time operation when a component first starts.

This potential impact will be partly mitigated by internally reusing parsers generated by the FIDL toolchain, the performance of which is already benchmarked continuously.

We will monitor the TimeToStart performance metric.

Backwards Compatibility

Configuration accessor libraries are generated from a component's compiled manifest and will embed the same configuration schema checksum as the manifest. We will not support any version skew between a component's configuration schema, its value file, and its accessor library. In the future, evolution of configuration values may be supported by the overall Component Framework but from the component's perspective it will always receive a complete and consistent configuration.

Security considerations

Components should trust that the configuration payloads they receive are well-formed according to their declared schema, as the Component Framework must only start the component if it can provide a compatible configuration. The details of this resolution and encoding will be defined in a subsequent RFC.

We hope that providing infallible and non-nullable accessors will discourage the use of default configuration values in code, making it easier to audit which actual values are executed at runtime and reducing the surface area of possible misconfigurations and subsequent attacks.

Privacy considerations

Future extensions to structured configuration may require changes to support working with PII but we do not anticipate any changes of which config accessors would need to be aware.

It is not necessary to redact user data in generated Inspect helpers since redaction is performed by Archivist when reading values. Users of structured configuration will need to add their configuration fields to a product's allowlist of selectors for them to appear in crash reports.

Testing

To support developers writing tests, it should be possible to construct a configuration object in code.
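In Rust, public fields make this straightforward. The sketch below mirrors the struct from the syntax example, using `i64` to match the `int64` schema type; `storage_root` is a hypothetical piece of component logic added only to show a test passing in a hand-built value.

```rust
/// Mirrors the generated struct from the syntax example (fields are public).
#[derive(Clone, Debug, PartialEq)]
pub struct Config {
    pub check_interval_ns: i64,
    pub data_path: String,
    pub test_only: bool,
}

/// Hypothetical component logic that takes its configuration as a parameter,
/// so tests can supply a value without involving the component's runtime.
pub fn storage_root(config: &Config) -> String {
    if config.test_only {
        format!("/tmp{}", config.data_path)
    } else {
        config.data_path.clone()
    }
}
```

A test can then build `Config { check_interval_ns: 5, data_path: "/data/cache".to_string(), test_only: true }` directly and pass a reference to the code under test.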

Use of accessor libraries will be covered by multi-language conformance tests for structured configuration which ensure that configuration can be loaded, resolved, encoded, delivered, and parsed by a component which reports the results back to the conformance suite.

Documentation

Accessor libraries will be documented with examples for each supported language in the feature documentation and codelabs for structured configuration.

Drawbacks, alternatives, and unknowns

Alternative: Expose config in Component Manager's Inspect

Instead of generating Inspect code within the accessor libraries, we could add the resolved config values to Component Manager's Inspect output. There is precedent for doing so with CPU stats, but tracking the resource usage of this feature is difficult and we prefer a solution which "bills" the memory usage to the component with the config values.

Alternative: Global variables for access

We could consider accessor APIs that expose access to a single global instance of the parsed configuration. For example, in Rust:

static CONFIG: Lazy<Config> = Lazy::new(|| /* ...retrieve from runner... */);

Using a single global instance of configuration values is attractive from the perspective of resource economy, as it would use language features to ensure only a single copy of a component's configuration was instantiated into memory at a time. It also conceptually mirrors how developers often think of their configuration.

However, many languages discourage use of globals/statics except when necessary, and choosing this style of API would force developers into their use even when not strictly necessary. Developers of individual components will still have the option of wrapping the generated accessor result in a global variable.
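A component that does want a global could wrap the accessor itself. The following is a minimal sketch using `std::sync::OnceLock` from the Rust standard library; the `Config` type and the body of `take_from_args` are placeholders for the generated code, and the free function `config` is a hypothetical name.

```rust
use std::sync::OnceLock;

#[derive(Debug)]
pub struct Config {
    pub test_only: bool,
}

impl Config {
    /// Placeholder for the generated accessor; the real one reads from the runtime.
    pub fn take_from_args() -> Self {
        Config { test_only: false }
    }
}

static CONFIG: OnceLock<Config> = OnceLock::new();

/// Component-owned global: the accessor runs at most once, on first use.
pub fn config() -> &'static Config {
    CONFIG.get_or_init(Config::take_from_args)
}
```

This keeps the choice of a global with the component author, while the generated library continues to expose only a function call.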

Further, some environments (e.g. the current driver framework) discourage or disallow the use of globals with implicit initialization. The style of generated configuration access should be as familiar as possible to developers across different environments, and using function calls means those different environments will only differ in requiring additional parameters.

Alternative: runner-based verification

For ELF-based components, it would be possible to include the configuration checksum in a custom header which could be verified by the runner with help from the loader service. This would allow the Component Framework to verify that the compiled accessor matched the encoded values before executing a component's code, which would allow reporting an error directly to the logs without relying on the component to have correctly configured its outputs before attempting to read the configuration. That said, we anticipate a checksum mismatch to be a rare occurrence when components are packaged using tools from the SDK and it would take much more effort to design and implement a faster-failing checksum verification step in the loader and runner.

A design for verifying configuration checksums before creating processes would need to handle cases where a single binary is used to serve multiple components with different configuration interfaces.

We do not anticipate that failures to verify the checksum will be common, so the value of surfacing them earlier will be limited. We also have the additional option of including accessor library metadata in binaries in a way that the product assembly tooling can verify, which would give us an even earlier point in the component development lifecycle to surface these errors. The high complexity to implement pre-process-creation verification, the low expected value of the improved experience, and the opportunity to shift these errors earlier with tools outside the platform suggest we should not pursue this feature until we get new information that changes how we view these tradeoffs.

Alternative: Refine checksum & Inspect support into FIDL features

We could decide to generalize the type hashing and debugging features we need for structured configuration and make them available to all users of FIDL.

This approach is promising in the longer term and the CF and FIDL teams will be taking on an effort to align their technologies more closely as a result of discussions around this RFC and other topics. In particular, we will discuss:

  • designing private or "component-local" namespacing for generated FIDL code, which would allow us to generate code with additional dependencies without impacting the general IPC use cases, also addressing questions around naming and platform versioning
  • options within FIDL for "lockstep versioning" that would provide tools to prevent skew between component manifests and implementing binaries
  • generating additional Inspect debugging code for FIDL types

The work needed on the FIDL side to support these concepts will take some time and we are choosing not to block structured config accessors on it. We anticipate future RFCs to describe these and other component-aware features for FIDL, and to migrate structured config users to new accessor APIs if necessary.

Alternative: Support in FIDL bindings & reusable runtime libraries

We could modify FIDL backends to emit additional code to support structured configuration use cases. This could be done either with a flag passed to the backend (e.g. --enable-config-codegen) or via custom attributes that are passed through to the backend (e.g. @structured_config_checksum() and @structured_config_inspect()).

This would allow us to emit code for checksum verification and Inspect debugging and to then invoke that functionality from reusable libraries that are aware of each runtime's method for passing encoded configuration to a component.

This option has the advantage of reducing the number of "flavors" of codegen we do, because no code generator needs to be aware of each runtime's methods for delivering config -- that knowledge can always live in reusable libraries. It also has the advantage of reducing the effort to integrate structured config with an OOT build system, as existing ones have already integrated the FIDL toolchain.

However, this option has a significant drawback. The FIDL toolchain does not currently have a way to mark features in bindings as unstable or experimental. All code emitted by the binding has the same stability needs which means that adding structured configuration attributes to an SDK-available C++ backend would be immediately available for all SDK customers. This is in conflict with our goal to roll out structured config incrementally, preserving the ability to modify the APIs as much as possible while we continue to learn from our users.

Further, it is unclear whether it makes sense for the FIDL toolchain to generate layers of bindings which take dependencies on higher-level Fuchsia concepts like Inspect when the currently-generated bindings are themselves used to implement those higher-level concepts.

Alternative: fidlgen backends for configuration

We could define separate backends for generating structured config support code. This would allow us to achieve different stability properties than the current fidlgen backends and to include additional dependencies in the generated accessors.

However, to avoid exposing users to multiple generated namespaces we would need to emit both config support code and FIDL domain objects within a single namespace. This generated library could not be linked into the same binary as a basic/non-config FIDL binding for the same library. In practice we would not expect users to try to generate both "config-aware" and "basic" FIDL bindings for a given library, but it would create new constraints on FIDL codegen to either guard against that use case or design around it. We believe that this solution would only be temporary until we reconcile CF and FIDL technologies more broadly. We prefer to take on tech debt that grants greater design and implementation freedom without creating potential maintenance hazards for the FIDL team.

Alternative: Expose FIDL libraries with additional layers

We could generate a "base" FIDL binding that has the same contents as all bindings today, and then generate an additional "layer" of config support code which knows how to retrieve that FIDL type from a component's runtime. This would achieve our implementation goals of clean separations between FIDL IPC and other concerns, but would expose users to two generated namespaces without giving them any additional benefit.

Alternative: Generating libraries with no FIDL dependency

We could consider having no FIDL dependency whatsoever and generating our own parsers. This approach would be more attractive if we found we were not able to insulate users from the generated FIDL dependency of an accessor library.

Prior art and references

The work here is similar in many ways to FIDL, and readers of this RFC would benefit from an understanding of accessing Fuchsia's binary formats.

Fuchsia has a number of examples of hand-written accessors for configuration, for example fshost's configuration class which used to parse a custom line-delimited format before switching to an early prototype of structured configuration.

There are many language-specific libraries which provide "typed parsers" for dynamic "configuration interfaces" like argv or JSON configuration files. In fuchsia.git, argh is a popular Rust library for parsing argv, and serde/serde_json are frequently used in a similar fashion for parsing configuration files out of bootfs and packages.