RFC-0137: Discard unknown data in FIDL

Status	Accepted
Areas	FIDL
Description	Make FIDL bindings discard unknown data instead of preserving and proxying it.
Issues	85383
Gerrit change	574045
Authors	mkember@google.com
Reviewers	pascallouis@google.com yifeit@google.com
Date submitted (year-month-day)	2021-08-25
Date reviewed (year-month-day)	2021-10-13

Summary

Most FIDL bindings preserve unknown table fields and union variants, allowing user code to inspect and re-encode the raw bytes and handles. This behavior poses security and privacy risks, adds significant complexity to FIDL, makes wire format migrations difficult, and cannot be implemented in all bindings. We propose to have bindings discard unknown data instead, resulting in the behavior shown in Table 1.

Table 1: Changes for flexible types with unknown data

Type	Can access unknowns?	Can re-encode?	Proxies unknowns?
bits	Yes	Yes	Yes
enum	Yes	Yes	Yes
table	Yes → No	Yes	Yes → No
union	Yes → Ordinal only	Yes → No	Yes → No

Background

Flexible types are an important feature in FIDL for writing evolvable APIs. Introduced in RFC-0033: Handling of unknown fields and strictness, they have been available in all bindings since late 2020. With flexible types, decoding succeeds even if there are unknown members. FIDL tables are always flexible, whereas bits, enums, and unions can be marked strict or flexible. With flexible bits and enums, an unknown value is simply an integer. However, for an unknown table field or union variant, the value consists of raw bytes and handles, which we refer to as unknown data.

Most bindings today preserve unknown data in domain objects. The exception is LLCPP, whose design constraints make this difficult to support. For the others, preserving unknown data enables the following behavior. Suppose processes A, B, and C communicate over FIDL. If A and C know about a new table field but B does not, and that field is sent from A to B to C, then C will receive and understand it despite B's ignorance of it. In other words, B proxies the unknown data. An application could also interpret unknown data based on assumptions about the schema, for example that the first four bytes are always an identifier. However, such cases are contrived and better modeled directly with FIDL types. Proxying is the only realistic use case for preserving unknown data.
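
The A-to-B-to-C scenario can be sketched in Rust as follows. This is a toy model under stated assumptions, not real bindings code: `WireTable`, `TableSeenByB`, `decode`, `encode`, and `proxy_through_b` are hypothetical names, and a table is modeled as a map from ordinal to raw bytes rather than the actual FIDL wire format.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an encoded table: ordinal -> raw envelope bytes.
type WireTable = BTreeMap<u64, Vec<u8>>;

/// B's domain object. B only knows the field at ordinal 1.
#[derive(Debug, Default)]
struct TableSeenByB {
    name: Option<Vec<u8>>,
    /// Unknown envelopes, populated only when preservation is enabled.
    unknown: BTreeMap<u64, Vec<u8>>,
}

fn decode(wire: &WireTable, preserve_unknown: bool) -> TableSeenByB {
    let mut table = TableSeenByB::default();
    for (&ordinal, bytes) in wire {
        match ordinal {
            1 => table.name = Some(bytes.clone()),
            _ if preserve_unknown => {
                table.unknown.insert(ordinal, bytes.clone());
            }
            _ => {} // unknown field is discarded
        }
    }
    table
}

fn encode(table: &TableSeenByB) -> WireTable {
    // Start from whatever unknown envelopes were preserved, then add known fields.
    let mut wire = table.unknown.clone();
    if let Some(name) = &table.name {
        wire.insert(1, name.clone());
    }
    wire
}

/// B decodes the message from A and re-encodes it for C.
fn proxy_through_b(from_a: &WireTable, preserve_unknown: bool) -> WireTable {
    encode(&decode(from_a, preserve_unknown))
}
```

With `preserve_unknown` set, a field at ordinal 2 sent by A reaches C unchanged; without it, C only receives ordinal 1.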

Motivation

In designing FIDL, we strive to solve real problems using the fewest features necessary. The feature of preserving unknown data has not lived up to these principles. Since its implementation, it has seen little or no use in Fuchsia, and it has repeatedly come up as a complicating factor in other FIDL efforts.

This leads us to question the merits of proxying unknown data. Is it a good idea in the first place? We contend that it is not, at least not as a default behavior. It might be useful as an opt-in feature for FIDL servers that are truly meant to be proxies. However, these cases would be better served by dedicated proxying support in FIDL, which would address all aspects of the problem, not just unknown data.

Even if we suppose that proxying by default is desirable, it only works when directly re-encoding a FIDL domain object. However, it is common (and recommended) to convert these objects to richer, application-specific types before further processing. This practice is counter to how one would approach proxying, where it is desirable to either pass encoded messages unaltered, or directly re-encode decoded messages with minimal processing. For example, the Rust crate fidl_table_validation provides validation while converting a FIDL domain object into an application domain object. Thus a peer sending a table across multiple hops in a complex system cannot rely on all fields reaching the final destination if any participants use this pattern.

Whether or not proxying is desirable, preserving unknown data has several downsides. It makes wire format migrations more difficult. During a migration, there comes a point when all peers can read both the old and new format, meaning it is safe to start writing the new format. Since this change cannot occur everywhere simultaneously, there will inevitably be a period where a peer is receiving both old-format and new-format messages. Suppose it receives one of each, both with unknown table fields, and then tries to encode both tables in a single message. The only way to preserve unknown data in this case is to include wire format metadata in every envelope, but this would add unacceptable complexity and overhead.

Another downside concerns feature parity among FIDL bindings. In bindings that support in-place decoding (e.g. LLCPP), it is difficult to choose a domain object representation that can both own handles and represent unknown handles. For known data, the decoder inserts handles in the domain object by overwriting their presence indicators. For unknown data, the decoder only knows the number of handles to skip in the handle table, not the locations of their presence indicators. It is thus not possible to return an owning domain object short of returning both the domain object and a re-packed handle table. Instead, these bindings simply do not support preserving unknown data. This is likely to surprise users who rely on it in other bindings, and it increases our testing burden, requiring two GIDL tests for all cases involving unknown data.

In general, the need to preserve unknown data adds significant complexity to FIDL. This complexity is not limited to the implementation, but affects users as well, due to interactions with other features. For example, the distinction between value and resource types was designed to only affect API compatibility, not ABI. However, it was later discovered to have unavoidable ABI impact in the case when unknown handles are received for a flexible value type. This corner case only exists because of the requirement to preserve unknown data in domain objects.

Stakeholders

Facilitator: pascallouis@google.com

Reviewers: abarth@google.com, yifeit@google.com, ianloic@google.com

Consulted: bryanhenry@google.com

Socialization: A draft of this RFC was sent to the FIDL team for comments.

Design

The handling of unknown values for flexible bits and enums remains unchanged.

When decoding tables and flexible unions:

Bindings MUST NOT store unknown bytes and handles in domain objects, unless the bindings are specifically designed for proxying.
Bindings MUST close all unknown handles.
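
A minimal Rust sketch of these two decoding rules follows. All names here (`FakeHandle`, `Envelope`, `Settings`, `decode_settings`) are hypothetical, and dropping a `FakeHandle` merely stands in for closing a real handle; actual bindings would operate on encoded messages and kernel handles.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical handle stand-in. Dropping it models closing the handle
/// and bumps a shared counter so the effect is observable.
struct FakeHandle {
    closed: Rc<Cell<u32>>,
}

impl Drop for FakeHandle {
    fn drop(&mut self) {
        self.closed.set(self.closed.get() + 1);
    }
}

/// Hypothetical decoded envelope: an ordinal plus its bytes and handles.
struct Envelope {
    ordinal: u64,
    bytes: Vec<u8>,
    handles: Vec<FakeHandle>,
}

/// Domain object with two known fields and no storage for unknown data.
#[derive(Default)]
struct Settings {
    volume: Option<u8>,          // ordinal 1
    channel: Option<FakeHandle>, // ordinal 2
}

fn decode_settings(envelopes: Vec<Envelope>) -> Settings {
    let mut settings = Settings::default();
    for mut envelope in envelopes {
        let ordinal = envelope.ordinal;
        match ordinal {
            1 => settings.volume = envelope.bytes.first().copied(),
            2 => settings.channel = envelope.handles.pop(),
            // Unknown field: drop its bytes and close its handles immediately.
            _ => drop(envelope),
        }
    }
    settings
}
```

The domain object has nowhere to put unknown bytes or handles, so an unknown envelope can only be skipped and its handles closed.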

When re-encoding tables and flexible unions that were previously decoded:

Bindings MUST successfully re-encode the known fields of a table, and MUST NOT include unknown fields (which would imply storing them).
Bindings MUST fail and return an error when encoding a flexible union with an unknown variant.
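
The asymmetry between tables and unions on re-encoding might look like the following in Rust. This is an illustrative sketch only: `Profile`, `Shape`, `EncodeError`, and both encode functions are hypothetical, and the output is a simplified list of (ordinal, bytes) pairs rather than real wire format.

```rust
#[derive(Debug, PartialEq)]
enum EncodeError {
    UnknownUnionVariant { ordinal: u64 },
}

/// Table domain object: only known fields exist, so encoding always succeeds.
#[derive(Debug, Default)]
struct Profile {
    name: Option<String>, // ordinal 1
    age: Option<u8>,      // ordinal 2
}

/// Flexible union domain object: the unknown variant stores only the ordinal.
#[derive(Debug)]
enum Shape {
    Circle(u32), // ordinal 1
    Square(u32), // ordinal 2
    Unknown { ordinal: u64 },
}

fn encode_profile(profile: &Profile) -> Vec<(u64, Vec<u8>)> {
    let mut out = Vec::new();
    if let Some(name) = &profile.name {
        out.push((1, name.as_bytes().to_vec()));
    }
    if let Some(age) = profile.age {
        out.push((2, vec![age]));
    }
    out
}

fn encode_shape(shape: &Shape) -> Result<(u64, Vec<u8>), EncodeError> {
    match shape {
        Shape::Circle(radius) => Ok((1, radius.to_le_bytes().to_vec())),
        Shape::Square(side) => Ok((2, side.to_le_bytes().to_vec())),
        // The payload was discarded at decode time, so nothing valid can be written.
        Shape::Unknown { ordinal } => {
            Err(EncodeError::UnknownUnionVariant { ordinal: *ordinal })
        }
    }
}
```

A table with discarded fields is still a well-formed table, whereas a union must carry exactly one variant, which is why only the union case has to fail.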

Concerning domain objects for tables and flexible unions:

Bindings SHOULD NOT provide any mechanism to distinguish a table that had no unknown fields from a table that had unknown fields discarded. They should be considered equal if bindings provide deep equality functions.
Bindings MUST provide a mechanism to determine if a flexible union had an unknown variant, and SHOULD provide access to the unknown ordinal (i.e. the domain object's unknown variant should only store the ordinal). Unknown variants should behave like NaN if bindings provide deep equality functions, comparing unequal even if ordinals are the same.

In Rust, the latter point implies removing the Eq trait from flexible unions and types that transitively contain one, as is already done for floats.
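
One way this could look in Rust is sketched below. The `Event` type and its accessors are hypothetical, and generated bindings may well choose a different shape; the point is only that `PartialEq` is implemented by hand and `Eq` is deliberately absent.

```rust
/// Hypothetical flexible union domain object.
#[derive(Debug, Clone)]
enum Event {
    Started(u32),
    Stopped(u32),
    /// Only the ordinal of an unknown variant is kept.
    Unknown { ordinal: u64 },
}

impl PartialEq for Event {
    fn eq(&self, other: &Self) -> bool {
        match (self, other) {
            (Event::Started(a), Event::Started(b)) => a == b,
            (Event::Stopped(a), Event::Stopped(b)) => a == b,
            // Any comparison involving an unknown variant is unequal,
            // even when both ordinals match (NaN-like behavior).
            _ => false,
        }
    }
}

// Note: there is intentionally no `impl Eq for Event`, because equality
// is not reflexive for the unknown variant.

impl Event {
    fn is_unknown(&self) -> bool {
        matches!(self, Event::Unknown { .. })
    }

    fn unknown_ordinal(&self) -> Option<u64> {
        match self {
            Event::Unknown { ordinal } => Some(*ordinal),
            _ => None,
        }
    }
}
```

Without `Eq`, such a type cannot serve as a `HashMap` key, which is the same restriction Rust already places on `f32` and `f64`.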

Implementation

The implementation is mostly a matter of deleting the code responsible for preserving unknowns in all the bindings. We do not believe there are any production uses of the unknown data accessors. If there are, we will have to understand the use case and try to find a way forward.

Currently, LLCPP fails to re-encode a table that had unknown fields. This will need to change per the design, to successfully encode just the known fields.

Security considerations

This proposal improves security, as it results in less information and fewer capabilities being passed around implicitly. When unknown data is preserved, it is easy to pass arbitrary bytes and handles through unsuspecting components. When it is discarded, the data boundary becomes accurately encoded by the FIDL schema, making the system easier to audit.

Privacy considerations

This proposal improves privacy because it restricts the transmission of unknown data, which could include sensitive information.

Testing

Testing mostly occurs in GIDL. The success tests involving unknown data will be split into two parts: decode_success and either encode_success (encodes only the known table fields) or encode_failure (unions fail to encode). The representation of values with unknown data will also change. GIDL should no longer parse unknown bytes and handles, and instead use the syntax 123: unknown to indicate an unknown envelope at ordinal 123.

The allowlists and denylists that split LLCPP and non-LLCPP can be removed. All bindings will have the same encoding/decoding behavior with respect to unknown data. In addition, the LLCPP-specific unit tests added in fxrev.dev/428410 can be removed in favor of GIDL tests.

Tests exercising all combinations of strict/flexible and value/resource should remain, although decoding unknown data with handles for a flexible value type will no longer fail.

Documentation

The following documentation needs to be updated:

Drawbacks, alternatives, and unknowns

Alternative: Optionally preserve unknowns

Rather than completely remove support for preserving unknown data, we could continue to support it, just not by default. For example, it could be opt-in with an attribute on the type, perhaps restricted to value types to alleviate concerns about proxying unknown handles. However, this approach adds even more complexity to a little-used feature, and it does not solve the wire format migration problem.

Drawback: Flexible types are inconsistent

A drawback of this proposal is that it makes the behavior of flexible types less consistent, and perhaps less intuitive. To explain this, it helps to classify bits, enums, tables, and unions along two axes as shown in Table 2: algebraic type (product or sum) and payload (with or without a payload).

Table 2: Classification of flexible types

	Product type	Sum type
Without payload	bits	enum
With payload	table	union

Currently, all flexible types proxy unknown information. This proposal breaks that symmetry along both axes. For example, consider the following FIDL types:

// Product types (multiple fields set)
type Bits  = bits  {    A = 1;         B = 2; };
type Table = table { 1: a struct{}; 2: b struct{}; };

// Sum types (one variant selected)
type Enum  = enum  {    A = 1;         B = 2; };
type Union = union { 1: a struct{}; 2: b struct{}; };

First, we lose consistency across the payload axis. Currently, going from Bits to Table or from Enum to Union increases functionality, permitting each member to carry a payload. With this proposal, that functionality comes with the cost of no longer preserving unknowns.

Second, we lose consistency across the algebraic type axis. Currently, both Table and Union allow re-encoding after decoding objects with unknown data. With this proposal, Table can re-encode but Union cannot.

We believe this trade-off of pragmatism over consistency is worthwhile to avoid the complexities described in Motivation. However, there are alternative designs described below that retain more consistency.

Alternative: Discard all unknown information

To improve consistency, we could discard all unknown information, even unknown integers that are easily stored. This means having a single unknown state for bits and enums, and discarding the unknown ordinal in addition to the payload for unions. Table 3 shows the resulting behavior.

Table 3: Adjustment of Table 1: Discard all unknown information

Type	Can access unknowns?	Can re-encode?	Proxies unknowns?
bits	Yes → No	Yes	Yes → No
enum	Yes → No	Yes → No	Yes → No
table	Yes → No	Yes	Yes → No
union	Yes → No	Yes → No	Yes → No

Alternative: Optional flexible unions

To improve consistency, we could require that flexible unions are always optional, and then decode unknown variants as absent unions. This would make it possible to re-encode unions, making them consistent with tables. Table 4 shows the resulting behavior.

Table 4: Adjustment of Table 1: Optional flexible unions

Type	Can access unknowns?	Can re-encode?	Proxies unknowns?
bits	Yes	Yes	Yes
enum	Yes	Yes	Yes
table	Yes → No	Yes	Yes → No
union	Yes → No	Yes	Yes → No
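
As a rough illustration of this alternative, an unknown variant would decode to an absent union, which can always be re-encoded. The `Codec` type and both functions below are hypothetical, and the (ordinal, payload) pair is a simplification of the real wire format.

```rust
/// Hypothetical flexible union with no unknown variant in its domain object.
#[derive(Debug, PartialEq)]
enum Codec {
    Opus(u32), // ordinal 1
    Flac(u32), // ordinal 2
}

/// Under this alternative, an unknown ordinal decodes as an absent union.
fn decode_optional_codec(ordinal: u64, payload: u32) -> Option<Codec> {
    match ordinal {
        1 => Some(Codec::Opus(payload)),
        2 => Some(Codec::Flac(payload)),
        _ => None,
    }
}

/// Encoding cannot fail: `None` is encoded as an absent union.
fn encode_optional_codec(codec: &Option<Codec>) -> Option<(u64, u32)> {
    match codec {
        Some(Codec::Opus(bitrate)) => Some((1, *bitrate)),
        Some(Codec::Flac(level)) => Some((2, *level)),
        None => None,
    }
}
```

The cost is visible in the types: a receiver can no longer tell an unknown variant apart from a union that the sender left unset.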

Alternative: Remember if unknown fields were discarded

In the proposed design, it is impossible for bindings users to tell if unknown fields were discarded while decoding a table. An alternative would be to store a boolean, or a set of unknown ordinals, in the table domain object. Users could then query this via a function such as has_unknown_fields(). For example, a storage service might want to fail in this case to avoid data loss.

A downside of this alternative is that it adds extra hidden state to table domain objects. They are no longer simple value types, the sum of their fields. For example, it raises the question of whether the == operator should take such a boolean flag into account.
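
The hidden-state concern can be made concrete with a small Rust sketch. Everything here is hypothetical (`Document`, the `unknown_fields_discarded` flag, and `store`); it assumes the flag is stored as a plain boolean and that equality is derived, which is only one of the possible answers to the question above.

```rust
/// Hypothetical table domain object that remembers whether unknown
/// fields were discarded during decoding.
#[derive(Debug, Default, PartialEq)]
struct Document {
    title: Option<String>,
    /// Hidden state set by the decoder, not part of the FIDL schema.
    unknown_fields_discarded: bool,
}

impl Document {
    fn has_unknown_fields(&self) -> bool {
        self.unknown_fields_discarded
    }
}

/// A storage service that refuses to persist lossy data.
fn store(doc: &Document) -> Result<(), String> {
    if doc.has_unknown_fields() {
        Err("refusing to store: unknown fields were discarded".to_string())
    } else {
        Ok(())
    }
}
```

With derived `PartialEq`, two documents whose visible fields are identical compare unequal if only one of them had unknown fields discarded, so the table is no longer just the sum of its fields.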

Alternative: Important fields

The only realistic use case for checking has_unknown_fields(), as described earlier, is to fail if it returns true. Rather than providing that accessor in bindings, we could accept an attribute on table and flexible union members to opt into that behavior:

type Data = table {
  1: foo string;
  @important
  2: bar string;
};

The effect of this attribute would be to set a newly reserved bit in the envelope header for that field. When decoders encounter an unknown field with the important bit set, they must fail. In other words, the @important attribute opts out of forward compatibility, functioning like a dynamic version of the static strict modifier we allow on bits, enums, and unions.
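
A decoder honoring such a bit might behave as in the following sketch. The bit position, the `EnvelopeHeader` layout, and all names are assumptions made for illustration; this RFC does not define them.

```rust
/// Hypothetical position of the "important" bit in an envelope's flags.
const IMPORTANT_BIT: u16 = 1 << 1;

/// Hypothetical, simplified view of a table field's envelope header.
struct EnvelopeHeader {
    ordinal: u64,
    flags: u16,
}

#[derive(Debug, PartialEq)]
enum FieldAction {
    /// The ordinal is known: decode the field normally.
    Decode,
    /// The ordinal is unknown and not important: skip it.
    Discard,
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    UnknownImportantField { ordinal: u64 },
}

fn classify_field(
    known_ordinals: &[u64],
    header: &EnvelopeHeader,
) -> Result<FieldAction, DecodeError> {
    if known_ordinals.contains(&header.ordinal) {
        Ok(FieldAction::Decode)
    } else if (header.flags & IMPORTANT_BIT) != 0 {
        // The sender opted this field out of forward compatibility.
        Err(DecodeError::UnknownImportantField { ordinal: header.ordinal })
    } else {
        Ok(FieldAction::Discard)
    }
}
```

The decision is made per message at runtime, so an old receiver keeps working until a peer actually sends the important field.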

This alternative would likely require its own RFC on top of this one.

Acknowledgement: This idea originates with yifeit@google.com.

Alternative: Keep ABI impact of value/resource

This proposal eliminates the ABI impact of RFC-0057, and considers this to be an improvement made possible by discarding unknown data. However, it can be argued that the ABI impact is desirable and should be kept.

Advantages of dropping ABI impact (this proposal):

It makes strict/flexible and value/resource more independent features. There is no longer a special case in their intersection. Given that we did not notice this case until long after writing their respective RFCs, it is likely to be surprising to users as well.
It makes it easier to transition a type from value to resource, since it only breaks API, not ABI. In cases where no code breaks (plausible for request and response types, which cannot be referenced directly in some bindings), this transition no longer silently changes behavior.

Advantages of keeping ABI impact (this alternative):

It more accurately models the intent of the interface. If you indicate that you do not expect handles (by using a value type), and you receive handles at runtime, this points to a gap and failing is appropriate.
If we change our minds, we can drop ABI impact later. Switching in the other direction is more likely to cause disruption.

Prior art and references

Protobuf

The design of protocol buffers has gone back and forth on this point. In proto2, unknown fields are preserved and proxied (like FIDL today). In proto3 the behavior was changed to discard unknown fields during decoding (like this proposal). However, the decision was later reverted, so in versions 3.5 and later proto3 once again preserves unknown fields.

This raises the question: will FIDL follow the same path if we accept this proposal? We believe the answer is no, because FIDL and Protobuf occupy different design spaces. Protobuf had to revert to the old preservation behavior because of two use cases: intermediary servers and read-modify-write patterns. Neither of these is prevalent in Fuchsia. Instead of having intermediary proxy servers, Fuchsia's security and privacy principles encourage direct communication. Instead of the read-modify-write pattern, the FIDL API rubric recommends the partial update pattern.

Thrift

Apache Thrift discards unknown fields.