RFC-0061: Extensible unions | |
---|---|
Status | Accepted |
Areas |
|
Description | To provide more ways to express payloads whose shape may need to evolve over time, we propose to replace unions as they exist today with extensible unions. |
Authors | |
Date submitted (year-month-day) | 2018-09-26 |
Date reviewed (year-month-day) | 2018-10-11 |
"Catering to Hawaii and Alaska"
Summary
To provide more ways to express payloads whose shape may need to evolve over time, we propose to replace unions as they exist today with extensible unions.
Motivation
Today, unions provide no way to evolve over time, and we even warn that "in general, changing the definition of a union will break binary compatibility."
There are a number of unions defined today where extensibility is necessary, e.g., fuchsia.modular/TriggerCondition, where fields are deprecated without being removed, or fuchsia.modular/Interaction.
As described later,
there also many unions whose current representation is
appropriate as they are unlikely to evolve in the near-future. However,
keeping both static unions
and extensible unions
introduces
unneeded complexity, see the
pros and cons.
Design
To introduce extensible unions, we need to modify multiple parts of FIDL:
the language and fidlc
, the JSON IR, the wire format and all language
bindings.
We'll also need to document this new feature in various places.
We discuss each change one by one.
Language
Syntactically, extensible unions look exactly the same as static unions:
union MyExtensibleUnion {
Type1 field1;
Type2 field2;
...
TypeN fieldN;
}
Behind the scenes, each field is assigned an ordinal: this is comparable to how tables have ordinals for each field, and how methods' ordinals get automatically assigned.
Specifically:
- Ordinals are calculated using the same algorithm as method ordinals
(details),
we concatenate the library name,
"
.
", the extensible union name, "/
", and finally the member name, then take the SHA256, and mask with0x7fffffff
. - Ordinals are
uint32
, no two fields can claim the same ordinal, and we disallow0
. In the case of ordinal conflict, the[Selector]
attribute should be used to provide an alternate name (or the member renamed). - Ordinals can be sparse, i.e., unlike how tables work, which require dense ordinals.
- Nullable fields are not allowed on extensible unions.
- Extensible unions MUST have at least one member.
An extensible union can be used anywhere a union can currently be used in the language. Particularly:
- Structs, tables and extensible unions can contain extensible unions;
- Extensible unions can contain structs, tables and extensible unions;
- Interface arguments or returns can be extensible unions;
- Extensible unions can be nullable.
JSON IR
Following tables, we will add one key in each union field declaration "ordinal."
Wire format
On the wire, an extensible union is represented by the ordinal to discriminate amongst the choices (padded to 8 bytes), followed by an envelope of the various members known to the producer. Specifically, that is:
- A
uint32
tag which contains the ordinal of the member being encoded; - A
uint32
padding to align to 8 bytes; - A
uint32
num_bytes storing the number of bytes in the envelope, always a multiple of 8, and must be 0 if the envelope is null; - A
uint32
num_handles storing the number of handles in the envelope, and must be 0 if the envelope is null; - A
uint64
data pointer to indicate presence (or absence) of out-of-line data:0
when envelope is null;- FIDL_ALLOC_PRESENT (or UINTPTR_MAX) when envelope is present, and next out-of-line object;
- When decoded for consumption, this data pointer is either nullptr if envelope is null, or a valid pointer to the envelope otherwise.
- The envelope reserves storage for the handles immediately following the content.
A nullable extensible union has a tag of 0, num_bytes is set to 0, num_handles is set to 0, and the data pointer is FIDL_ALLOC_ABSENT, i.e., 0. Essentially, a null extensible union is 24 bytes of 0s.
Language Bindings
Extensible unions are similar to unions, except that one needs to also handle an "unknown" case when union is read. Ideally, most language bindings would treat
union Name { Type1 field1; ...; TypeN fieldN; };
as they would an extensible union, such that code can easily be switched from one to the other, modulo support of the unknown case, which is meaningful only in the extensible union case.
To start, we suggest no language bindings expose reserved members: while these are present in the JSON IR for completeness, we do not expect that exposing them in language bindings be useful.
Implementation strategy
Implementation will be done in two steps.
First, we will build support for extensible unions:
- Introduce the feature in the language (
fidlc
), by using a different keyword (xunion
) to distinguish between static unions and extensible unions. - Implement the various core language bindings (C, C++, Rust, Go, Dart). Extend the compatibility test, and other tests accordingly.
Second, we will migrate all static unions to extensible unions:
Generate ordinals for static unions, and place them in the JSON IR. Backends should initially ignore those.
On read paths, have both modes of reading unions, as if they were static unions, and as if they were extensible unions (ordinals are needed for that to be possible). Choose between one and the other based on a flag in the transaction message header.
Update write paths to encode unions as extensible unions, and indicate as much by setting the flag in the transaction message header.
When all writers have been updated, deployed, and propagated, remove static union handling, and scaffolding code for the soft transition.
Documentation and examples
This would require documentation in at least these places:
- Language Specification: add a section under "Types and Type Declarations;"
- Wire Format Specification: add a section under "Data Types;"
Backwards compatibility
An extensible union is explicitly not backwards compatible with a "static" union.
Performance
No impact on performance when not used. Negligible performance impact during build time.
Security
No impact on security.
Testing
Unit tests in the compiler, unit tests for encoding/decoding in various language bindings, and compatibility test to check various language bindings together.
Drawbacks, alternatives, and unknowns
Extensible unions are less efficient than non-extensible unions. Furthermore, non-extensible unions are not expressible through other means in the language. As such, we propose both features living side by side.
However, we could decide that only extensible unions should exist, and do
away with unions as currently defined.
This would go against various places in Fuchsia where unions represent
performance critical messages, and where there is little extension
expectation, e.g. fuchsia.io/NodeInfo
, fuchsia.net/IpAddress
.
Pros and Cons of Keeping Static Unions
Pros
- Compared to a union, an extensible union incurs an 8 byte cost (for the size of the envelope, and number of handles). Additionally, extensible unions' data is always stored out-of-line (i.e., an additional 8 bytes for the data pointer), whereas only nullable unions' data are stored out-of-line.
- Because of the encoding of unions, it is not possible to express them with other primitives in FIDL. As such, should they be removed from the language, some classes of messages could not be expressed anymore as compactly and efficiently.
- In some cases, and depending on their use, unions can be represented as
efficiently but differently; however, that is the exception not the norm.
One example that could be rewritten without using union is the
fuchsia.net.stack/InterfaceAddressChangeEvent
used only in the
fuchsia.net.stack/InterfaceAddressChange
where the InterfaceAddress could directly be written, with an
enum
to indicate whether it is added or removed.
Cons
- Keeping both static unions and extensible unions forces complexity in the compiler, the JSON IR, all backends, as well as encoding/decoding. The gains are minimal: the size difference is marginal, in a world where FIDL encoding is not particularly size efficient in the first place. Furthermore, decoding of extensible unions can be done in place if needed.
- As an example of how minimal the gains are, here is the analysis for
fuchsia.io/NodeInfo:
- Today NodeInfo has 6 options: service (size 1), file (size 4), directory (size 1), pipe (size 4), vmofile (size 24), device (size 4).
- As such, the total size of a NodeInfo is always 32 bytes, i.e., tag + max(size of options) = 8 + 24 = 32.
- With extensible unions, NodeInfo size would depend on the option being encoded. There is always a 16 byte 'tax' (vs. 8), so the respective sizes would be: service = 24, file = 24, directory = 24, pipe = 24, vmofile = 40, device = 24.
- So, in all cases, we're shaving off 8 bytes, except in the case of a vmofile where we are adding an additional 8 bytes.
- The complexity in the language of having both static unions and extensible unions is also a worry. We expect library authors to waver between using one vs the other, when choosing extensible unions is a safer long term choice, for very little cost.
All in all, we decided to replace static unions with extensible unions.
Tag vs Ordinal
We use ordinal to denote the internal numeric value assigned to fields,
i.e., the value calculated through hashing.
We use tag to denote the representation of the variants in bindings:
in Go this may be constants of a type alias
, in Dart this may be an enum
.
The fidlc
compiler deals with ordinals only.
Developers would most likely deal with tags only.
And bindings provide translation from the high-level tag, to the low-level
internal ordinal.
No Empty Extensible Unions
During the design phase, we considered having extensible unions be empty. However, we chose to disallow that in the end: choosing a nullable extensible union with a single variant (e.g., an empty struct) clearly models the intent. This also avoids having two "unit" values for extensible unions i.e., a null value and an empty value.
Prior art and references
- Protocol buffers has oneof.
- FlatBuffers's unions aren't extensible except under special circumstances.