|RFC-0034: Null terminate strings|
We propose that strings in FIDL will be null terminated (all the while retaining the size of the string as is the case today). Also, change the C bindings to use uint8_t* for string data
|Date submitted (year-month-day)||2019-02-08|
|Date reviewed (year-month-day)||2019-02-21|
Strings in FIDL will be null terminated (all the while retaining the size of the string as is the case today);
Change the C bindings to use
uint8_t*for string data.
The current FIDL string encoding makes it easy to accidentally write insecure code. The C and low-level C++ bindings are used in some of the most privileged code in the Fuchsia system so extra weight should be given to risks exposed at those layers.
We have accidentally come across such bugs, but a systematic audit of code in the Fuchsia tree is difficult and won't prevent them reoccurring in our tree or in third-party code.
This proposes to change the encoding of FIDL strings to include a single terminator byte with value zero. This does not make FIDL strings compatible with C strings but is a FIDL string is used as a C string it will be interpreted as shorter than intended rather than longer than intended, which is safer.
The new definition of strings for wire encoding would be (changes highlighted):
- Variable-length sequence of UTF-8 encoded characters representing text.
- Nullable; null strings and empty strings are distinct.
- Can specify a maximum size, e.g.
string:40for a maximum 40 byte string (not including null-terminator).
- String content has a null-terminator.
- Encoders and decoders MUST validate that there is a null byte after the last byte of the string, as indicated by the length.
- Stored as a 16 byte record consisting of:
size: 64-bit unsigned number of code units (bytes) excluding null-terminator
data: 64-bit presence indication or pointer to out-of-line string data
- When encoded for transfer,
dataindicates presence of content:
0: string is null
UINTPTR_MAX: string is non-null, data is the next out-of-line object
- When decoded for consumption,
datais a pointer to content:
0: string is null
<valid pointer>: string is non-null,
datais at indicated memory address
Strings are denoted as follows:
string: non-nullable string (validation error occurs if null data is encountered)
string?: nullable string
string:N?: string with maximum length of N code units
This will constitute a breaking wire format change, specifically a string whose length is divisible by 8 will be 8 bytes longer (it'll align to 8 bytes, and be null terminated, causing another 7 bytes to be added as padding to align to 8 again).
Encoders will need to be updated to add null termination to encoded strings and to validate that there are no null characters within string content. Decoders will need to be updated to check that there are no null characters within string content but that there is a null terminator where the length indicates.
Currently the C bindings represent strings as a
char* and a
size_t. If that
char* is passed to a function that expects a C string it may be interpreted
incorrectly. The bindings will be changed to use
uint8_t* instead so that
passing a string data pointer to
printf("%s") will fail to
This is a breaking wire-format change. Its deployment will need to be carefully coordinated across all uses of FIDL.
Behind some build-time flag(s) the following code will need to be updated:
//zircon/system/ulib/fidl/walker.h(to validate strings correctly)
- llcpp bindings
- out-of-tree bindings for other languages
These should be tested with the
fidl_compatibility_test, running tests and
confirming expected system stability.
Actually landing the change will need to be coordinated with the release team and external teams like Chromium.
This makes the C and low-level C++ bindings much easier to use correctly and has no impact on other bindings.
Documentation and Examples
The wire format documentation should be updated (as outlined above).
This change is API compatible but ABI incompatible.
This will have no impact on build performance but will have the following minor performance impacts:
- Each string will take on average one extra byte to transmit
This change is specifically intended to fix potential security holes for memory-unsafe languages using FIDL.
This will be tested using the compatibility test suite. That suite should be extended to ensure that edge cases like 7, 8 and 9 byte long strings are handled correctly.
Drawbacks, Alternatives, and Unknowns
This will increase the size of FIDL messages by on average one byte per string. For strings whose length is not divisible by 8 there will be no change. For strings whose length is divisible by 8 the length will be increased by 8 bytes.
We could keep everything as it is and rely on code review and documentation to ensure people don't write wrong code. The issue will also be somewhat mitigated by the adoption of the new low-level-C++ bindings. The potential dangers of the class of bugs caused by this subtle issue too serious to just leave things as they are.
Null terminate but ban embedded null bytes / use modified UTF-8
(This was the original proposal)
'\0' is a valid UTF-8 character and exists in UTF-8 strings that FIDL users
want to exchange. Banning null bytes would make FIDL inappropriate for many uses
and using modified UTF-8 would impose additional overhead to convert normal
UTF-8 with possibly null bytes to modified UTF-8.
uint8_t instead of
If FIDL string data pointers were
uint8_t instead of
char then they couldn't
be passed to standard string functions or
printf without casting. This would
help reveal this class of bug but wouldn't prevent them.
Fill padding with valid ASCII in debug builds
Seven out of eight strings will be followed by zero padding bytes. In debug builds we could change those zeros to valid ASCII characters so that when printed or parsed these bugs show up. It'll be very easy for these to fall through the cracks.
Always move strings out of line
The C/C++ bindings could allocate space for strings and null terminators and copy and terminate all strings that are received. This would impose an unacceptable performance cost on the C and low-level C++ bindings.
Move strings out of line for ASan builds
The address sanitizer will check that code doesn't overrun heap allocations. For ASan builds we could allocate space for strings on the heap, copy the bytes there and return that pointer to the caller. This is non-trivial to integrate into the C bindings, could mean bug-hiding significant differences in behavior between ASan and release builds and wouldn't help developers working outside the Fuchsia build system.
Stop using C and C++
If Google can't write safe C++ in 2019, C++ is not safe for use. Unfortunately C and C++ are the languages of choice for many device driver authors and many vendors may choose to port their existing C/C++ drivers to our platform.
Don't expose raw characters to C
We could expose an opaque structure to C and require any string access in C to copy strings out of the decoded message buffer.