RFC-0034: Null terminate strings

RFC-0034: Null terminate strings
Status	Rejected
Areas	FIDL
Description	We propose that strings in FIDL will be null terminated (all the while retaining the size of the string as is the case today). Also, change the C bindings to use uint8_t* for string data
Authors	ianloic@google.com
Date submitted (year-month-day)	2019-02-08
Date reviewed (year-month-day)	2019-02-21

Edit this RFC

Edit RFC metadata

Rejection rationale

Summary

We propose:

Strings in FIDL will be null terminated (all the while retaining the size of the string as is the case today);
Change the C bindings to use uint8_t* for string data.

Motivation

The current FIDL string encoding makes it easy to accidentally write insecure code. The C and low-level C++ bindings are used in some of the most privileged code in the Fuchsia system so extra weight should be given to risks exposed at those layers.

We have accidentally come across such bugs, but a systematic audit of code in the Fuchsia tree is difficult and won't prevent them reoccurring in our tree or in third-party code.

Design

This proposes to change the encoding of FIDL strings to include a single terminator byte with value zero. This does not make FIDL strings compatible with C strings but is a FIDL string is used as a C string it will be interpreted as shorter than intended rather than longer than intended, which is safer.

Wire Format

The new definition of strings for wire encoding would be (changes highlighted):

Variable-length sequence of UTF-8 encoded characters representing text.
Nullable; null strings and empty strings are distinct.
Can specify a maximum size, e.g. string:40 for a maximum 40 byte string (not including null-terminator).
String content has a null-terminator.
Encoders and decoders MUST validate that there is a null byte after the last byte of the string, as indicated by the length.
Stored as a 16 byte record consisting of:
- size: 64-bit unsigned number of code units (bytes) excluding null-terminator
- data: 64-bit presence indication or pointer to out-of-line string data
When encoded for transfer, data indicates presence of content:
- 0: string is null
- UINTPTR_MAX: string is non-null, data is the next out-of-line object
When decoded for consumption, data is a pointer to content:
- 0: string is null
- <valid pointer>: string is non-null, data is at indicated memory address

Strings are denoted as follows:

string: non-nullable string (validation error occurs if null data is encountered)
string?: nullable string
string:N, string:N?: string with maximum length of N code units

This will constitute a breaking wire format change, specifically a string whose length is divisible by 8 will be 8 bytes longer (it'll align to 8 bytes, and be null terminated, causing another 7 bytes to be added as padding to align to 8 again).

Encoders will need to be updated to add null termination to encoded strings and to validate that there are no null characters within string content. Decoders will need to be updated to check that there are no null characters within string content but that there is a null terminator where the length indicates.

C Bindings

Currently the C bindings represent strings as a char* and a size_t. If that char* is passed to a function that expects a C string it may be interpreted incorrectly. The bindings will be changed to use uint8_t* instead so that passing a string data pointer to strchr() or printf("%s") will fail to compile.

Implementation Strategy

This is a breaking wire-format change. Its deployment will need to be carefully coordinated across all uses of FIDL.

Behind some build-time flag(s) the following code will need to be updated:

//zircon/system/ulib/fidl/walker.h (to validate strings correctly)
//zircon/system/host/fidl/lib/flat_ast.cpp (update StringType::Shape)
//zircon/system/host/fidl/lib/c_generator.cpp (update EmitLinerarizeMessage, ProduceInterfaceClientImplementation, etc)
//sdk/lib/fidl/hlcpp/string.cc
//garnet/public/lib/fidl/rust/fidl/src/encoding.rs
//third_party/go/src/syscall/zx/fidl/encoding.go (update marshalString, unmarshalString)
//third_party/go/src/syscall/zx/fidl/encoding_new.go (update mString)
//sdk/dart/fidl/lib/src/types.dart (update StringType)
llcpp bindings
out-of-tree bindings for other languages

These should be tested with the fidl_compatibility_test, running tests and confirming expected system stability.

Actually landing the change will need to be coordinated with the release team and external teams like Chromium.

Ergonomics

This makes the C and low-level C++ bindings much easier to use correctly and has no impact on other bindings.

Documentation and Examples

The wire format documentation should be updated (as outlined above).

Backwards Compatibility

This change is API compatible but ABI incompatible.

Performance

This will have no impact on build performance but will have the following minor performance impacts:

Each string will take on average one extra byte to transmit

Security

This change is specifically intended to fix potential security holes for memory-unsafe languages using FIDL.

Testing

This will be tested using the compatibility test suite. That suite should be extended to ensure that edge cases like 7, 8 and 9 byte long strings are handled correctly.

Drawbacks, Alternatives, and Unknowns

This will increase the size of FIDL messages by on average one byte per string. For strings whose length is not divisible by 8 there will be no change. For strings whose length is divisible by 8 the length will be increased by 8 bytes.

Alternatives

Do Nothing

We could keep everything as it is and rely on code review and documentation to ensure people don't write wrong code. The issue will also be somewhat mitigated by the adoption of the new low-level-C++ bindings. The potential dangers of the class of bugs caused by this subtle issue too serious to just leave things as they are.

Null terminate but ban embedded null bytes / use modified UTF-8

(This was the original proposal)

'\0' is a valid UTF-8 character and exists in UTF-8 strings that FIDL users want to exchange. Banning null bytes would make FIDL inappropriate for many uses and using modified UTF-8 would impose additional overhead to convert normal UTF-8 with possibly null bytes to modified UTF-8.

Just use `uint8_t` instead of `char`

If FIDL string data pointers were uint8_t instead of char then they couldn't be passed to standard string functions or printf without casting. This would help reveal this class of bug but wouldn't prevent them.

Fill padding with valid ASCII in debug builds

Seven out of eight strings will be followed by zero padding bytes. In debug builds we could change those zeros to valid ASCII characters so that when printed or parsed these bugs show up. It'll be very easy for these to fall through the cracks.

Always move strings out of line

The C/C++ bindings could allocate space for strings and null terminators and copy and terminate all strings that are received. This would impose an unacceptable performance cost on the C and low-level C++ bindings.

Move strings out of line for ASan builds

The address sanitizer will check that code doesn't overrun heap allocations. For ASan builds we could allocate space for strings on the heap, copy the bytes there and return that pointer to the caller. This is non-trivial to integrate into the C bindings, could mean bug-hiding significant differences in behavior between ASan and release builds and wouldn't help developers working outside the Fuchsia build system.

Stop using C and C++

If Google can't write safe C++ in 2019, C++ is not safe for use. Unfortunately C and C++ are the languages of choice for many device driver authors and many vendors may choose to port their existing C/C++ drivers to our platform.

Don't expose raw characters to C

We could expose an opaque structure to C and require any string access in C to copy strings out of the decoded message buffer.

Prior Art and References

DBus, Capt'n Proto and CORBA¹ send length and null terminate, protobuf does not null terminate.

CORBA Specification 3.3, part 2, section 9.3.2.7. ↩