RFC-0040: Identifier uniqueness | |
---|---|
Status | Accepted |
Areas |
|
Description | The FIDL specification and front-end compiler currently considers two identifiers to be distinct based on simple string comparison. This proposes a new algorithm that takes into account the transformations that bindings generators make. |
Authors | |
Date submitted (year-month-day) | 2019-04-08 |
Date reviewed (year-month-day) | 2019-05-14 |
"SnowFlake vs SNOW_FLAKE"
Summary
The FIDL specification and front-end compiler currently considers two identifiers to be distinct based on simple string comparison. This proposes a new algorithm that takes into account the transformations that bindings generators make.
Motivation
Language binding generators transform identifiers to comply with target language constraints and style that map several FIDL identifiers to a single target language identifier. This could cause unexpected conflicts that aren't visible until particular languages are targeted.
Design
This proposes introducing a constraint on FIDL identifiers that no existing libraries violate. It doesn't change the FIDL language, IR (yet 1), bindings, style guide or rubric.
In practice, identifiers consist of a series of words that are joined together.
The common approaches for joining words are CamelCase
, where a transition
from lower to upper case is a word boundary, and snake_case
, where one or
many underscores (_
) are used to separate words.
Identifiers should be transformed to a canonical form for comparison. This will
be a lower_snake_case
form, preserving the word separation in the original
form. Words are broken (1) where there are underscores, (2) on transitions from
lower-case or digit to upper-case, and (3) before transitions from upper-case to
lower-case.
In FIDL, identifiers must be used in their original form.
So if a type is named FooBar
, attempting to refer to it as foo_bar
is an error.
There is a simple algorithm to carry out this transformation, here in Python2:
def canonical(identifier):
last = '_'
out = ''
for i, c in enumerate(identifier):
is_next_lower = i + 1 < len(identifier) and identifier[i+1].islower()
if c == '_':
if last != '_':
out += '_'
elif (((last.islower() or last.isdigit()) and c.isupper())
or (last != '_' and c.isupper() and is_next_lower)):
out += '_' + c.lower()
else:
out += c.lower()
last = c
return out
Some examples, with their possible translation in various target languages:
FIDL identifier | Canonical form | C++ | Rust | Go | Dart |
---|---|---|---|---|---|
foobar | foobar | foobar | foobar | Foobar | foobar_ |
foo_bar | foo_bar | foo_bar | foo_bar | FooBar | fooBar_ |
foo__bar | foo_bar | foo_bar | foo_bar | FooBar | fooBar_ |
FooBar | foo_bar | foo_bar | foo_bar | FooBar | fooBar_ |
fooBar | foo_bar | foo_bar | foo_bar | FooBar | fooBar_ |
FOOBar | foo_bar | foo_bar | foo_bar | FooBar | fooBar_ |
Implementation strategy
The front-end compiler will be updated to check that each new identifier's canonical form does not conflict with any other identifier's canonical form.
The next version of the FIDL IR should be organized around canonical names rather than original names, but the original name will be available as a field on declarations. If we can eliminate the use of unmodified names in generated bindings then the original names can be dropped from the IR.
Ergonomics
This codifies constraints on the FIDL language that exist in practice.
Documentation and examples
The FIDL language documentation would be updated to describe this constraint. It would be expanded to include much of what's in the Design section above.
Because this proposal simply encodes existing practice, examples and tutorials won't be affected.
Backwards compatibility
Any existing FIDL libraries that would fall afoul of this change violate our style guides and won't work with many language bindings. This does not change the form of identifier that is used to calculate ordinals.
Performance
This imposes a negligible cost to the front-end compiler.
Security
No impact.
Testing
There will be extensive tests for the canonicalization algorithm
implementation in fidlc
.
There will also be fidlc
tests to ensure that errors are caught when
conflicting identifiers are declared and to make sure that the original names
must be used to refer to declarations.
Drawbacks, alternatives, and unknowns
One option is to do nothing.
Generally we catch these issues as build failures in non-C++ generated bindings.
As Rust is used more in fuchsia.git
, the chance of conflicts slipping
through to other petals is lessened.
And these issues are already pretty rare.
The canonicalization algorithm is simple but has one unfortunate failure case
— mixed alphanumeric words in UPPER_SNAKE_CASE identifiers might be
broken.
For example H264_ENCODER
→ h264_encoder
but A2DP_PROFILE
→
a2_dp_profile
.
This is because the algorithm treats digits as lower-case letters.
We have to break on digit-to-letter transitions because H264Encoder
should
canonicalize as h264_encoder
.
Identifiers with no lower-case letters could be special cased — only
breaking on underscores — but that adds complexity to the algorithm and
perhaps to the mental model.
The canonical form could be expressed as a list of words rather than a lower_camel_case string. They're equivalent and in practice it's simpler to manage them as a string.
We could use identifiers' canonical form when generating ordinals. That would make this a breaking change for no obvious benefit. If there is an ordinal-breaking flag day in the future then we could consider that change then.
Initial rejection, and second review
During its first review on 4/18/2019, this FTP was rejected with the following rationale.
- Two opposing views on solving this class of problems.
- Work to model target languages' constraints to maintain as much
flexibility in FIDL as possible, even if that is different than the
recommended style.
That's the approach taken by this FTP.
- Pros: Keeps flexibility for eventual uses of FIDL beyond Fuchsia, more pure from a programming language standpoint.
- Cons: Scoping rules are more complex, style is not enforced, but encouraged (through linting for instance). Could lead to APIs built by partners that do not conform to the Fuchsia style guide we want (since they are not required to run, or adhere to linting).
- Enforce style constraints directly in the language, which eliminates the
class of problem.
- Pros: Style is enforced, developers are told how things ought to be, or it doesn't compile.
- Cons: ingrains stylistic choices in the language definition, higher hill to climb for novice developers using FIDL.
- → We rejected the proposal, and instead prefer an approach that directly enforces style in the language.
- → Next step here is a formal proposal to make this happen, and clarifies
all aspects of this (e.g., should
uint8
beUint8
,vector<T>
beVector<T>
?)
It was decided to overturn this decision based on the following observations:
The kernel API is now described FIDL. This pushed us to re-opened the identifier uniqueness issue last October, and we essentially overturned the decision, allowing both a 'C-style' and 'FIDL-style' to co-exist. This is checked by the fidl linter today.
We have seen other use cases pushing FIDL to generalize the transport, and with it possibly also local style rules to better support these domains.
Identifier clashes continue to be an issue, and modeling target language constraints is a well identified gap in the FIDL toolchain.
Prior art and references
In proto3 similar rules are applied to generate a lowerCamelCase
name
for JSON encoding.