This document serves as an overview of the internal workings of the FIDL compiler fidlc
.
fidlc
is a command line tool that takes in a number of FIDL files, and outputs
FIDL JSON IR.
Overview
The main inputs to the compiler are the arguments to the --files
flags, which describe a list of
files grouped by library. The compiler parses each file individually to obtain a parse tree for
each file:
- Each file is loaded into a
SourceFile
, which owns the buffer backing the file - The compiler initializes a
Lexer
, which takes theSourceFile
as an argument to its constructor. This class exposes aLex()
method, which returns the nextToken
in the file; it can be called repeatedly to get the sequence ofToken
s in the file. - The compiler uses the
Lexer
to initialize aParser
and then calls theParse()
method, which constructs a parse tree. The code refers to parse trees as a raw AST. This function returns araw::File
, which is the class representing the root node of the raw AST. - Once the compiler has parsed each file into a parse tree, it then groups the parse trees into a
single AST (referred to in the code as a flat AST) for each library. The root of this flat AST
is a
flat::Library
.- For each library, the compiler traverses every
raw::File
parse tree corresponding to that library, convertingraw::
nodes to theirflat::
counterparts. For example, araw::StructDeclaration
becomes aflat::Struct
, and araw::Constant
becomes aflat::Constant
. The convertedflat::
AST nodes are then stored under a singleflat::Library
. Initially these flat AST nodes contain the same information as the raw AST nodes, but they also contain fields such as avalue
field for values and atypeshape
field for types that are later set during the compilation step.
- For each library, the compiler traverses every
- Once the AST has been fully initialized, the compiler evaluates constants and determines the memory alignment and size information for the declared types.
Finally, we end up with a flat AST that is processed and ready for backend generation either to C bindings or to a JSON IR.
Lexing
The Lexer
works primarily by keeping track of two char *
pointer into the file data. The
current_
pointer marks the current location that the class is at. Thetoken_start_
pointer marks
the start of the lexeme that is currently being worked on. Each time the Lex()
method is called,
the current_
pointer is advanced until a complete lexeme has been traversed. Then, a Token
is
constructed using the data in between the two pointers.
This example shows of how the pointers change during a call to Lex()
on the short FIDL snippet
const bool flag = true;
:
Initial state after lexing the const
keyword:
const bool flag = true;
^current_
^token_start_
Whitespaces are skipped until the next lexeme:
const bool flag = true;
^current_
^token_start_
The current_
pointer is updated until the end of the lexeme:
const bool flag = true;
^current_
^token_start_
At this point, the next Token
that gets returned is ready to be constructed. The
location_
is set to the data between token_start_
and current_
. The kind is set to
Identifier
. Before returning, the pointers are reset and end up in a state similar to the initial
state. This process can then be repeated for the next token:
const bool flag = true;
^current_
^token_start_
Internally the two pointers are manipulated with these main methods:
Skip()
. Skips over any unnecessary characters (e.g. whitespace) by moving both thecurrent_
and thetoken_start_
pointers forward.Consume()
. Returns the current character and advancescurrent_
.Reset()
. Returns the data betweentoken_start_
andcurrent_
. Then, setstoken_start_
to the value ofcurrent_
.
Parsing
The Parser
's goal is to convert a Token
stream into a parse tree (the raw::
AST) with the
Parse()
method. The Token
stream is generated by calling Lex
repeatedly on a FIDL file. This
parser is implemented with recursive
descent. Each node of the raw AST has a
corresponding ParseFoo()
method that consumes Token
s from the Lexer
and returns a
unique_ptr
to an instance of that node. In case there is a failure, a nullptr
is returned.
The Parser
keeps track of the:
- Current nodes that are being built using a stack of
SourceElements
, which are stored inactive_ast_scopes_
. - Current and previous
Token
s that are being processed.last_token_
andprevious_token_
, respectively. - Its own state machine inside of the
Parser::Lex
method. The current token (last_token_
) is always the next token that is about to be consumed, which effectively gives the parser a one token lookahead (i.e. LL(1)).
The Parser
determines what kind of node the current Token
belongs to based on the Token::Kind
of last_token_
, using the Peek()
method. Then, the Parser
updates its state and constructs the
nodes through the use of the ASTScope
class as well as the ConsumeToken
and MaybeConsumeToken
helper methods.
This example shows a simple non-recursive case line by line. The parser method looks like this:
std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() {
ASTScope scope(this);
ConsumeToken(OfKind(Token::Kind::kStringLiteral));
if (!Ok())
return Fail();
return std::make_unique<raw::StringLiteral>(scope.GetSourceElement());
}
It consumes a single token and returns a leaf node, raw::StringLiteral
:
class StringLiteral : public SourceElement {
public:
explicit StringLiteral(SourceElement const& element) : SourceElement(element) {}
virtual ~StringLiteral() {}
}
The method starts by creating a new ASTScope
, which initializes the SourceElement
that will
later be used to create the returned node by pushing it onto active_ast_scopes_
. The start of
the SourceElement
gets set to the first token that is consumed and the end gets set in the call
to GetSourceElement()
before returning.
The new SourceElement
for the StringLiteral
gets initialized inside the ASTScope
constructor:
parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token()));
Then, ConsumeToken
is called to process the next token. This method takes a predicate constructed
with OfKind()
and calls that predicate on last_token_
's kind or subkind. OfKind()
returns a
function that checks that its input matches the given kind or subkind. If the predicate fails (in
this case, if the current token is not a string literal), the error gets stored in the class and
the method returns. The Ok()
call catches the error and stops the compiler with a parsing error.
If the predicate succeeds, the start token of any SourceElement
s in the stack that are
uninitialized are set to the current token:
for (auto& scope : active_ast_scopes_) {
if (scope.start_.kind() == Token::Kind::kNotAToken) {
scope.start_ = token;
}
}
In this example, the start token is set to the top element of the stack since it was initialized at
the start of the method. The Parser
then advances to the next token by setting previous_token_ =
last_token_
and last_token_ = Lex()
.
Finally, the resulting StringLiteral
node is returned using scope.GetSourceElement()
. This sets
the end token of the SourceElement
at the top of the stack to the previous_token_
, and then
returns the SourceElement
. The final node has the same start and end token because
StringLiteral
s are only ever a single token long, but other methods may consume many tokens
before calling GetSourceElement()
. When the method returns, the destructor of ASTScope
pops the
top element off of active_ast_scopes_
.
Compilation
At this point, the compiler has successfully constructed a raw::File
for each input file and
proceeds to compile these raw ASTs into flat ASTs in three steps:
- Consumption: where the raw ASTs (whose representation tries to match the FIDL grammar) are
grouped by FIDL library and de-sugared into flat ASTs (whose representation tries to match the
FIDL language semantics). This step converts one
raw::File
per file into oneflat::Library
per library. - Topological Sorting: to determine the order in which to resolve the flat AST nodes (see next step).
- Resolution: which performs name resolution, type resolution, and type shape and size
calculations. The resolution process is done node by node, and the resulting information is
stored into the
flat::
node itself.
Consumption
After each file is parsed into a raw::File
and an empty AST (flat::Library
) has been
initialized for each library, the AST needs to be updated with all of data from the raw::File
s
that correspond to it. This is done recursively with the ConsumeFoo()
methods. Each
ConsumeFoo()
method generally takes the corresponding raw AST node as input, updates the
Library
class, and then return a bool
to indicate success or failure. These methods are
responsible for:
- Validating the placement of attributes; for example checking that the
Transport
attribute is only specified for protocols. - Checking for any undefined library dependencies (e.g. using
textures.Foo
will error if thetextures
library was not imported) - Converting the raw AST nodes into their flat AST node equivalents, and storing them in the
Library
'sfoo_declarations_
attribute. Initially the values of the flat AST nodes are unset but they are calculated later during compilation. - Registering each declaration by adding them to the
declarations_
vector. Const declarations and enum/bit fields (which declare a value) are also added to theconstants_
vector, whereas all other declarations (which declare a type) get their corresponding type template added to the library's typespace.
Topological Sorting
Once all of the declarations for a given Library
have been added to the declarations_
vector,
the compiler can proceed to resolve each individual declaration. However, it must do this in the
correct order (so that any dependencies of a declaration are resolved before it); this is
accomplished by first sorting the declarations into a separate declarations_order_
vector, and
then iterating through it to compile each declaration. The sorting is done in the
SortDeclarations()
method, and makes use of DeclDependencies()
to determine the dependencies
for a given declaration.
Resolution
Given the sorted declarations, the compilation happens through the CompileFoo
methods, generally
corresponding to the AST nodes (e.g. CompileStruct
, CompileConst
), with CompileDecl
as the
entrypoint. The main purpose of CompileDecl
is for:
TypeDecl
s to determine theirTypeShape
Const
s to resolve their value, andTypeConstructor
s to set theirType
.
Once this step is complete, the flat::Library
contains all the necessary information for any code
generation. The FIDL compiler can directly generate C bindings, or can generate a JSON IR that can
be consumed by a separate backend
Additional Checks
Before marking compilation as successful, the FIDL compiler also does a few additional checks: for example, it will check that any constraints on attributes are satisfied, and that all declared library dependencies were actually used.
Backend generation
The FIDL compiler emits a JSON intermediate representation (IR). The JSON IR is consumed by a separate program, named the back-end, that generates the language bindings from the JSON IR.
The officially supported FIDL language back-ends are:
- C++:
- High level: fidlgen_hlcpp
- Low level: fidlgen_cpp
- Unified: fidlgen_cpp
- Go: fidlgen_go
- Rust: fidlgen_rust
C Bindings
For historical reasons, C bindings are directly generated from the FIDL compiler - the C bindings
do not support all features and types that are used with the C bindings need to be annotated with
the Layout = "Simple"
attribute.
C Family Runtime
fidlc
is also responsible for generating "coding tables", which are instances of fidl_type_t
used
to represent the FIDL messages at runtime and are used by all of the bindings for the C family of
languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate
representation called the "coded" AST with the CodedTypesGenerator
, which iterates through the
flat::Decl
s in a given flat::Library
and converts each raw::Decl
node to the corresponding
coded::Type
node (e.g. flat::Struct
becomes a coded::StructType
).
The coding tables are then generated by the TablesGenerator
, which will generate C code for each
coded::Type
that constructs the equivalent fidl_type_t
type. For example, for a
coded::StructType
called MyStruct
, the TablesGenerator
would write out C code that constructs
an equivalent fidl::FidlCodedStruct
, something like:
const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct(
my_struct_fields, 1u, 32u, "mylibrary/MyStruct"));
Coding tables for FIDL libraries in //sdk/fidl
are generated into the out directory under
fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c
. For example
out/default/fidling/gen/sdk/fuchsia.sys/fuchsia.sys.fidl.tables.c
. The README
gives additional context. The fidl_type_t
definitions (such as for FidlCodedStruct
) can be found
in internal.h.
Glossary
Decl
The Decl
is the base of all flat AST nodes, just like SourceElement
is the base of all parser
tree nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There
are two types of Decl
s:
Const
s, which declare a value, and have avalue
attribute that gets resolved during compilation, andTypeDecl
s, which declare a message type or interface and have atypeshape
attribute that gets set during compilation.
TypeDecl
s representing an aggregate type (e.g. structs, tables, and unions) also
have a static Shape()
method, which contains the logic for determining the Typeshape
of that given type.
FieldShape
A FieldShape
describes the offset and padding information for members of an aggregate type like a
struct or union. Generally these fields will require both a Typeshape
as well as a FieldShape
Name
A Name
represents a scope variable name, and consists of the library the name belongs to (or
nullptr
for global names), and the variable name itself as a string (for anonymous names) or a
SourceLocation
. Name
s can also optionally include a member_name_
field when referring to the
field of a declared variable (for example MyEnum.MyField
).
SourceElement
A SourceElement
represents a block of code inside a fidl file and is parameterized by a start_
and end_
Token
. All parser tree ("raw" AST) nodes inherit from this class.
SourceFile
Wrapper around a file, which is responsible for owning the data in that file. Also see Virtual SourceFile
SourceLocation
Wrapper around a StringView
and the SourceFile
it comes from. It provides methods to get the
surrounding line of the StringView
as well as its location in the form of a
"[filename]:[line]:[col]"
string
SourceManager
Wrapper around a set of SourceFile
s that all relate to a single Library
.
Token
A token is essentially a lexeme (in the form of a SourceLocation
stored as the
location_
attribute), enhanced with two other pieces information that are useful to the parser
during the later stages of compilation:
- A kind and subkind that, together, classify the lexeme. The possible kinds are:
- The special characters such as
Kind::LeftParen
,Kind::Dot
,Kind::Comma
, etc... - String and numeric constants
- Identifiers. Tokens that are keywords (e.g.
const
,struct
) are considered identifiers, but also have a subkind defined to identify which keyword it is (e.g.Subkind::Const
,Subkind::Struct
). All other tokens have a subkind ofNone
. - Uninitialized tokens have a kind of
kNotAToken
.
- The special characters such as
Type
A struct representing an instance of a type. For example, the vector<int32>:10?
type corresponds
to an instance of the VectorType
TypeDecl
with max_size_ = 10
and maybe_arg_type
set to the
Type
corresponding to int32
. Built-in types all have a static Shape()
method that
returns the Typeshape
given the parameters for an instance of that type. User defined types (e.g.
structs or unions) will all have a type of IdentifierType
- the corresponding
TypeDecl
, like Struct
provides the static Shape()
method instead.
TypeDecl
See Decl
TypeShape
Information about how objects of a type should be laid out in memory, including their size, alignment, depth, etc.
Typespace
The typespace is a map from Type
names to a TypeTemplate
for that Type
. During
compilation, the typespace is initialized to include all of the built-in types (e.g. "vector"
maps to VectorTypeTemplate
), whereas user-defined types get added during the compilation process.
This also ensures that a single type template instance exists per type and avoids name
collisions/shadowing of types (e.g. https://fxbug.dev/42156366).
TypeTemplate
Instances of TypeTemplates provide a Create()
method to create a new instance of a specific
Type
- therefore there is a TypeTemplate subclass for each built-in FIDL type (e.g.
ArrayTypeTemplate
, PrimitiveTypeTemplate
, etc.) as well as a single class for all user defined
types (TypeDeclTypeTemplate
), and one for type aliases (TypeAliasTypeTemplate
). Create()
takes as parameters the possible type parameters: argument type, nullability, and size.
For example, to create an object representing the type vector<int32>:10?
the compiler would call
the Create()
method of the VectorTypeTemplate
with an argument type of int32
, max size of
10
, and a nullability of types::Nullability::kNullable
. This call returns an instance of a
VectorType
with those parameters. Note that not all 3 of these parameters apply to all of the
types (e.g. PrimitiveType
s, like int32
have none of the 3). The Create()
method of the type
template for each type automatically checks that only the relevant parameters are passed.
The concrete type for user defined types is the IdentifierType
, which gets generated by the
TypeDeclTypeTemplate
.
Virtual SourceFile
A subclass of SourceFile
that has a fake "filename" and is initialized with no backing data. It
exposes an AddLine()
method to add data to the file, and is used as a backing SourceFile
for
content that does not appear directly in any of the input SourceFile
s, like for generated
anonymous Name
s.