This document serves as an overview of the internal workings of the FIDL compiler fidlc.
fidlc is a command line tool that takes in a number of FIDL files, and outputs
FIDL JSON IR.
Overview
The main inputs to the compiler are the arguments to the --files flags, which describe a list of
files grouped by library. The compiler parses each file individually to obtain a parse tree for
each file:
- Each file is loaded into a
SourceFile, which owns the buffer backing the file - The compiler initializes a
Lexer, which takes theSourceFileas an argument to its constructor. This class exposes aLex()method, which returns the nextTokenin the file; it can be called repeatedly to get the sequence ofTokens in the file. - The compiler uses the
Lexerto initialize aParserand then calls theParse()method, which constructs a parse tree. The code refers to parse trees as a raw AST. This function returns araw::File, which is the class representing the root node of the raw AST. - Once the compiler has parsed each file into a parse tree, it then groups the parse trees into a
single AST (referred to in the code as a flat AST) for each library. The root of this flat AST
is a
flat::Library.- For each library, the compiler traverses every
raw::Fileparse tree corresponding to that library, convertingraw::nodes to theirflat::counterparts. For example, araw::StructDeclarationbecomes aflat::Struct, and araw::Constantbecomes aflat::Constant. The convertedflat::AST nodes are then stored under a singleflat::Library. Initially these flat AST nodes contain the same information as the raw AST nodes, but they also contain fields such as avaluefield for values and atypeshapefield for types that are later set during the compilation step.
- For each library, the compiler traverses every
- Once the AST has been fully initialized, the compiler evaluates constants and determines the memory alignment and size information for the declared types.
Finally, we end up with a flat AST that is processed and ready for backend generation either to C bindings or to a JSON IR.
Lexing
The Lexer works primarily by keeping track of two char * pointer into the file data. The
current_ pointer marks the current location that the class is at. Thetoken_start_ pointer marks
the start of the lexeme that is currently being worked on. Each time the Lex() method is called,
the current_ pointer is advanced until a complete lexeme has been traversed. Then, a Token is
constructed using the data in between the two pointers.
This example shows of how the pointers change during a call to Lex() on the short FIDL snippet
const bool flag = true;:
Initial state after lexing the const keyword:
const bool flag = true;
^current_
^token_start_
Whitespaces are skipped until the next lexeme:
const bool flag = true;
^current_
^token_start_
The current_ pointer is updated until the end of the lexeme:
const bool flag = true;
^current_
^token_start_
At this point, the next Token that gets returned is ready to be constructed. The
location_ is set to the data between token_start_ and current_. The kind is set to
Identifier. Before returning, the pointers are reset and end up in a state similar to the initial
state. This process can then be repeated for the next token:
const bool flag = true;
^current_
^token_start_
Internally the two pointers are manipulated with these main methods:
Skip(). Skips over any unnecessary characters (e.g. whitespace) by moving both thecurrent_and thetoken_start_pointers forward.Consume(). Returns the current character and advancescurrent_.Reset(). Returns the data betweentoken_start_andcurrent_. Then, setstoken_start_to the value ofcurrent_.
Parsing
The Parser's goal is to convert a Token stream into a parse tree (the raw:: AST) with the
Parse() method. The Token stream is generated by calling Lex repeatedly on a FIDL file. This
parser is implemented with recursive
descent. Each node of the raw AST has a
corresponding ParseFoo() method that consumes Tokens from the Lexer and returns a
unique_ptr to an instance of that node. In case there is a failure, a nullptr is returned.
The Parser keeps track of the:
- Current nodes that are being built using a stack of
SourceElements, which are stored inactive_ast_scopes_. - Current and previous
Tokens that are being processed.last_token_andprevious_token_, respectively. - Its own state machine inside of the
Parser::Lexmethod. The current token (last_token_) is always the next token that is about to be consumed, which effectively gives the parser a one token lookahead (i.e. LL(1)).
The Parser determines what kind of node the current Token belongs to based on the Token::Kind
of last_token_, using the Peek() method. Then, the Parser updates its state and constructs the
nodes through the use of the ASTScope class as well as the ConsumeToken and MaybeConsumeToken
helper methods.
This example shows a simple non-recursive case line by line. The parser method looks like this:
std::unique_ptr<raw::StringLiteral> Parser::ParseStringLiteral() {
ASTScope scope(this);
ConsumeToken(OfKind(Token::Kind::kStringLiteral));
if (!Ok())
return Fail();
return std::make_unique<raw::StringLiteral>(scope.GetSourceElement());
}
It consumes a single token and returns a leaf node, raw::StringLiteral:
class StringLiteral : public SourceElement {
public:
explicit StringLiteral(SourceElement const& element) : SourceElement(element) {}
virtual ~StringLiteral() {}
}
The method starts by creating a new ASTScope, which initializes the SourceElement that will
later be used to create the returned node by pushing it onto active_ast_scopes_. The start of
the SourceElement gets set to the first token that is consumed and the end gets set in the call
to GetSourceElement() before returning.
The new SourceElement for the StringLiteral gets initialized inside the ASTScope constructor:
parser_->active_ast_scopes_.push_back(raw::SourceElement(Token(), Token()));
Then, ConsumeToken is called to process the next token. This method takes a predicate constructed
with OfKind() and calls that predicate on last_token_'s kind or subkind. OfKind() returns a
function that checks that its input matches the given kind or subkind. If the predicate fails (in
this case, if the current token is not a string literal), the error gets stored in the class and
the method returns. The Ok() call catches the error and stops the compiler with a parsing error.
If the predicate succeeds, the start token of any SourceElements in the stack that are
uninitialized are set to the current token:
for (auto& scope : active_ast_scopes_) {
if (scope.start_.kind() == Token::Kind::kNotAToken) {
scope.start_ = token;
}
}
In this example, the start token is set to the top element of the stack since it was initialized at
the start of the method. The Parser then advances to the next token by setting previous_token_ =
last_token_ and last_token_ = Lex().
Finally, the resulting StringLiteral node is returned using scope.GetSourceElement(). This sets
the end token of the SourceElement at the top of the stack to the previous_token_, and then
returns the SourceElement. The final node has the same start and end token because
StringLiterals are only ever a single token long, but other methods may consume many tokens
before calling GetSourceElement(). When the method returns, the destructor of ASTScope pops the
top element off of active_ast_scopes_.
Compilation
At this point, the compiler has successfully constructed a raw::File for each input file and
proceeds to compile these raw ASTs into flat ASTs in three steps:
- Consumption: where the raw ASTs (whose representation tries to match the FIDL grammar) are
grouped by FIDL library and de-sugared into flat ASTs (whose representation tries to match the
FIDL language semantics). This step converts one
raw::Fileper file into oneflat::Libraryper library. - Topological Sorting: to determine the order in which to resolve the flat AST nodes (see next step).
- Resolution: which performs name resolution, type resolution, and type shape and size
calculations. The resolution process is done node by node, and the resulting information is
stored into the
flat::node itself.
Consumption
After each file is parsed into a raw::File and an empty AST (flat::Library) has been
initialized for each library, the AST needs to be updated with all of data from the raw::Files
that correspond to it. This is done recursively with the ConsumeFoo() methods. Each
ConsumeFoo() method generally takes the corresponding raw AST node as input, updates the
Library class, and then return a bool to indicate success or failure. These methods are
responsible for:
- Validating the placement of attributes; for example checking that the
Transportattribute is only specified for protocols. - Checking for any undefined library dependencies (e.g. using
textures.Foowill error if thetextureslibrary was not imported) - Converting the raw AST nodes into their flat AST node equivalents, and storing them in the
Library'sfoo_declarations_attribute. Initially the values of the flat AST nodes are unset but they are calculated later during compilation. - Registering each declaration by adding them to the
declarations_vector. Const declarations and enum/bit fields (which declare a value) are also added to theconstants_vector, whereas all other declarations (which declare a type) get their corresponding type template added to the library's typespace.
Topological Sorting
Once all of the declarations for a given Library have been added to the declarations_ vector,
the compiler can proceed to resolve each individual declaration. However, it must do this in the
correct order (so that any dependencies of a declaration are resolved before it); this is
accomplished by first sorting the declarations into a separate declarations_order_ vector, and
then iterating through it to compile each declaration. The sorting is done in the
SortDeclarations() method, and makes use of DeclDependencies() to determine the dependencies
for a given declaration.
Resolution
Given the sorted declarations, the compilation happens through the CompileFoo methods, generally
corresponding to the AST nodes (e.g. CompileStruct, CompileConst), with CompileDecl as the
entrypoint. The main purpose of CompileDecl is for:
TypeDecls to determine theirTypeShapeConsts to resolve their value, andTypeConstructors to set theirType.
Once this step is complete, the flat::Library contains all the necessary information for any code
generation. The FIDL compiler can directly generate C bindings, or can generate a JSON IR that can
be consumed by a separate backend
Additional Checks
Before marking compilation as successful, the FIDL compiler also does a few additional checks: for example, it will check that any constraints on attributes are satisfied, and that all declared library dependencies were actually used.
Backend generation
The FIDL compiler emits a JSON intermediate representation (IR). The JSON IR is consumed by a separate program, named the back-end, that generates the language bindings from the JSON IR.
The officially supported FIDL language back-ends are:
- C++:
- High level: fidlgen_hlcpp
- Low level: fidlgen_cpp
- Unified: fidlgen_cpp
- Go: fidlgen_go
- Rust: fidlgen_rust
C Bindings
For historical reasons, C bindings are directly generated from the FIDL compiler - the C bindings
do not support all features and types that are used with the C bindings need to be annotated with
the Layout = "Simple" attribute.
C Family Runtime
fidlc is also responsible for generating "coding tables", which are instances of fidl_type_t used
to represent the FIDL messages at runtime and are used by all of the bindings for the C family of
languages (C, LLCPP, HLCPP). To do this, the flat AST is converted to an intermediate
representation called the "coded" AST with the CodedTypesGenerator, which iterates through the
flat::Decls in a given flat::Library and converts each raw::Decl node to the corresponding
coded::Type node (e.g. flat::Struct becomes a coded::StructType).
The coding tables are then generated by the TablesGenerator, which will generate C code for each
coded::Type that constructs the equivalent fidl_type_t type. For example, for a
coded::StructType called MyStruct, the TablesGenerator would write out C code that constructs
an equivalent fidl::FidlCodedStruct, something like:
const fidl_type_t MyStruct = fidl_type_t(::fidl::FidlCodedStruct(
my_struct_fields, 1u, 32u, "mylibrary/MyStruct"));
Coding tables for FIDL libraries in //sdk/fidl are generated into the out directory under
fidling/gen/sdk/fidl/LIBRARY/LIBRARY.fidl.tables.c. For example
out/default/fidling/gen/sdk/fuchsia.sys/fuchsia.sys.fidl.tables.c. The README
gives additional context. The fidl_type_t definitions (such as for FidlCodedStruct) can be found
in internal.h.
Glossary
Decl
The Decl is the base of all flat AST nodes, just like SourceElement is the base of all parser
tree nodes, and corresponds to all possible declarations that a user can make in a FIDL file. There
are two types of Decls:
Consts, which declare a value, and have avalueattribute that gets resolved during compilation, andTypeDecls, which declare a message type or interface and have atypeshapeattribute that gets set during compilation.
TypeDecls representing an aggregate type (e.g. structs, tables, and unions) also
have a static Shape() method, which contains the logic for determining the Typeshape
of that given type.
FieldShape
A FieldShape describes the offset and padding information for members of an aggregate type like a
struct or union. Generally these fields will require both a Typeshape as well as a FieldShape
Name
A Name represents a scope variable name, and consists of the library the name belongs to (or
nullptr for global names), and the variable name itself as a string (for anonymous names) or a
SourceLocation. Names can also optionally include a member_name_ field when referring to the
field of a declared variable (for example MyEnum.MyField).
SourceElement
A SourceElement represents a block of code inside a fidl file and is parameterized by a start_
and end_ Token. All parser tree ("raw" AST) nodes inherit from this class.
SourceFile
Wrapper around a file, which is responsible for owning the data in that file. Also see Virtual SourceFile
SourceLocation
Wrapper around a StringView and the SourceFile it comes from. It provides methods to get the
surrounding line of the StringView as well as its location in the form of a
"[filename]:[line]:[col]" string
SourceManager
Wrapper around a set of SourceFiles that all relate to a single Library.
Token
A token is essentially a lexeme (in the form of a SourceLocation stored as the
location_ attribute), enhanced with two other pieces information that are useful to the parser
during the later stages of compilation:
- A kind and subkind that, together, classify the lexeme. The possible kinds are:
- The special characters such as
Kind::LeftParen,Kind::Dot,Kind::Comma, etc... - String and numeric constants
- Identifiers. Tokens that are keywords (e.g.
const,struct) are considered identifiers, but also have a subkind defined to identify which keyword it is (e.g.Subkind::Const,Subkind::Struct). All other tokens have a subkind ofNone. - Uninitialized tokens have a kind of
kNotAToken.
- The special characters such as
Type
A struct representing an instance of a type. For example, the vector<int32>:10? type corresponds
to an instance of the VectorType TypeDecl with max_size_ = 10 and maybe_arg_type set to the
Type corresponding to int32. Built-in types all have a static Shape() method that
returns the Typeshape given the parameters for an instance of that type. User defined types (e.g.
structs or unions) will all have a type of IdentifierType - the corresponding
TypeDecl, like Struct provides the static Shape() method instead.
TypeDecl
See Decl
TypeShape
Information about how objects of a type should be laid out in memory, including their size, alignment, depth, etc.
Typespace
The typespace is a map from Type names to a TypeTemplate for that Type. During
compilation, the typespace is initialized to include all of the built-in types (e.g. "vector"
maps to VectorTypeTemplate), whereas user-defined types get added during the compilation process.
This also ensures that a single type template instance exists per type and avoids name
collisions/shadowing of types (e.g. https://fxbug.dev/42156366).
TypeTemplate
Instances of TypeTemplates provide a Create() method to create a new instance of a specific
Type - therefore there is a TypeTemplate subclass for each built-in FIDL type (e.g.
ArrayTypeTemplate, PrimitiveTypeTemplate, etc.) as well as a single class for all user defined
types (TypeDeclTypeTemplate), and one for type aliases (TypeAliasTypeTemplate). Create()
takes as parameters the possible type parameters: argument type, nullability, and size.
For example, to create an object representing the type vector<int32>:10? the compiler would call
the Create() method of the VectorTypeTemplate with an argument type of int32, max size of
10, and a nullability of types::Nullability::kNullable. This call returns an instance of a
VectorType with those parameters. Note that not all 3 of these parameters apply to all of the
types (e.g. PrimitiveTypes, like int32 have none of the 3). The Create() method of the type
template for each type automatically checks that only the relevant parameters are passed.
The concrete type for user defined types is the IdentifierType, which gets generated by the
TypeDeclTypeTemplate.
Virtual SourceFile
A subclass of SourceFile that has a fake "filename" and is initialized with no backing data. It
exposes an AddLine() method to add data to the file, and is used as a backing SourceFile for
content that does not appear directly in any of the input SourceFiles, like for generated
anonymous Names.