RFC-0151: Compiler tuning flags for CPU targeting | |
---|---|
Status | Accepted |
Areas | |
Description | Proposed changes to compiler flags that govern CPU targeting, and their effects on the platform and SDK build. |
Issues | |
Gerrit change | |
Authors | |
Reviewers | |
Date submitted (year-month-day) | 2022-01-04 |
Date reviewed (year-month-day) | 2022-02-02 |
Summary
Proposed changes to compiler flags that govern CPU targeting, and their effects on the platform and SDK build.
Motivation
Fuchsia's build produces many artifacts that comprise executable machine code for different architectures. For instance: prebuilt shared libraries that are published for SDK use, or executable C/C++ or Rust binaries that are included in platform and product system images. When generating machine code in a compiler, it's important to indicate the following:
Target architecture: What instruction set architecture to use? For instance x86-64 or AArch64 ISA. Furthermore the compiler may target revisions of an ISA. Revisions may offer additional instructions or variations on existing instructions, such as new floating point or SIMD instructions, or wider atomic memory operations, that can significantly improve performance. It is critical to know the target architecture to produce code that is guaranteed to work on targeted devices. "Raising" the target architecture unlocks new features at the cost of backward compatibility with older hardware.
Figure: progression of the ARM ISA.
Target micro-architecture: How is the ISA implemented? This is typically expressed in terms of whether instructions are executed in-order or out-of-order, decoding bandwidth, cache load latencies, etc. Specifying a target micro-architecture allows the compiler to produce machine code that may run faster on targeted hardware, without constraining hardware compatibility.
Figure: Intel Core 2 micro-architecture block diagram.
Compilers allow users to specify these targets in ways that we will review later. Fuchsia's build system is able to configure compilers on a global or per-binary basis. The existing implementation of CPU targeting in Fuchsia's build has several shortcomings, which this RFC aims to address:
The choice of a baseline configuration for arm64 targets a specific CPU (Cortex-A53), rather than an ISA plus features used. This makes for a poorly defined platform baseline.
It's unclear how to set overrides for this baseline, due to a lack of prior art and the absence of documented policies or best practices. As a result, nearly all Fuchsia builds fall back to the baseline, including build configurations that target hardware that's explicitly outside the baseline.
These shortcomings have been present since the build's inception in 2016, and before that in prior art that Fuchsia evolved from. The present system was sufficient to ship Fuchsia on a first device (which happens to use the same micro-architecture as the present-day platform arm64 baseline).
Recent developments indicate that it's time for an overhaul. Specifically, in addition to the Astro and Sherlock board configurations that target Cortex-A53, Fuchsia now supports the Nelson board configuration (Cortex-A55) and the Atlas board configuration (Intel Amber Lake). However, these builds are currently not configured to take advantage of the differences between the baseline and the actual target.
Additionally, there is growing interest in refining the definition of the platform's hardware baseline, or in increasing it. A clearer definition of the baseline configuration and of specific board configurations would accelerate related efforts. See also:
- RFC-0073: Raising x86-64 platform requirement to x86-64-v2
- RFC-0111: Initial Fuchsia hardware platform specifications
- RFC-0130: Supported Hardware
To meet present and future challenges, this RFC proposes immediate-term changes to be made to CPU targeting in the build, as well as mechanisms and policies to govern targeting in the foreseeable future.
Stakeholders
Facilitator: cpu@google.com
Reviewers:
aaronwood@google.com - System assembly
digit@google.com - Build
mcgrathr@google.com - Kernel
mvanotti@google.com - Security
maniscalco@google.com - Kernel
phosek@google.com - Toolchain
travisg@google.com - Kernel
Consulted:
Since the proposed changes affect much of the platform, all parties are encouraged to self-appoint as consulted. I would especially welcome feedback from teams such as Graphics, Media, and SDK.
Socialization:
The highlights of this proposal were first reviewed in a 60-minute presentation and open discussion at Fuchsia's Kernel Evolution Working Group.
Background
Compile-time tuning flags
Fuchsia uses clang to compile C/C++, with some subsets of Fuchsia code also continuously building and testing with gcc. Both tools offer the following flags:
-march: Sets the target architecture, for instance x86-64-v2 (our current x64 baseline per RFC-0073) or ARMv8-A. Can optionally specify additional architecture features, for instance +avx2 to indicate Intel Haswell extensions that go beyond the x64 baseline.
-mtune: Sets the target microarchitecture, for instance cortex-a53 or haswell. When neither -mtune nor -mcpu is used, this value is set to generic, which aims for balanced performance across a range of targeted CPUs.
-mcpu: Sets the target CPU. Accepts similar values to -mtune. For ARM CPUs this is equivalent to setting the target architecture (-march) and target microarchitecture (-mtune) to match the target CPU. On x86 this is considered deprecated, and the value given is redirected to -mtune.
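The defaulting rule for the tuning target can be sketched in Python. This is a simplified model of the behavior described above, not the compiler's actual logic. The function name `effective_tune` is hypothetical, and the precedence of an explicit -mtune over -mcpu is an assumption of this sketch.

```python
from typing import Optional


def effective_tune(mtune: Optional[str] = None, mcpu: Optional[str] = None) -> str:
    """Return the microarchitecture the compiler tunes for (simplified model)."""
    if mtune is not None:
        return mtune  # assumption: an explicit -mtune takes precedence
    if mcpu is not None:
        return mcpu  # -mcpu also selects the tuning target
    return "generic"  # neither flag given: balanced default


print(effective_tune())                   # generic
print(effective_tune(mcpu="cortex-a53"))  # cortex-a53
print(effective_tune(mtune="haswell"))    # haswell
```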
The Rust compiler offers codegen options as follows:
target-cpu: Similar to -mcpu, accepts for instance cortex-a53.
target-feature: Similar to -march features, for instance +avx2.
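As a rough illustration, the two codegen options can be assembled into rustc command-line arguments as follows. The helper name `rustc_codegen_args` is hypothetical and this is not how Fuchsia's build passes these options. It only shows the shape of the `-C` arguments.

```python
from typing import List, Optional, Sequence


def rustc_codegen_args(target_cpu: Optional[str] = None,
                       target_features: Sequence[str] = ()) -> List[str]:
    """Build rustc -C arguments for CPU targeting (illustrative helper)."""
    args: List[str] = []
    if target_cpu:
        args.append(f"-Ctarget-cpu={target_cpu}")
    if target_features:
        # Features are comma-separated, each prefixed with + (enable) or - (disable).
        args.append("-Ctarget-feature=" + ",".join(target_features))
    return args


print(rustc_codegen_args(target_cpu="cortex-a53"))
print(rustc_codegen_args(target_features=["+avx2"]))
```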
Present state
Currently all x64 builds are compiled with -march=x86-64-v2, and all ARM builds are compiled with -mcpu=cortex-a53.
There exists a mechanism for overriding this configuration via a GN argument named board_configs, which a board configuration can set in a .gni file. Some boards, specifically Astro and Sherlock, manually specify the Cortex-A53 configuration described above, though this is currently a no-op as the same configuration also serves as the fallback if no override is defined.
Most board configurations do not set board_configs.
Tuning objectives and tradeoffs
This section briefly reviews different objectives to consider when setting CPU targeting options, and some of the tradeoffs between them.
Hardware compatibility: Targeting an earlier revision of an ISA unlocks compatibility with older hardware. Greater compatibility comes at the cost of losing access to new ISA features that can have performance or security benefits.
Performance: New instructions can deliver performance gains: faster or wider atomic operations, accelerated math (FPU, SIMD improvements), built-in accelerators for common algorithms (such as CRC and AES). Tuning machine code for a given CPU can produce code that runs faster on the target CPU, though often at the expense of performance on other CPUs that are outside the target parameters.
Interplay with binary size: tuning has been observed to increase binary size under some circumstances, for instance when instruction scheduling optimizations targeting in-order processors increase register pressure.
Binary size: Some codegen features are unlocked with certain CPU features. For instance SIMD enables auto vectorization, which has a similar effect to loop unrolling in that it generates code that is faster but also larger. Instruction scheduling tuned for in-order CPUs tends to generate larger code because it adds more scheduling constraints and can increase register pressure and register spilling.
Other codegen features can decrease binary size. For instance replacing algorithms such as CRC and AES with specialized instructions produces code that is both faster and smaller.
Ease of troubleshooting i.e. binary diversity: Tuning for different CPUs means producing more binary variants of the same logical artifacts over time. For instance, multiple "flavors" of the kernel image or of prebuilt shared libraries, each optimized for a different target. This can make reproducing issues more complex, or expose Fuchsia to issues that manifest in some binary variants but not others.
Level playing field: In addition to baseline builds, Fuchsia may offer SDK prebuilts (system image, redistributable shared libraries) that are tuned to particular CPUs. Doing so affords a narrow privilege to some hardware choices over others. It's reasonable to assume that creating SDK flavors that are tuned to some CPUs will create a future expectation of offering more tuned SDK release channels.
Simplicity: All of the above adds complexity to understanding Fuchsia, to developing on Fuchsia, and to maintaining Fuchsia.
The tradeoffs expressed above are for setting CPU targeting options to introduce binary diversity where it is already feasible, on build and distribution channels that are intended for specific hardware such as build and release pipelines that target OTA channels for specific user hardware. At the time of this writing there simply isn't a mechanism for system or package delivery that can offer multiple binaries to different target hardware, matching the right binary to the right device.
Proposal
The immediate proposed modifications can be seen in this change. Additional explanation is given below.
New arm64 baseline hardware target
The current baseline for arm64 is defined as targeting Cortex-A53, as follows:
-mcpu=cortex-a53
This is technically equivalent to expressing -march in terms of a precise set of Cortex-A53 features, and tuning codegen for Cortex-A53:
-march=<armv8a + Cortex-A53 features>
-mtune=cortex-a53
Instead, the baseline will express the ARMv8-A ISA features that are actually exercised by the platform and are therefore assumed as baseline, then tune codegen for a generic ARMv8-A CPU:
-march=armv8-a+simd+crc+crypto
-mtune=generic
The change to -march is effectively a no-op, since it only removes features that are supported in Cortex-A53 but not exercised in the code.
The effect on -mtune is minimal or none, since a generic tuning target optimizes for a typical in-order ARMv8-A CPU, such as Cortex-A53.
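To make the "ISA plus features" form of the new baseline concrete, the following Python sketch splits the proposed -march value into its base ISA and feature list. The function name `split_march` is hypothetical and the parsing is a simplification for illustration, not the compiler's own flag handling.

```python
from typing import List, Tuple


def split_march(flag: str) -> Tuple[str, List[str]]:
    """Split '-march=<isa>+<feature>...' into the ISA name and its feature list."""
    prefix = "-march="
    if not flag.startswith(prefix):
        raise ValueError(f"not a -march flag: {flag!r}")
    isa, *features = flag[len(prefix):].split("+")
    return isa, features


isa, features = split_march("-march=armv8-a+simd+crc+crypto")
print(isa)       # armv8-a
print(features)  # ['simd', 'crc', 'crypto']
```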
Changes to existing x64 baseline hardware target
The current baseline for x64 is:
-march=x86-64-v2
This subject was previously covered in the above-mentioned RFC-0073: Raising x86-64 platform requirement to x86-64-v2.
This will be changed to the flag set below:
-march=x86-64-v2
-mtune=generic
This is not a behavior change, since -mtune defaults to generic when neither -mtune nor -mcpu is specified, as previously explained. However, adding -mtune=generic makes this behavior explicit and is consistent with the definition for the arm64 baseline.
Board-specific configuration
The board_configs board argument, specified in board-specific .gni files (such as those found in //boards/), will continue to be used to override the baseline configuration with a board-specific configuration.
Specifically, board configurations such as astro.gni and sherlock.gni, which use Cortex-A53, will continue to target Cortex-A53 and will keep the current -mcpu=cortex-a53 configuration.
Essentially, this RFC takes the existing arm64 configuration that targets Cortex-A53 and moves it from the platform baseline into the board-specific configurations for the Astro and Sherlock boards, which carry such CPUs. It then redefines the platform baseline in ARM ISA terms that generalize to many hardware choices, rather than in terms of a single ARM CPU.
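The resulting selection between baseline and board-specific flags can be modeled with a short Python sketch. The tables and the function `cpu_flags` are hypothetical stand-ins. The real build expresses this through GN configs and the board_configs argument rather than flag lists. Only the flag values themselves are taken from this RFC.

```python
from typing import Dict, List, Optional

# Platform baselines proposed in this RFC, keyed by target architecture.
BASELINE_FLAGS: Dict[str, List[str]] = {
    "arm64": ["-march=armv8-a+simd+crc+crypto", "-mtune=generic"],
    "x64": ["-march=x86-64-v2", "-mtune=generic"],
}

# Hypothetical stand-in for board_configs overrides set in board .gni files.
BOARD_OVERRIDES: Dict[str, List[str]] = {
    "astro": ["-mcpu=cortex-a53"],
    "sherlock": ["-mcpu=cortex-a53"],
}


def cpu_flags(arch: str, board: Optional[str] = None) -> List[str]:
    """Return the board override if one exists, else the platform baseline."""
    if board in BOARD_OVERRIDES:
        return list(BOARD_OVERRIDES[board])
    return list(BASELINE_FLAGS[arch])


print(cpu_flags("arm64"))           # baseline arm64 flags
print(cpu_flags("arm64", "astro"))  # board-specific Cortex-A53 flags
```

A board without an entry in the override table, such as one that never sets board_configs, falls through to the baseline for its architecture.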
Additionally, it's possible to add support for optimizations targeting different architecture variants (such as ARM Cortex-A73 or Intel AVX extensions) in future releases of the SDK. This merits further discussion and is out of scope.
Kernel configuration
The board_configs argument will no longer apply to the kernel image. This is for the following reasons:
Newer instructions or other CPU features that need to be known at codegen time don't currently present benefits to the kernel.
Micro-architecture based tuning of kernel code doesn't present a benefit to counter the cost of increasing binary diversity and complexity.
The kernel can continue to provide information about supported hardware capabilities, such as with the zx_system_get_features syscall.
Additionally, the kernel can still take advantage of some newer hardware features such as 64kB memory pages which don't require generating different code, only querying for the presence of these features at runtime. If a new feature such as this is introduced that requires board-specific configuration then it's easy to introduce a new argument kernel_board_configs that defines the associated flags.
Backwards Compatibility
The immediate changes proposed in this RFC do not raise Fuchsia's minimum hardware requirements, so there is no impact to backwards compatibility. Future changes that do raise the minimum requirements may be aided by the policies that this RFC advocates for.
Security considerations
Fuchsia uses or intends to use several CPU features that improve security or support the use of sanitizers (which then improve security). These are generally not controlled by the compiler flags discussed here and so are not of concern.
Notably:
- Userspace Top-Byte-Ignore (see also RFC-0143) is supported universally across AArch64.
- New instructions that improve sanitizer support (such as ARM MTE) or mitigate vulnerabilities by ensuring control flow integrity (such as Pointer Authentication and Branch Target Identification or Intel Control-flow Enforcement Technology) are in the NOP-space, making them backwards-compatible (in the sense that older CPUs execute them as NOPs). As such using these instructions doesn't require raising the minimum supported ISA for the platform or for an SDK flavor.
Testing
Correctness: Changes to CPU targeting should never compromise correctness. This is verified with continuous presubmit and postsubmit testing. The present systems are sufficient to ensure this.
Performance: Changes to CPU targeting often have performance implications. Fuchsia's Perfcompare system will be used to validate any such changes, as has been the case before.
Binary size: Changes to CPU targeting often affect binary size in subtle ways. Specifically, Fuchsia is currently tracking the size of the Astro image most closely, since this is the most constrained target that we have. The immediate changes do not regress this size. Future changes that affect specific product images can be reviewed, and their tradeoffs carefully weighed, by the owners of those product definitions.
Drawbacks, alternatives, and unknowns
CPU targeting presents many engineering and business tradeoffs among sometimes-conflicting objectives. These are reviewed above. Future changes that shift these tradeoffs and future tuning opportunities and considerations are outside the scope of this RFC.