RFC-0159: Execute Only Memory | |
---|---|
Status | Accepted |
Areas | |
Description | Support for mapping execute-only memory. |
Issues | |
Gerrit change | |
Authors | |
Reviewers | |
Date submitted (year-month-day) | 2022-03-29 |
Date reviewed (year-month-day) | 2022-05-10 |
Summary
This document proposes changes to kernel APIs to support binaries with
execute-only segments, by adding a new feature check in zx_system_get_features
and changing the launchpad and process_builder loaders as well as the dynamic
linker in Fuchsia's in-tree libc to support '--x' segments. It lays out a plan
for eventual kernel support for mapping execute-only pages on hardware that
supports it.
We don't typically need to read executable memory after it has been loaded. Enabling execute-only code by default increases security of Fuchsia’s userspace processes and furthers the engineering best practice of least permissions.
Motivation
Support for execute-only pages was added to ARM MMUs in ARMv7m and allows pages of memory to be mapped such that they are only executable and not readable or writable. Though writable code pages have been considered a security threat for a long time, allowing code to remain readable has been shown to expose applications to needless risk. Specifically, reading code pages is often a first step in an attack chain, and preventing code from being read hinders adversaries. See Readable Code Security. Moreover, supporting execute-only pages fits well with Fuchsia’s permissions model and more strongly aligns with the principle of least privilege: often code doesn’t need to be read, but just be executed.
Stakeholders
Facilitator:
- cpu@google.com
Reviewers:
- phosek@google.com
- mvanotti@google.com
- maniscalco@google.com
- travisg@google.com
Background
Execute-only Memory
Execute-only memory (XOM) describes memory pages that have neither read nor write permissions and can only be executed. ARMv7m and above have native support for XOM, but there are some considerations on older ISAs, which are discussed further in XOM and PAN.
This doc focuses almost exclusively on AArch64; however, the implementation is architecture agnostic. When hardware and toolchain support matures for other architectures, they should be able to take advantage of execute-only support in Fuchsia with little extra work.
Permissions of Code Pages
Initially, computers supported direct memory access to physical memory without any checks or protections. The introduction of MMUs provided a key abstraction, in the form of virtual memory, by decoupling a program's view of memory from the underlying physical resources. This facilitated a more flexible, safe, and secure programming model by allowing OS implementers to provide strong isolation between their programs via the process abstraction. Today's MMUs provide a number of critical facilities, such as paged memory, fast address translation, and permission checking. They also allow users significant control over how memory regions can be accessed and used, via the page permissions that typically control if memory pages can be read, written to, or executed. This is a key property for program safety, fault isolation, and security, since it restricts a program's ability to misuse system resources through hardware enforced permission checks.
Memory that is both writable and executable is particularly dangerous because it
provides an easy way for an adversary to achieve arbitrary code execution
through common vulnerabilities, like buffer overflows. For this reason, many OS
configurations explicitly disallow pages that are both writable and executable
(W^X). This has been the standard for over a decade: OpenBSD added support for
W^X in 2003 with OpenBSD 3.3 openbsd-wxorx. See also SELinux W^X policies
selinux-wxorx. Writable code can be useful for things like just-in-time (JIT)
compilation, which writes executable instructions to memory at runtime. Where
W|X pages are disallowed, JITs need to work around the restriction. An easy way
is to write code to non-executable pages and later change the page protections,
i.e., through mprotect or zx_vmar_protect, to be executable but not writable
example-fuchsia-test. In nearly all cases pages that are W|X are too
permissive. Similarly, executable pages rarely need to be read (see
exceptions). Allowing read operations on executable pages is generally
unnecessary and should not be the default.
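The JIT workaround described above can be sketched with the POSIX calls the text mentions. This is a minimal illustration, not code from Fuchsia: jit_publish is a hypothetical helper name, and on Fuchsia the same pattern would go through zx_vmar_protect rather than mprotect. The sketch only changes protections and never jumps into the region.

```c
// Lets glibc expose MAP_ANONYMOUS under strict -std= modes.
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

// Copies `len` bytes of generated code into a fresh mapping that is writable
// but not executable, then flips it to executable but not writable, so the
// region is never W|X. Returns the mapping, or NULL on failure.
// `jit_publish` is a hypothetical name used only for this sketch.
static void *jit_publish(const unsigned char *code, size_t len) {
  assert(code != NULL && len > 0);

  // Step 1: writable, non-executable scratch memory.
  void *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (region == MAP_FAILED) {
    return NULL;
  }

  // Step 2: emit the code while the pages are still writable.
  memcpy(region, code, len);

  // Step 3: drop write permission and add execute permission in one call.
  if (mprotect(region, len, PROT_READ | PROT_EXEC) != 0) {
    munmap(region, len);
    return NULL;
  }
  return region;
}
```

The final protection here is still read plus execute; on a system that can map execute-only pages, step 3 is where read permission could also be dropped.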
Readable Code
Because of ARM’s fixed instruction width, immediate values have size
constraints. To get around this, large constants are often loaded using
PC-relative addressing: the pseudo instruction ldr Rd, =imm emits imm in a
literal pool close to the code loading it. This is incompatible with XOM because
it puts data in the text section, which must then be readable. When searching
the codebase for uses of literal pools to ensure we don’t read executable
segments, we found some uses of ldr Rd, =imm in Zircon, but all have since been
removed. Clang will not use literal pools for aarch64; instead it emits
multiple instructions to build a large immediate. Clang has a -mexecute-only
flag and its alias -mpure-code, but these are only meaningful on arm32 because
the behavior is implicit when targeting aarch64.
Example: Large Immediates
This example shows how Clang compiles this C code to assembly given different targets clang-example. The aarch64 output is shown first, followed by the arm32 output:
uint32_t a() {
  return 0x12345678u;
}
# -target aarch64
a:
  mov w0, #22136
  movk w0, #4660, lsl #16
  ret
# -target arm
a:
  ldr r0, .LCPI0_0
  bx lr
.LCPI0_0:
  .long 305419896
XOM and PAN
Privileged access never (PAN) is a security feature on ARM chips that prevents
normal memory access to user pages from kernel mode. It helps protect against
potential kernel vulnerabilities because the kernel cannot touch user memory
with normal load or store instructions. Instead the OS needs to turn PAN off or
use the ldtr and sttr instructions to access those pages. PAN is not currently
enabled for Fuchsia, but there are already plans to support it in Zircon
pan-fxb.
AArch64 page table entries have 4 relevant bits to control page permissions. Two bits are used for user and privileged execute-never. The remaining two describe read and write page permissions for both access levels. An execute-only mapping has both read and write access removed but allows user execution.
This table from the ARMv8 Reference Manual shows the possible memory protections using only the 4 available bits. EL0 is the exception level for userspace. Rows 0 and 2 show how to create userspace execute-only pages. See Table D5-34 Stage 1 from the ARMv8 Reference Manual.
UXN | PXN | AP[2:1] | Access from a higher Exception level | Access from EL0 |
---|---|---|---|---|
0 | 1 | 00 | R, W | X |
0 | 1 | 01 | R, W | R, W, X |
0 | 1 | 10 | R | X |
0 | 1 | 11 | R | R, X |
Unfortunately, PAN’s algorithm for deciding whether a page should be inaccessible to privileged code checks if the page is user-readable. From the perspective of PAN, a user-execute-only page looks like a privileged mapping. This allows the kernel to access user memory where it otherwise should not, thereby bypassing PAN’s intended purpose and making PAN and XOM incompatible pan-issue. This would make any future usage of PAN ineffective against attacks that exploit the kernel touching user memory, although it would still be useful for detecting kernel bugs.
This problem caused both Linux and Android to drop support for XOM. This was particularly noticeable for Android, which made XOM the default for all aarch64 binaries in Android 10 and then dropped support indefinitely in Android 11 linux-revert. They plan to re-enable the feature as hardware that fixes the problem becomes more ubiquitous, but there is no concrete time frame for when it will be re-added.
ARM has since proposed a solution with “enhanced” PAN, or ePAN, which changes PAN to check not just whether a page is user readable but also whether it is user executable. Unfortunately, hardware with the feature may not be on any Fuchsia-targeted devices for years. Linux has since re-added its implementation of XOM after ePAN was introduced linux-re-land. Support for ePAN on devices is out of our control, and the incompatibility between PAN and XOM should not block the kernel’s implementation of PAN. See more.
From the table above, there is no possible configuration where read permission can be stripped from the kernel. The only exception is PAN, which can cause an exception when the kernel tries to touch a user-readable page. For this reason, it is not possible to create an execute-only mapping for the kernel, since the kernel cannot mark a page as executable at EL1 without also making it readable. Thus, it is only possible to create an execute-only mapping for userspace processes.
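The interaction between PAN, ePAN, and execute-only pages described in this section can be modeled in a few lines. This is a deliberately simplified sketch of the rules as stated in the text, not of the full architecture: the struct, its field names, and the function names are all hypothetical, and PXN and the other translation details are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Simplified model of the permission bits from the table above. Field names
// are illustrative and do not match any kernel structure.
typedef struct {
  bool uxn;    // true: EL0 may not execute the page
  uint8_t ap;  // AP[2:1] as a value in 0..3 (AP[2] is the high bit)
} pte_model;

// AP[1] set means the page is accessible from EL0 (table rows 01 and 11).
static bool user_readable(pte_model pte) {
  assert(pte.ap <= 3);
  return (pte.ap & 0x1u) != 0;
}

static bool user_executable(pte_model pte) { return !pte.uxn; }

// Classic PAN: a privileged data access is blocked only if the page is
// user-readable, so a user execute-only page slips through.
static bool pan_blocks_kernel_access(pte_model pte) {
  return user_readable(pte);
}

// ePAN as described above: user-executable pages also count as user pages.
static bool epan_blocks_kernel_access(pte_model pte) {
  return user_readable(pte) || user_executable(pte);
}
```

With uxn clear and AP[2:1] equal to 00 (a user execute-only page), the PAN check reports that the kernel access is allowed while the ePAN check reports that it is blocked, which is the gap ePAN closes.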
Targeting XOM Hardware
Segment permissions in ELF indicate what permissions the code requires to run correctly. In other words, software doesn’t need to know at build time if the hardware it will run on can support XOM or not. Instead, it should unconditionally use XOM if it will not need to read code pages. It is up to the OS and loaders to enforce those permissions to the greatest extent the system allows elf-segment-perm.
Virtual Memory Permissions
POSIX specifies that mmap may permit read access to pages where PROT_READ has
not been explicitly set posix-mmap. Both Linux and macOS on x86, and macOS on M1
chips, will not fail when pages are requested from mmap with just PROT_EXEC and
instead make the pages PROT_READ | PROT_EXEC. These implementations have
syscalls which are “best-effort” in their ability to honor a user's requests.
Fuchsia syscalls, on the other hand, are always explicit in what they can and
cannot honor. The zx_vmar_* syscalls do not silently escalate permissions of
pages as their POSIX counterparts are permitted to by the standard. Requesting
pages without ZX_VM_PERM_READ will currently always fail, as the hardware and
the OS do not support mapping pages without read permissions. A graceful
transition to supporting binaries with execute-only segments and userspace
programs which allocate execute-only memory will require a way to check if the
OS can map execute-only pages prior to requesting them.
Readable Code Security
Many attacks rely on finding out information about the process through reading code pages to find “gadgets”, or executable code of interest. Address space layout randomization (ASLR) is a technique used by operating systems to load binary segments at semi-random places in the process's address space. It is used by Fuchsia and many other OSes to hinder attacks which rely on knowing where code or other data is in memory. Making code unreadable further reduces the attack surface.
Code reuse attacks, like “return-to-libc” rtl-attack, are used to return control of a function to a known address. libc is a logical choice to return or jump into since it contains rich functionality useful to an attacker, and because it is extremely likely the process will link against libc. It has been demonstrated that the available gadgets in a typical program are Turing-complete, giving an adversary the ability to execute arbitrary code.
In many cases an adversary's objective is to obtain a shell. ASLR makes these kinds of attacks harder because the addresses of functions are different between invocations of a program. However, ASLR isn’t a comprehensive mitigation, because attackers can read code pages to find the address of functions that they would otherwise not know by looking at their address in the binary. XOM makes it impossible for ASLR to be broken in this way and attackers will need to use another way to find out information about the location of specific code pages.
Common Notation
‘rwx/r-x/--x’
These represent permissions of ELF segments, which get mapped into the process's
address space with the corresponding permissions. This notation is commonly used
both when describing permissions of files and when tools like readelf describe
ELF segments. r, w and x mean read, write and execute respectively, and ‘-’
means the permission is not granted. An execute-only segment will have ‘--x’
permissions.
R^X, W|X, etc…
As above, R, W and X refer to read, write and execute. ‘^’ and ‘|’ are C-like operators for xor and or. R^X is read as “read xor execute”.
"ax"
This is assembler syntax which marks a section as allocated and executable.
Currently linkers will put “ax” sections into segments that are ‘r-x’. The
--execute-only flag in lld will mark these segments as ‘--x’ instead.
Design
To increase security of our userspace programs by supporting XOM, both our toolchain and loaders will need to be updated. The clang driver will need to pass the ‘--execute-only’ flag to the linker to ensure “ax” sections which would otherwise be mapped to ‘r-x’ segments are instead mapped to ‘--x’ segments. The loaders will also need to change the sanity checks that all requested permissions contain at least read, because this will no longer be true.
As it will only be possible to use XOM on hardware that has ePAN, we will need to gracefully support the transition. We have three options:
- Change vmar_* functions to be best effort like many mmap implementations.
- Create a way to query the kernel if it supports execute-only mappings and have the loader escalate permissions of a ‘--x’ segment to ‘r-x’ if XOM is not available.
- Add a new ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED flag for loaders to use with ‘--x’ segments.
In all cases, there will be a potential silent escalation of permissions. The first option would be the easiest: the loaders would need no changes other than removing their sanity checks. The second option is not significantly more complex; it would just add a simple check in the loaders before deciding what memory permissions to request from the OS. The third option is helpful because it is less error prone in user code.
The first option would end up breaking Fuchsia’s current strict contract with userspace of always being explicit about what a syscall can and cannot honor. The second and third options also end up with ambiguous handling of memory permissions when loading ELF files. However, this fits within the ELF specification. Segment permissions don’t specify 1:1 what permissions the memory allocated for a segment will have, but rather which permissions the memory must at least have for the program to operate correctly. ELF loaders are within their rights to map a ‘--x’ segment into ‘r-x’ memory elf-segment-perm.
The first option is not ideal because it breaks Fuchsia’s current contract of explicit syscall handling. Options 2 and 3 both have value, and the implementation proposed in this RFC will be based on both.
Implementation
System Call Additions
A new flag ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED will be added. When it is passed
in options to the various zx_vmar_* syscalls that take permission flags, the
syscall will implicitly add read permission if XOM is not supported.
ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED is logically only useful with
ZX_VM_PERM_EXECUTE and without ZX_VM_PERM_READ; however, the syscalls which
accept this flag will not treat this as an invariant. It is safe to combine
ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED with any other combination of flags; it will
just be treated as ZX_VM_PERM_READ in contexts where the system cannot map
execute-only pages.
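The flag semantics described above can be summarized as a small resolution step. This is a sketch of the stated behavior, not the kernel's implementation: the PERM_* macros are stand-ins with made-up bit values for the corresponding ZX_VM_PERM_* flags, and resolve_perms is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Stand-in bit values for illustration only; the real ZX_VM_PERM_* constants
// are defined by Zircon and may differ.
#define PERM_READ 0x1u
#define PERM_WRITE 0x2u
#define PERM_EXECUTE 0x4u
#define PERM_READ_IF_XOM_UNSUPPORTED 0x8u
#define PERM_MASK (PERM_READ | PERM_WRITE | PERM_EXECUTE)

// Returns the permissions a mapping would effectively get. The conditional
// flag turns into read permission only when the system cannot map
// execute-only pages; otherwise it has no effect.
static uint32_t resolve_perms(uint32_t options, bool xom_supported) {
  uint32_t perms = options & PERM_MASK;
  if ((options & PERM_READ_IF_XOM_UNSUPPORTED) && !xom_supported) {
    perms |= PERM_READ;
  }
  assert((perms & ~PERM_MASK) == 0);
  return perms;
}
```

Because the flag only ever adds read permission, combining it with flags that already include read permission is harmless, which matches the statement that it is not treated as an invariant.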
A new kind value ZX_FEATURE_KIND_VM will be added for zx_system_get_features,
which will yield a bitset similar to ZX_FEATURE_KIND_CPU. There will also be a
new feature ZX_VM_FEATURE_CAN_MAP_XOM. The current implementation will always
keep this bit false because XOM will not be enabled until later. The loaders
will not use this bit because ‘r-x’ memory permissions are valid for a ‘--x’
segment, but it is still important for userspace to be able to query for this
functionality.
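A userspace allocator that wants execute-only code could use the feature bitset along these lines. This is a sketch under stated assumptions: the bit position and the REQ_* values are made up, the function names are hypothetical, and the vm_features argument stands in for the value a real caller would obtain from zx_system_get_features with the ZX_FEATURE_KIND_VM kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical bit position; the real ZX_VM_FEATURE_CAN_MAP_XOM value is
// defined by Zircon.
#define VM_FEATURE_CAN_MAP_XOM_BIT 0x1u

// Stand-in permission bits for illustration only.
#define REQ_READ 0x1u
#define REQ_EXECUTE 0x4u

// Tests the feature bit in a bitset that a real caller would have read from
// the kernel.
static bool can_map_xom(uint32_t vm_features) {
  return (vm_features & VM_FEATURE_CAN_MAP_XOM_BIT) != 0;
}

// Picks the permissions to request for code that should be execute-only,
// falling back to read plus execute when XOM is unavailable.
static uint32_t code_perms_for(uint32_t vm_features) {
  uint32_t perms = REQ_EXECUTE;
  if (!can_map_xom(vm_features)) {
    perms |= REQ_READ;
  }
  assert((perms & REQ_EXECUTE) != 0);
  return perms;
}
```

Checking first and then requesting explicit permissions keeps the caller in control of the fallback, at the cost of an extra query compared with passing the conditional flag.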
System Loader ABI Changes
Current and future loaders will ensure '--x' segments can be loaded into memory
even if the target can't support XOM. The loaders will add
ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED when mapping execute-only segments.
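A loader's translation from ELF segment flags to mapping options might look roughly like the following. This is an illustrative sketch, not the launchpad or process_builder code: the LOADER_VM_* values are made-up stand-ins for the ZX_VM_PERM_* flags, the function name is hypothetical, and it assumes the conditional flag is added only for segments that are executable but not readable.

```c
#include <assert.h>
#include <stdint.h>

// Standard ELF program header flag bits (p_flags), under local names.
#define ELF_PF_X 0x1u
#define ELF_PF_W 0x2u
#define ELF_PF_R 0x4u

// Stand-in bit values for illustration only; the real ZX_VM_PERM_* constants
// are defined by Zircon and may differ.
#define LOADER_VM_READ 0x1u
#define LOADER_VM_WRITE 0x2u
#define LOADER_VM_EXECUTE 0x4u
#define LOADER_VM_READ_IF_XOM_UNSUPPORTED 0x8u

// Translates segment flags into mapping options without a hard requirement
// that every segment be readable.
static uint32_t segment_map_options(uint32_t p_flags) {
  uint32_t options = 0;
  if (p_flags & ELF_PF_R) options |= LOADER_VM_READ;
  if (p_flags & ELF_PF_W) options |= LOADER_VM_WRITE;
  if (p_flags & ELF_PF_X) options |= LOADER_VM_EXECUTE;

  // Assumption of this sketch: only executable, non-readable segments ask
  // for read permission as a fallback.
  if ((p_flags & ELF_PF_X) && !(p_flags & ELF_PF_R)) {
    options |= LOADER_VM_READ_IF_XOM_UNSUPPORTED;
  }
  assert(!(options & LOADER_VM_READ_IF_XOM_UNSUPPORTED) ||
         (options & LOADER_VM_EXECUTE));
  return options;
}
```

With this shape the loader never needs to query the kernel itself; the decision to escalate a ‘--x’ segment to ‘r-x’ is left to the syscall.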
Shipped Dynamic Linker ABI Changes
Similarly, the dynamic linker in Fuchsia’s libc shipped with the SDK will also
escalate permissions where necessary when allocating memory for ‘--x’ segments
with ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED.
Compiler Toolchain Changes
The clang driver will also be changed to always pass --execute-only to the
linker when targeting aarch64-*-fuchsia. Programs will also need an easy way to
opt out of this new default behavior, most likely through a new
‘--no-execute-only’ flag in the linker.
Kernel XOM Implementation
Once hardware arrives that supports ePAN, the kernel can service a request for
memory pages to have just ZX_VM_PERM_EXECUTE. The arm64 user-copy
implementation may need updates to ensure it's consistent with how user memory
access is constrained. user_copy should be updated to use the ldtr and sttr
instructions. This will ensure that users cannot trick the kernel into reading
unreadable pages for them. Moreover, the kernel makes assumptions about mappings
being readable in a couple of places, and these will need to be changed where
appropriate. This work will be done later.
Unnecessary Changes
zx_process_read_memory does not need to be changed, and debuggers should work
normally when debugging execute-only binaries. zx_process_read_memory ignores
the permissions of the pages it is reading from, and only checks that the
process handle has ZX_RIGHT_READ and ZX_RIGHT_WRITE.
zx_vmar_protect will continue to work as it does currently. Most notably this
means that processes can protect their code pages with read permission in cases
where that is necessary.
Performance
There is no expected impact on performance.
Security
Until XOM is implemented in the kernel a binary with ‘--x’ segments will be just as secure as an equivalent binary using ‘r-x’ segments. Once XOM is supported both by hardware and the OS, programs which elect to use execute-only memory will become more secure. See sections Permissions of Code Pages, XOM and PAN and Readable Code Security.
Privacy
No extra considerations other than those mentioned in Security.
Testing
zx_system_get_features will have trivial testing when we are forcing XOM
support in the kernel, where we can know at build time what we expect the
syscall to return.
ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED will be tested to confirm that it makes a
page readable when zx_system_get_features reports that the OS cannot create
execute-only pages.
Likewise, the elfload library doesn't have any real testing, save for fuzz tests which don't test expected functionality. Instead its functionality is inherently tested by other components that rely on it. Testing should be added here to ensure '--x' segments are correctly mapped. The process_builder library does have tests, and these will ensure it properly requests readable and executable memory when XOM is not available.
The changes to the current dynamic linker will not be tested directly. A new dynamic linker is planned and it will have extensive testing, including testing of ‘--x’ segments.
The changes to the clang driver will have testing in upstream LLVM.
We will also set up testing configuration for enabling XOM on test bots, even if that hardware does not have ePAN and we would otherwise not enable XOM. This will help us catch in tree programs that read their code pages and need to opt out of execute-only.
Documentation
The changes to zx_system_get_features will be documented, as well as the
motivation for why user space would want to query for the
ZX_VM_FEATURE_CAN_MAP_XOM feature. Likewise, the new
ZX_VM_PERM_READ_IF_XOM_UNSUPPORTED flag will also be documented. Changes to the
various loaders and the clang driver defaults will not be documented outside of
this RFC.
Drawbacks, Alternatives, Unknowns
It is unknown how much current and future out of tree code relies on executable code being readable. This could be from use of data constants in text from handwritten assembly, code compiled from other toolchains or program introspection. Regardless, programs which need to have readable code pages will still benefit because their shared library dependencies, including libc, will be marked execute-only. Changing our clang toolchain to default to execute-only segments will break programs which depend on readable code. There is no easy way to check at build time if a program relies on this behavior. However, once it is identified that a program needs ‘r-x’ segments, opting out of the default ‘--x’ will be simple.
For programs which need to be able to read some of their code but not all,
current tooling cannot easily support this. The --execute-only linker flag
will strip read permissions from any executable segment, and there is no way to
mark a single section as needing to be read. Programs which want this behavior
will need to opt out of execute-only completely.
Risks
It is possible that, once the clang driver defaults to using --execute-only,
code that reads from a ‘--x’ segment won’t break until hardware and kernel
support for XOM lands. This creates potential forward compatibility problems for
software that didn’t change. Testing will exist for in tree software, but most
likely not for out of tree code.
Prior Art and References
Because of the ambiguous handling of mmap permission flags in many POSIX
implementations, they have no need for an analogue to
zx_system_get_features(ZX_FEATURE_KIND_VM, &features).
Darwin supports XOM on newer Apple chips, but their implementation is more robust because it uses proprietary hardware features. Their chips have hardware support for stripping individual permission bits from both kernel and user memory. It is not enabled for userspace in macOS apple-xom.