Introduction
LLVM's shadow-call-stack feature is a compiler mode intended to harden the generated code against stack-smashing attacks such as exploits of buffer overrun bugs.
The Clang/LLVM documentation for ShadowCallStack describes the scheme. In short, the function return address is never reloaded from the normal stack but only from a separate "shadow call stack". This is an additional stack, but rather than containing whole stack frames of whatever size each function needs, it contains only a single address word for each call frame it records: the return address. Because the shadow call stack is allocated independently of other stacks and heap blocks, at its own randomized address to which pointers are rare, a buffer overrun or use-after-free exploit is much less likely to overwrite a return address in memory and thereby make the program return to an instruction chosen by the attacker.
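The idea can be illustrated with a toy model in portable C. This is only a sketch, not how the compiler implements the feature: the names toy_frame, toy_shadow, and toy_call are hypothetical, and the "overrun" is simulated by overwriting the whole frame structure. The point is that the copy of the return address kept next to the buffer is clobbered, while the copy in the separate region survives.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: a "frame" whose local buffer sits next to a saved
 * return address, as on a conventional machine stack. */
struct toy_frame {
    unsigned char buffer[8];
    uintptr_t saved_return; /* copy kept on the normal stack */
};

/* Hypothetical separate region holding one word per call frame. */
#define TOY_SHADOW_SLOTS 16
static uintptr_t toy_shadow[TOY_SHADOW_SLOTS];
static size_t toy_shadow_depth;

/* Simulate a call whose body overruns its buffer with attacker bytes.
 * Returns the address the function would return to. */
uintptr_t toy_call(uintptr_t return_address, int use_shadow) {
    struct toy_frame frame;
    frame.saved_return = return_address;

    assert(toy_shadow_depth < TOY_SHADOW_SLOTS);
    toy_shadow[toy_shadow_depth++] = return_address;

    /* The simulated bug: every byte of the frame is overwritten,
     * including the saved return address next to the buffer. */
    memset(&frame, 0x41, sizeof frame);

    uintptr_t from_shadow = toy_shadow[--toy_shadow_depth];
    return use_shadow ? from_shadow : frame.saved_return;
}
```

Calling toy_call(0x1000, 1) yields the original address, while toy_call(0x1000, 0) yields the attacker-controlled bytes.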
The shadow-call-stack and safe-stack instrumentation schemes and ABIs are related and similar but also orthogonal. Each can be enabled or disabled independently for any function. Fuchsia's compiler ABI and libc always interoperate with code built with or without either kind of instrumentation, regardless of what instrumentation was or wasn't used in the particular libc build.
Interoperation and ABI Effects
In general, shadow-call-stack does not affect the ABI. The machine-specific
calling conventions are unchanged. It works fine to have some functions in a
program built with shadow-call-stack and some not. It doesn't matter whether
the two kinds of code are combined from directly compiled .o files, from
archive libraries (.a files), or from shared libraries (.so files), in any
combination.
While there is some additional per-thread state (the shadow call stack
pointer, see below), code not using shadow-call-stack does not need to do
anything about this state to keep it correct when calling, or being called
by, code that does use shadow-call-stack. The only potential exceptions to
this are for code that is implementing its own kinds of non-local exits or
context-switching (e.g. coroutines). The Zircon C library's setjmp/longjmp
code saves and restores this additional state automatically, so anything
that is based on longjmp already handles everything correctly even if the
code calling setjmp and longjmp doesn't know about shadow-call-stack.
For AArch64 (ARM64), the x18 register is already reserved as "fixed" in the
ABI generally. Code unaware of the shadow-call-stack extension to the ABI is
interoperable with the shadow-call-stack ABI by default if it simply never
touches x18.
The feature is not yet supported on any other architecture.
Use in Zircon & Fuchsia
Zircon on AArch64 (ARM64) supports shadow-call-stack both in the kernel and
for user-mode code. This is enabled in the Clang compiler by the
-fsanitize=shadow-call-stack command-line option. For aarch64-fuchsia
(ARM64) targets, it is enabled by default. To disable it for a specific
compilation, use the -fno-sanitize=shadow-call-stack command-line option.
As with safe-stack, there is no separate facility for specifying the size of
the shadow call stack. Instead, the size specified for "the stack" in legacy
APIs (such as pthread_attr_setstacksize) and ABIs (such as PT_GNU_STACK) is
used as the size for each kind of stack. Because the different kinds of
stack are used in different proportions according to the particular program
behavior, there is no good way to choose the shadow call stack size based on
the traditional single stack size. So each kind of stack is as big as it might
need to be in the worst case expected by the tuned "unitary" stack size. While
this seems wasteful, it is only slightly so: at worst one page is wasted per
kind of stack, plus the page table overhead of using more address space for
pages that are never accessed.
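As a small illustration of the single size knob, the following sketch sets the stack size on a POSIX thread attribute object. The helper name configure_stack_size is hypothetical. Only the standard pthread attribute calls are used; there is no separate call for the shadow call stack, and per the description above the one value is applied to each kind of stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialize `attr` and set the one "stack size"
 * value. Returns the size read back from the attribute, or 0 on failure.
 * On success the caller owns `attr` and must pthread_attr_destroy() it. */
size_t configure_stack_size(pthread_attr_t *attr, size_t bytes) {
    size_t actual = 0;
    assert(attr != NULL);

    if (pthread_attr_init(attr) != 0) {
        return 0;
    }
    if (pthread_attr_setstacksize(attr, bytes) != 0 ||
        pthread_attr_getstacksize(attr, &actual) != 0) {
        pthread_attr_destroy(attr);
        return 0;
    }
    return actual;
}
```

The resulting attribute object can then be passed to pthread_create as usual.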
Implementation details
The essential addition to support shadow-call-stack code is the shadow call stack pointer. This is a register with a global use, like the traditional stack pointer. But each call frame pushes and pops a single return address word rather than arbitrary data as in the normal stack frame.
For AArch64 (ARM64), the x18 register holds the shadow call stack pointer at
function entry. The shadow call stack grows upwards with post-increment
semantics, so x18 always points to the next free slot. The compiler never
touches the register except to spill and reload the return address register
(x30, aka LR). The Fuchsia ABI requires that x18 contain a valid shadow
stack pointer at all times. That is, it must always be valid to push a
new address onto the shadow call stack at x18 (modulo stack overflow).
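The post-increment push and matching pre-decrement pop can be modeled in portable C. This is a sketch for illustration only: model_region, model_x18, model_push, model_pop, and model_depth are hypothetical names standing in for the real register and memory. The comments show the AArch64 instructions that the Clang documentation lists for the instrumented prologue and epilogue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the shadow call stack region and of the x18 register,
 * which always points at the next free slot. */
#define MODEL_SLOTS 32
static uintptr_t model_region[MODEL_SLOTS];
static uintptr_t *model_x18 = model_region;

/* Prologue: models `str x30, [x18], #8` (store, then increment). */
void model_push(uintptr_t x30) {
    assert(model_x18 < model_region + MODEL_SLOTS); /* overflow check */
    *model_x18++ = x30;
}

/* Epilogue: models `ldr x30, [x18, #-8]!` (decrement, then load). */
uintptr_t model_pop(void) {
    assert(model_x18 > model_region); /* underflow check */
    return *--model_x18;
}

/* Number of return addresses currently recorded. */
size_t model_depth(void) {
    return (size_t)(model_x18 - model_region);
}
```

Because the pointer always refers to the next free slot, a push is valid at any moment, which mirrors the ABI requirement stated above.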
Notes for low-level and assembly code
Most code, even in assembly, does not need to think about shadow-call-stack
issues at all. The calling conventions are not changed. All use of the stack
(and/or the unsafe stack) is the same with or without shadow-call-stack; when
frame pointers are enabled, the return address will be stored on the machine
stack next to the frame pointer as expected. For AArch64 (ARM64), function
calls still use x30 for the return address as normal, though functions that
clobber x30 can choose to spill and reload it using different memory. Non-leaf
functions written in assembly should ideally make use of the
shadow-call-stack ABI by spilling and reloading the return address register
there instead of on the machine stack.
The main exception is code that is implementing something like a non-local
exit or context switch. Such code may need to save or restore the shadow call
stack pointer. Both the longjmp function and C++ throw already handle this
directly, so C or C++ code using those constructs does not need to do
anything new.
New code implementing some new kind of non-local exit or context switch will
need to handle the shadow call stack pointer similarly to how it handles the
traditional machine stack pointer register and the unsafe stack pointer. Any
such code should use #if __has_feature(shadow_call_stack) to test at compile
time whether shadow-call-stack is being used in the particular build. That
preprocessor construct can be used in C, C++, or assembly (.S) source files.
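A minimal sketch of that compile-time test in C follows. The saved_context structure and the built_with_shadow_call_stack function are hypothetical names for illustration, and the unsafe stack pointer is omitted for brevity. The fallback definition of __has_feature is an assumption added so the file also compiles with compilers that lack that builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: provide a fallback so compilers without the
 * __has_feature builtin treat every feature as absent. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Hypothetical record of the state a custom context switch saves. */
struct saved_context {
    uintptr_t sp; /* traditional machine stack pointer */
#if __has_feature(shadow_call_stack)
    uintptr_t shadow_call_sp; /* shadow call stack pointer (x18 on AArch64) */
#endif
};

/* Reports whether this translation unit was built with
 * shadow-call-stack instrumentation enabled. */
int built_with_shadow_call_stack(void) {
#if __has_feature(shadow_call_stack)
    return 1;
#else
    return 0;
#endif
}
```

The extra field exists only in builds that use shadow-call-stack, so code that saves and restores the context can guard its x18 handling with the same #if.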