|RFC-0013 - Cloning a VMO mapping|
A system call that creates a CoW clone of a VMO from a mapping, without needing a handle to it.
|Date submitted (year-month-day)||2020-10-26|
|Date reviewed (year-month-day)||2020-12-09|
It is currently only possible to create a CoW clone of a VMO if you have a handle to it. This is not sufficient for zygote use cases, which require CoW cloning an entire address space, including mappings for which all VMO handles have been closed. This RFC proposes a new system call to address this gap.
Motivation and problem statement
Chromium on Linux spawns renderers by forking them from a zygote process, resulting in significant memory and CPU savings. We would like to realize these savings on Fuchsia.
To do this, we need some way to implement an "address space clone" operation which, when given:
- A process root VMAR
- Some handles that refer to VMOs mapped in the address space
returns:
- A new process with an address space populated by clones of the VMOs mapped in the input process
- For each handle referring to a VMO in the input process, a new handle referring to the corresponding clone of that VMO in the new process
By adding this system call, we can comfortably implement this in userspace.
This works by getting the address space layout,
then using zx_vmar_create_vmo_child to create a clone of each mapping,
except for any VMOs that there is a handle to,
which are handled using zx_vmo_create_child.
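The per-mapping decision described above can be modeled in a few lines of C. This is only a sketch under stated assumptions: the `sketch_map_entry_t` record, the `vmo_id` field, and the `sketch_choose_method` helper are hypothetical names introduced for illustration, and the code makes no Zircon calls. It only shows which of the two syscalls a userspace cloner would pick for a given mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing one mapping in the source address space. */
typedef struct {
  uint64_t base;   /* first mapped address */
  uint64_t size;   /* length in bytes */
  uint64_t vmo_id; /* illustrative identifier of the mapped VMO */
} sketch_map_entry_t;

typedef enum {
  SKETCH_CLONE_FROM_HANDLE,  /* a handle is held: use zx_vmo_create_child */
  SKETCH_CLONE_FROM_MAPPING, /* no handle is held: use zx_vmar_create_vmo_child */
} sketch_clone_method_t;

/* Picks the clone method for one mapping, given the identifiers of the
 * VMOs the caller still holds handles to. */
sketch_clone_method_t sketch_choose_method(const sketch_map_entry_t* entry,
                                           const uint64_t* held_vmo_ids,
                                           size_t held_count) {
  for (size_t i = 0; i < held_count; i++) {
    if (held_vmo_ids[i] == entry->vmo_id) {
      return SKETCH_CLONE_FROM_HANDLE;
    }
  }
  return SKETCH_CLONE_FROM_MAPPING;
}
```

A real cloner would loop over every mapping in the layout and issue the chosen syscall for each one. That loop is part of the clone algorithm, which this RFC leaves out of scope.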
zx_status_t zx_vmar_create_vmo_child(zx_handle_t handle, uint32_t options, zx_vaddr_t addr, size_t size, zx_handle_t* out)
Creates a CoW clone of the pages mapped in the range [addr, addr + size)
in the VMAR referenced by handle.
The range must be a subrange of a single VMO mapping,
i.e. it may not span two mappings or include any unmapped pages.
The result is a new VMO that is a child of the VMO mapped in the range.
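The range constraint can be illustrated with a small, self-contained check. This is a sketch, not the kernel's implementation: `sketch_mapping_t`, `sketch_range_is_clonable`, and the fixed 4096-byte page size are assumptions made for the example, and rejecting an empty range is a choice made here rather than something the RFC specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative page size; real code would query the system page size. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical record of one VMO mapping in an address space. */
typedef struct {
  uint64_t base; /* first mapped address */
  uint64_t size; /* length in bytes */
} sketch_mapping_t;

/* Returns true if [addr, addr + size) is page-aligned and lies entirely
 * within a single entry of maps. */
bool sketch_range_is_clonable(const sketch_mapping_t* maps, size_t count,
                              uint64_t addr, uint64_t size) {
  assert(maps != NULL || count == 0);
  /* Assumption: an empty range is rejected. */
  if (size == 0) {
    return false;
  }
  if (addr % SKETCH_PAGE_SIZE != 0 || size % SKETCH_PAGE_SIZE != 0) {
    return false;
  }
  if (addr + size < addr) { /* reject ranges that wrap around */
    return false;
  }
  for (size_t i = 0; i < count; i++) {
    uint64_t map_end = maps[i].base + maps[i].size;
    if (addr >= maps[i].base && addr + size <= map_end) {
      return true; /* fully contained in one mapping */
    }
  }
  /* Spans two mappings or touches unmapped pages. */
  return false;
}
```

Note that two adjacent mappings do not count as one: a range that crosses the boundary between them is rejected even though every page in it is mapped.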
Allowed options are
They are interpreted the same way as in zx_vmo_create_child.
(ZX_VMO_CHILD_SLICE is not currently allowed due to security concerns:
special care would have to be taken to ensure that
the ability to modify the parent VMO doesn't break any security boundaries.)
The call fails if:
- the range [addr, addr + size) isn't a subrange of a single VMO mapping
- size is not page-aligned
- the mapping was done using the new flag ZX_VM_CANNOT_CREATE_VMO_CHILD, or the VMAR options include this flag
- handle doesn't have the required right
- handle isn't a valid handle
- handle isn't a VMAR
- the VMAR is destroyed (ZX_ERR_BAD_STATE)
Can also return any error returned by
zx_vmo_create_child if the documented reason applies.
The returned VMO handle will have ZX_DEFAULT_VMO_RIGHTS, with the following changes:
- ZX_RIGHT_WRITE will be removed if ZX_VMO_CHILD_NO_WRITE was specified.
- ZX_RIGHT_EXECUTE will be added if the mapping was executable and ZX_VMO_CHILD_NO_WRITE was specified.
This means it is possible to create executable clones of executable mappings,
but they must be read-only.
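The rights rules can be written as a small pure function. The sketch below reads them so that, as the text says, an executable clone must be read-only. The `SKETCH_*` constants are stand-in bit values chosen for the example, not the real Zircon definitions, and `SKETCH_DEFAULT_VMO_RIGHTS` is a simplified stand-in that models only read and write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit values for illustration only; not the real Zircon constants. */
#define SKETCH_RIGHT_READ    (1u << 0)
#define SKETCH_RIGHT_WRITE   (1u << 1)
#define SKETCH_RIGHT_EXECUTE (1u << 2)

/* Simplified stand-in for ZX_DEFAULT_VMO_RIGHTS (read and write only). */
#define SKETCH_DEFAULT_VMO_RIGHTS (SKETCH_RIGHT_READ | SKETCH_RIGHT_WRITE)

/* Stand-in for the ZX_VMO_CHILD_NO_WRITE option bit. */
#define SKETCH_CHILD_NO_WRITE (1u << 0)

/* Computes the rights on the returned clone handle from the options passed
 * to the call and whether the source mapping was executable. */
uint32_t sketch_child_rights(uint32_t options, bool mapping_executable) {
  uint32_t rights = SKETCH_DEFAULT_VMO_RIGHTS;
  bool no_write = (options & SKETCH_CHILD_NO_WRITE) != 0;
  if (no_write) {
    /* A no-write clone loses the write right... */
    rights &= ~SKETCH_RIGHT_WRITE;
    /* ...and only such a clone may gain the execute right. */
    if (mapping_executable) {
      rights |= SKETCH_RIGHT_EXECUTE;
    }
  }
  return rights;
}
```

Under this reading no combination of inputs yields a handle that is both writable and executable.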
ZX_VM_CANNOT_CREATE_VMO_CHILD is a new zx_vm_options_t flag
which can be specified for a VMO mapping or a VMAR.
This allows code that maps VMOs or creates VMARs
to disallow calling zx_vmar_create_vmo_child on them.
Adding a syscall is not a complex change; it can be done in one CL. Implementation of the clone algorithm is beyond the scope of this RFC.
It's not worth benchmarking this syscall on its own, since it's only used in the context of a zygote implementation. Instead we should measure the overall performance of the zygote implementation.
Creating VMO children using VMAR references instead of handles is not currently possible, but this RFC would make it possible. It's a bit like granting a new capability out of thin air.
In this case it doesn't seem that risky
because the end result is the same as making a new VMO and copying the data over
(but with less memory usage), which was possible before.
If anything needs to disallow this,
this RFC proposes a zx_vm_options_t flag (ZX_VM_CANNOT_CREATE_VMO_CHILD)
that can be used.
The syscall will be unit tested. The tests for a future implementation of address space cloning will also serve as integration tests for the syscall.
This RFC is a starting point for the documentation of zx_vmar_create_vmo_child.
Drawbacks, alternatives, and unknowns
What other strategies might solve the same problem?
We could add a system call to do the entire address space clone operation. This would be a very complicated system call, in both interface and implementation. It's better to push the complexity into userspace.
We could add a system call that allows minting a VMO handle from a mapping.
This would allow what is essentially a userspace implementation of zx_vmar_create_vmo_child,
by first creating a handle from the mapping and then calling zx_vmo_create_child.
But the Zircon API generally only allows minting handles to newly created objects,
in order to make the interface easier to reason about.
This syscall would have to create handles out of thin air.
There are various tricks that might be used to avoid closing any VMO handles,
such as modifying all code that creates and maps VMOs to avoid closing them.
However, this is not sufficient to implement address space cloning,
since the loader service maps some VMOs into the process before it starts running,
and this doesn't help you get handles to those.
We could modify the loader service to pass the necessary handles to the process,
but that would be more work than adding this syscall.
Prior art and references