RFC-0245: VMO Prefetch

RFC-0245: VMO Prefetch
StatusAccepted
Areas
  • Kernel
Description

Operations to prefetch data in VMOs in preparation for read accesses.

Issues
Gerrit change
Authors
Reviewers
Date submitted (year-month-day)2024-03-19
Date reviewed (year-month-day)2024-04-09

Summary

Have a read-only analogue of ZX_VMO_OP_COMMIT that populates a VMO for imminent read accesses.

Motivation

The existing COMMIT operations, ZX_VMO_OP_COMMIT and ZX_VMAR_OP_COMMIT, are a performance tool for users. They allow the user to indicate that ranges of a VMO or VMAR are going to be used and allow the kernel to more efficiently bulk allocate all the pages, avoiding the need to take many faults in the future.

COMMIT has two downsides today stemming from its intent of simulating a write. First, it requires the WRITE permissions on the VMAR and/or VMO, which may not be had if this is for some executable data. Second, it actively performs copy-on-write and allocates pages, which is not necessary and a waste of memory if the range is only going to be read from in the future.

Today, Starnix is impacted by the first downside stemming from its use of read-only pager backed VMOs. A PREFETCH operation would resolve both of the downsides, while retaining the desired benefits of COMMIT.

Stakeholders

Facilitator:

cpu@google.com

Reviewers:

adamperry@google.com, dworsham@google.com

Consulted:

rashaeqbal@google.com

Socialization:

This feature came about via discussion between the Starnix and virtual memory teams.

Requirements

A PREFETCH operation should be added to VMOs, as ZX_VMO_OP_PREFETCH and VMARs, as ZX_VMAR_OP_PREFETCH. These operations should only require the READ permission on their respective handles and should do the work necessary such that future read operations have minimal work to do. PREFETCH may therefore:

  • Requesting any missing pages from a user pager in the range
  • Decompress any pages in the range
  • For VMARs, create hardware page tables for any pages in the range

Design and implementation

The kernel VM internals already have most of the tools to support this feature, and implementation is largely plumbing the new API flags. As such this can be implemented in one, or two if the VMO and VMAR implementations are split, small CLs, including any relevant testing and documentation etc.

Performance

This change should not impact the performance of any existing functionality, to be verified with our existing benchmarks.

Security considerations

PREFETCH is logically equivalent to the user performing manual reads across the range. Other than needing the READ permission instead of the WRITE permission, the rest of the requirements and restrictions are exactly the same as for COMMIT, and so there are no new security considerations here.

Testing

Tests will be added to the core test suite to validate that the PREFETCH operations are supported and trigger user pager requests correctly.

Testing other aspects from user space, such as PREFETCH triggering decompressions, is difficult due to these systems intentionally being largely transparent to the user, modulo performance. As performance based unit tests are inherently flaky, especially due to emulator execution, these behaviors will be tested as far as possible with kernel unit tests.

Alternatives

Change COMMIT to simulate read

Instead of introducing a PREFETCH operation, the existing COMMIT operations could have their semantics changed to instead simulate reads. Although this would be sufficient for the Starnix use case, other existing users do want the write simulation and to have pages eagerly allocated in order to avoid faults and allocations in future writes.

Keep using ALWAYS_NEED

Starnix presently works around the lack of a PREFETCH operation by using the ALWAYS_NEED operation. ALWAYS_NEED performs the same initial action as PREFETCH needs to, that of bringing in any missing pages and decompressing etc. Unlike PREFETCH, it makes an additional statement that this memory should not be reclaimed if at all possible. This contributes to system memory pressure and may increase the likelihood of OOMs or degraded user experience.

Additional zx_vmar_map options

In addition to supporting a VMAR operation, PREFETCH could be supported at mapping creation time via an additional option such as ZX_VM_MAP_RANGE_PREFETCH. This would be an optimization to avoid needing to issue the PREFETCH operation after creating the mapping.

It is presently unclear if avoiding the extra syscall is necessary and this RFC does not wish to rule out additional support in zx_vmar_map, but it needs separate motivation that is out of scope and can be done in a follow up proposal when and if needed.