|RFC-0079: Detecting debuglog data loss|
A revised mechanism for detecting debuglog data loss.
|Date submitted (year-month-day)||2021-02-17|
|Date reviewed (year-month-day)||2021-03-25|
This document proposes an update to the mechanism for detecting dropped messages from the kernel's debuglog object.
Background and motivation
The kernel debuglog subsystem is a simple logging facility that enables user mode programs to read and write log messages. Logically, this system provides a single FIFO log buffer that can be written to or read from by multiple writers or readers.
The debuglog can be lossy. Assuming the readers can keep up with the write rate, all readers will see all messages written by all writers in the order in which they were written. However, if a reader is slow and cannot keep up, it will miss messages.
Messages can be written to the debuglog by user mode programs via
zx_debuglog_write or by the kernel via
printf. Messages can be read via
zx_debuglog_read. Additionally, the kernel has a specialized thread, known as
the debuglog_dumper, for reading messages from the debuglog and writing them to
the debug serial port.
The debuglog buffer has a fixed capacity. When that capacity is reached, the least recently written message(s) may be dropped to make room for a new message. Any readers that haven't "caught up" will never see the dropped message(s).
Knowing that a log is complete allows one to reason about the absence of certain events so detecting dropped log messages is an important feature of a log system. Debuglog readers need to be able to detect when log messages have been dropped.
Currently, the debuglog provides a mechanism for readers to detect when and how
many bytes of log data have been dropped, the
uint32_t rolled_out field of
After reading a
field will contain the number of bytes of log messages that have been dropped
from the debuglog since that reader's last successful read. The value includes
both the bytes of the dropped log headers and the bytes of the dropped log
rolled_out mechanism is implemented by having each reader maintain a
pointer into the debuglog buffer that points to the next message that it has not
yet read. The debuglog maintains a write pointer that points to the location
where the next message will be written. If a reader notices that the write
pointer has passed its read pointer, then it knows it has missed one or more
messages. By subtracting the pointer values, a reader can determine how many
bytes of log data (including headers) it has missed.
rolled_out mechanism is currently unused.
This RFC proposes...
Replacing byte oriented data loss detection with record oriented detection.
To more closely match the expectations of debuglog readers, in particular the debuglog_dumper, the existing byte oriented
rolled_outmechanism will be replaced with a per-record sequence number that can be used to detect data loss (gaps in the sequence).
The debuglog_dumper must write each message it reads to the debug serial port. Because the serial port may not be particularly fast, the debuglog_dumper often cannot keep up with the debuglog, which results in dropped messages. When this happens, we'd like to print a message to the serial port indicating that data loss has occurred and how many messages were lost, similar to Linux's
Eliminating the possibility of undetected data loss using a 64-bit value.
rolled_outfield is 32-bits in size and counts bytes rather than records, if 4GB of log data is written in between two calls to
zx_debuglog_readit is possible to overflow the value, which could result in undetected data loss. This is unlikely to occur in practice, but it would be nice to entirely eliminate the possibility. If we were to replace the 32-bit byte sequence field with a 32-bit per-record sequence field the amount of data required to create an overflow grows to approximately 128GB. By using a 64-bit sequence field, we can entirely ignore the possibility of overflow even at very high logging rates.
Enabling future implementation optimizations.
There are some potential future optimizations that would depend on allowing multiple debuglog readers to "share" a single
Absent slow readers, all debuglog readers should see the exact same log data in the same order. With the exception of
rolled_out, all fields of
zx_log_record_tare fixed at the time the record is written to the debug log.
rolled_outis different in that it's computed per-reader, at the time the record is read out of the debuglog. If we had another way to detect data loss without using fields that could vary per-reader, we could enable some potential future optimizations where a single record is shared among all readers.
The debuglog assigns a 64-bit sequence number, starting with 0, to each record when it's written to the debuglog.
Each record's sequence number will be exactly one greater than the preceding
rolled_out field and replace it with the
record's sequence number,
zx_debuglog_read can then detect gaps in the sequence and calculate
how many messages were dropped.
zx_log_record_t are not used out of tree. While the
full blown Fuchsia Large Scale Changes (LSC) process is not required, an FYI LSC
bug will be filed and the implementation will be completed in phases.
rolled_out field is unused, but the containing struct,
is used in a few places within fuchsia.git. Care needs to be taken to not break
zx_log_record_t is not used out of tree.
The syscall definition and documentation of
zx_debuglog_read don't actually
specify that it returns a
zx_log_record_t. Instead they specifies a
size_t and callers must know to cast or "overlay" a
top of the result. Casting to
zx_log_record_t* is error prone so the
syscall parameter will be changed to
zx_log_record_t* and callers will be
There is currently no Rust equivalent of
zx_log_record_t and Rust callers use
hard coded offsets to access the fields so changing the size or offsets of its
fields can silently break these callers. As part of the implementation we'll
create a Rust equivalent of
zx_log_record_t and update Rust callers to use it.
The steps are as follows:
Add a 64-bit sequence number to the private, internal representation of debuglog messages (
Change debuglog consumers to use
zx_log_record_t(or language equivalent) rather than hard coded field offsets. In particular, create a Rust equivalent type and update Rust code to use it.
rolled_outfield to a zero-filled
uint64_t sequenceand ensure no implicit struct padding is created. Do the same for all
zx_log_record_tequivalent types, regardless of language.
zx_debuglog_readdocumentation to explain how callers may use the new sequence field to detect data loss.
Steps 1, 2, and 3 will each get their own CL. Steps 4, 5, and 6 will occur in a single CL.
Runtime cost of managing the sequence counter
Debuglog operations are performed while holding a lock so we can use a regular
uint64_t value to generate the sequence. No measurable performance impact is
Size impact of per record sequence value
dlog_header_t, the kernel's private record
implementation, will grow in size.
will be replaced by a 64-bit record sequence field, yielding a net gain of 4
dlog_header_t size change is more interesting because it's the native form
in which log records are stored in the FIFO.
dlog_header_t is 32 bytes in size
and has no
rolled_out field so its net gain will be the full 8 bytes. FIFO
space is limited in the debuglog object so increasing each log record by 8 bytes
will reduce the maximum number of records that may be stored in the FIFO and
also reduce the maximum message size (from 224 bytes to 216).
The FIFO can store 128KB of headers and messages. A sampling of messages indicates the average size is about 100 bytes. Assuming this average size, with a 32 byte header, the FIFO can store approximately 971 messages. With a per record sequence number, the number is reduced to approximately 917.
The proposal does not alter the security of the system. Debuglog readers are privileged components. Absent data loss, a debuglog reader could already synthesize the log record sequence with perfect accuracy.
No privacy impact.
In-kernel unit tests will verify the underlying debuglog implementation and debuglog core tests will verify the behavior observable at the syscall layer.
zx_log_record_t's documentation will be updated.
Drawbacks, alternatives, and unknowns
rolled_out is not yet used, a relatively small amount of engineering
effort is required to implement the proposal at this time. Once downstream code
makes use of
rolled_out it will be more expensive to implement this or a
The issue of 32-bit wraparound could be somewhat mitigated by documenting the
UINT32_MAX to mean "
UINT32_MAX or more". Or we could change
rolled_out's type to
Record sequence and byte sequence
If space were free, we could put both a record sequence value and a byte
sequence value in each
zx_log_record_t. Debuglog readers could then measure
the data loss in either number of records or number of bytes lost. However, that
would further increase the size of the
dlog_header_t and it's not clear we
have a use case for bytes dropped.
Prior art and references
printk reports the number of messages dropped/suppressed rather than
the bytes of log data dropped/suppressed.