Ring buffer behavior

Ring buffers are used to convey audio between parties (usually in different processes), allowing concurrent, asynchronous data access without requiring locks. This pattern works because the parties share an understanding of which buffer areas are safe to access, and how those areas change over time.

The descriptions below explain how audio data frames are moved from one party to another. As two examples, we detail (1) the movement of audio from a client to a driver (playback to that driver's hardware), as well as (2) the movement of audio from a driver to a client (recording from that driver's hardware). However, ring buffers can be used to convey audio between two app-level clients or between two drivers, as well.

Some audio hardware can transfer data to/from system memory without involvement from software (e.g. the host-based driver); other audio hardware accomplishes this with software running on the hardware itself (e.g. DSP firmware). Nonetheless, henceforth for consistency we will refer to this movement of audio data as being done by the 'driver'. For details about this distinction, see the 'Hardware versus Software' section toward the end of this document.

Ring buffer users avoid the need for mutexes or other active synchronization because they share three important pieces of information. The first is the memory bounds of the ring buffer itself. The second is the rate at which the audio must be produced and consumed; this rate is defined by the format specified in the CreateRingBuffer command. Third, the two parties must share an understanding of the ring buffer's start time.

While the ring buffer is not started, time has no effect on the ring buffer's state. While the ring buffer is started, there is a ring buffer position that continually moves across the ring buffer at a constant speed set by the predefined format. By definition, at 'start time', this ring buffer position 'R' begins at the beginning of the ring buffer, at frame 0.

While the ring buffer is started, the driver is constrained in the size of data transfers that it can make in a single I/O operation. For playback (where the driver is Consumer, and the client is Producer), audio frames are consumed by the driver in transfers that can be as large as driver_transfer_bytes. For capture (where the driver is Producer, and the client is Consumer), audio frames are produced by the driver in transfers that can be as large as driver_transfer_bytes.

These driver data transfers mean that there is always a section of the ring buffer that is unsafe for the client to be writing (or reading, if the ring buffer is being used for capture). This unsafe buffer region is defined on one side by the current ring buffer position 'R', and on the other side by a 'safe pointer' location. Depending on whether the ring buffer is being used for playback or capture, this is either "the safe frame location for a Producer to write" ('P') or "the safe frame location for a Consumer to read" ('C'), respectively. The diagrams below label these pointers as 'R', 'P', 'C'.

For playback, the region between 'R' and 'P' must not be written by the client at that time. For capture, the region between 'C' and 'R' must not be read during that time.

Once a ring buffer starts, these pointers begin moving at a fixed rate. 'R' begins moving at the start_time returned from RingBuffer::Start, from lower addresses to higher addresses, and instantaneously restarting at the beginning of the ring buffer when reaching its end. This movement of 'R', 'P' and 'C' enables a Consumer to safely read ring buffer contents that were previously written by a Producer, after an appropriate time duration has passed.

To pass audio through the ring buffer, the Producer must write data BEFORE the Consumer transfers occur. Conversely the Consumer must read data AFTER the Producer transfers occur. For this reason, 'P' (which we define during playback) is always ahead of 'R', whereas 'C' (which we define during capture) is always behind 'R'. Restated, 'P' refers to a frame that is earlier than the frame referred to by 'R', and 'C' refers to a frame that is later than the frame referred to by 'R'. A given frame would be referred to by 'P' before it is referred to by 'R', before it is referred to by 'C'. In the diagrams below, ring buffer frame 0 is at the left, and ring buffer positions move from left to right. Therefore when looking from left to right, we expect the diagrams to show 'C' then 'R' then 'P' (modulo the effects of wraparound).

Playback

Before starting the ring buffer, playback clients may safely write any ring buffer location. For this reason, 'P' is not yet defined.

                                 Ring Buffer
+-----------------------------------------------------------------------------+
[<--                             safe to write                             -->)
[             (to pre-populate the ring buffer before starting it)            )
+-----------------------------------------------------------------------------+
0=R                                                                           0

For audio to be played as soon as possible, the client should write their first frames of audio where they will be the first thing the driver reads when the ring buffer starts: the beginning of the ring buffer. If a client has sufficient audio available (perhaps the entire audio file), it may choose to pre-populate the whole ring buffer before starting it. Other clients receive audio as a real-time stream; those clients can still pre-populate audio to the beginning of the ring buffer but must write more than driver_transfer_bytes of audio, since upon Start the driver may immediately consume that much data from the ring buffer. This means that the client must continue writing audio from that frame (labelled 's' below) before the driver reads it (before 'P' reaches it).

                               Ring Buffer
+-------------------------+---------+-----------------------------------------+
[<-- Pre-populated by the client -->)      not yet written by the client      )
[< driver_transfer_bytes >)                                                   )
+-------------------------+---------+-----------------------------------------+
0=R                                 s                                         0

If the client cannot pre-populate enough audio, then they should start their audio at an offset, rather than the beginning of the ring buffer. This relies on the zeroed-out contents of the VMO to be the first audio read by the driver. As above, this offset (again called 's') must be sufficient for the client to provide subsequent audio frames before the driver consumes them. For example:

                               Ring Buffer
+-------------------------+---------+-----------------------------------------+
[    Offset   [<-- Pre-populated -->)      not yet written by the client      )
[< driver_transfer_bytes >)                                                   )
+-------------------------+---------+-----------------------------------------+
0=R                                 s                                         0

Once the ring buffer is started, it is not safe for the client to write data to the ring buffer between 'R' and 'P', because this represents data already in use (potentially already consumed by the driver). The client may safely write the rest of the ring buffer (between 'P' and '0/R').

As always, the client should never write too close to 'P', as it is an instantaneous hypothetical pointer which could advance during the delay of even a single CPU instruction. The effective 'safe to write' region for a client is always changing, as 'P' is constantly moving. For this reason, a client should write ahead (at a higher memory address) such that it always has enough time to write more data ahead of 'P'.

This is the state of the playback ring buffer, at the moment it is started:

                               Ring Buffer
+-------------------------+---------------------------------------------------+
[<--  unsafe to write  -->[<--           safe to write (not yet            -->)
[< driver_transfer_bytes >[              consumed by the driver)              )
+-------------------------+---------------------------------------------------+
0=R                       P                                                   0

As time passes, the driver reads data in chunks of driver_transfer_bytes or less, at the rate specified in CreateRingBuffer. Many drivers use a 'ping pong' pattern where they read half of their allocated ring buffer region at a time, to allow time for these reads to occur safely. Regardless of the size of the driver transfers, the Position and Safe pointers ('R' and 'P') move to the right at the same rate, but do so smoothly. As a result, the "unsafe for client writes" area moves gradually through the ring buffer, while maintaining a constant size equal to driver_transfer_bytes. Thus, after some period we now have:

                               Ring Buffer
+------------+-------------------------+--------------------------------------+
[<-- safe -->[<--  unsafe to write  -->[<--     safe to write (not yet     -->)
[  to write  [< driver_transfer_bytes >[        consumed by the driver)       )
+------------+-------------------------+--------------------------------------+
0            R                         P                                      0

Later, 'P' wraps around the ring buffer before 'R' does. Note that the region from 0 to 'P', plus the region from 'R' to the end of the ring buffer, adds up to driver_transfer_bytes:

                               Ring Buffer
+---------------+--------------------------------------------------+----------+
[<--  unsafe -->[<--        safe to write (to overwrite         -->[<-unsafe->)
[ransfer_bytes >[              already-consumed data               [< driver_t)
+---------------+--------------------------------------------------+----------+
0               P                                                  R          0

In steady state, i.e. once the process has wrapped around the ring buffer, any frame at or greater than 'P' (up to a limit of 'R + ring_buffer_size') is safe for the client to write. Restated, and factoring in ring buffer wraparound, the Producer can safely write either the ranges [0, R) + [P, ring_buffer_size), or alternately range [P, R), depending on where 'R' lies relative to the ring buffer wraparound point -- either the above diagram, or (more frequently) this one:

                               Ring Buffer
+--------------------------+-------------------------+------------------------+
[<--   safe to write    -->[<--  unsafe to write  -->[<--   safe to write  -->)
[                          [< driver_transfer_bytes >[                        )
+--------------------------+-------------------------+------------------------+
0                          R                         P                        0

Note the boundary requirements: the "unsafe for Producer to write" region is [R, P), so a Producer cannot safely write location 'R' (which is equivalent to 'R + ring_buffer_size', the producer high-water location). Similarly, the "safe for Producer to write" region is [P, R) (with wraparound), so a Consumer cannot safely read location 'P'.

But in practice, that precise frame is not safe for either party to access. Frame pointer locations 'P' and 'R' are theoretical and instantaneous. By the time the driver reads from 'R', that pointer will have slightly moved, rendering that location unsafe for reads; by the time the client writes to 'P', that pointer will have slightly moved, rendering that location unsafe for writes. The Producer and Consumer must always maintain a level of safety padding ahead of their "safe" pointer locations.

The driver_transfer_bytes value specified by a driver is critical for ensuring that clients do not write into memory that the driver is still actively reading. With the 'ping pong' pattern mentioned above, a driver would specify a value for driver_transfer_bytes that is twice the size of the actual transfers themselves. Indeed it would reflect the size of the internal double-buffer that provides the extra duration of safety padding.

Recording

While recording, it is only safe for the client to read the part of the ring buffer that is not simultaneously being written by the driver. Before capture begins, the driver has not yet written anything for the client to read.

At the instant that capture starts (reported by RingBuffer::Start), the driver cannot immediately transfer frames to the ring buffer, because these frames have not yet been acquired. The driver must first accumulate enough frames to make a transfer, and only thereafter would move that amount to the ring buffer starting at frame '0'. Many drivers use a double-buffer (or 'ping pong') pattern where they transfer half of their buffering amount in each transfer. Since at the moment of 'Start' no audio frames are yet available for the client to read, 'C' is effectively undefined. However it will be helpful to think of a position 'b' (which will become 'C'). This 'b' lags frame 'R' by a fixed offset and has not yet reached frame location 0. Here is the ring buffer state when it is started:

                               Ring Buffer
+---------------------------------------------------+-------------------------+
[<--         safe to read (but empty, not yet written by driver)           -->)
[                                                   [< driver_transfer_bytes >)
+---------------------------------------------------+-------------------------+
0=R                                                 b                         0

After the ring buffer is started but before 'R' has advanced by driver_transfer_bytes, the client cannot yet safely read ANY newly captured frames, because they may not have yet been transferred into the ring buffer. Although 'R' is advancing, the driver may or may not have made any transfers into the buffer yet. With the 'ping pong' pattern, the driver waits until the first half of its internal buffer is full before transferring its contents into the ring buffer -- and while this transfer occurs, the other half of the internal buffer remains available to safely receive subsequent frames.

The amount of audio that has actually been captured into the ring buffer will change with each driver transfer, so it moves across the ring buffer in a "chunky" way. By contrast, 'R' and 'C' will by definition move in a perfectly smooth manner; they are guaranteed to always bound where the actual most-recently-captured frame lies.

At this time, because 'R' has not yet advanced by driver_transfer_bytes, 'C' is still effectively undefined. Our marker 'b' continues to advance, lagging 'R' by a fixed offset and still not yet reaching 0:

                               Ring Buffer
+--------------+--------------------------------------------------+-----------+
[<-- unsafe -->[<--           empty, not yet written by driver             -->)
[ansfer_bytes >)                                                  [< driver_tr)
+--------------+--------------------------------------------------+-----------+
0              R                                                  b           0

Once the ring buffer position 'R' has advanced by exactly driver_transfer_bytes, the driver is guaranteed to have made at least the initial transfer of audio frames into the ring buffer. With the 'ping pong' pattern, the driver will have already transferred its first-half ('ping') buffer into the ring buffer some time ago, and its second-half ('pong') buffer will have just been filled and can now be written to the ring buffer. Location 'b' has reached the beginning of the ring buffer, so 'C' is now defined and begins to smoothly advance at the same rate as 'R' (as determined by the ring buffer's frame rate and sample format). So at this instant we have:

                               Ring Buffer
+-------------------------+---------------------------------------------------+
[<--       unsafe      -->[<--       empty, not yet written by driver      -->)
[< driver_transfer_bytes >[                                                   )
+-------------------------+---------------------------------------------------+
0=C                       R                                                 b=0

As the ring buffer position 'R' advances further, the client can safely read frames in the region between '0' and 'C'. It is unsafe for the client to read data from 'C' up to 'R', because this is where the driver is simultaneously writing. This region progresses across the ring buffer, maintaining a constant size of driver_transfer_bytes. Conceptually the ring buffer is now in this state:

                               Ring Buffer
+--------------------+-------------------------+------------------------------+
[<   safe to read   >[<--  unsafe to read   -->[<--     empty, not yet     -->)
[newly-captured audio[< driver_transfer_bytes >[       written by driver      )
+--------------------+-------------------------+------------------------------+
0                    C                         R                              0

Later, 'R' wraps around the ring buffer before 'C' does. Note that the region from 0 to 'R', plus the region from 'C' to the end of the ring buffer, adds up to driver_transfer_bytes.

As always, the client should never read too close to 'R', as it is an instantaneous hypothetical pointer which could advance during the delay of even a single CPU instruction. The effective 'safe to read' region for a client is always changing, as 'R' is constantly moving. For this reason, a client should read ahead (at a higher memory address) such that it always has enough time to read more data ahead of 'R'.

This is the state of the ring buffer, at some time after its first wraparound:

                               Ring Buffer
+-----------+--------------------------------------------------+--------------+
[<--unsafe->[<--                safe to read                -->[<-- unsafe -->)
[fer_bytes >[                 (captured audio)                 [< driver_trans)
+-----------+--------------------------------------------------+--------------+
0           R                                                  C              0

In steady state, i.e. once the process has wrapped around the ring buffer, any frame less 'C' (up to the limit of 'R - ring_buffer_size') is safe for the client to read. Restated, and factoring in ring wraparound, the Consumer can safely read either ranges [0, C) + [R, ring_buffer_size), or alternately range [R, C) -- depending on where 'R' lies relative to the ring wraparound point -- either the diagram above, or (more frequently) this one:

                               Ring Buffer
+--------------------------+-------------------------+------------------------+
[<--    safe to read    -->[<--      unsafe       -->[<--   safe to read   -->)
[                          [< driver_transfer_bytes >[                        )
+--------------------------+-------------------------+------------------------+
0                          C                         R                        0

Note the boundary requirements: the "unsafe for Consumer to read" region is [C, R), so a Consumer cannot safely read location 'C' (the Consumer low-water frame location). Similarly, the "safe for Consumer to read" region is [R, C) (with wraparound), so a Producer cannot safely write location 'R'.

But in practice, that precise frame is not safe for either party to access. Frame pointer locations 'R' and 'C' are theoretical and instantaneous. By the time the driver writes to 'C', that pointer will have slightly moved, rendering that location unsafe for writes; by the time the client reads from 'R', that pointer will have slightly moved, rendering that location unsafe for reads. The Producer and Consumer must always maintain a level of safety padding ahead of their "safe" pointer locations.

The driver_transfer_bytes value specified by a driver is critical for ensuring that clients do not read into memory that the driver is still actively updating. With the 'ping pong' pattern mentioned above, a driver would specify a value for driver_transfer_bytes that is twice the size of the actual transfers themselves. Indeed it would reflect the size of its internal double-buffer that provides the extra duration of safety padding.

Hardware versus Software (or hardware transfers, versus driver process-and-copy)

Ring buffer data frames can be directly consumed/generated by audio hardware: i.e. driver_transfer_bytes might map directly to the size of a hardware FIFO block, since that FIFO block would determine the upper limit amount of data read ahead or held back. Note that if the FIFO buffer is used in the traditional "high water" way (such as 'ping pong' design where only half of the FIFO is used at any time -- after first filling the entire FIFO at Start time), then driver_transfer_bytes would be set to the size of the internal FIFO buffer, which would be double the size of the internal transfers if using the 'ping pong' pattern. Even if smaller transfers are used, if the full size of the FIFO is used (for instance, upon Start when filling an initially empty hardware FIFO), then driver_transfer_bytes must be set to the entire size of this FIFO buffer.

Ring buffer data may instead be consumed/generated by audio driver software that is conceptually situated between the ring buffer and the audio hardware. In this case, for playback as an example, the driver_transfer_bytes read ahead amount must be large enough such that the driver guarantees no undetected underruns, based on the client requirement to generate data at the rate specified by CreateRingBuffer and at locations derived from start_time of Start. Conversely, for capture driver_transfer_bytes must be large enough for the driver to guarantee no underruns when generating data as determined by CreateRingBuffer and Start. Also, it is expected that the driver_transfer_bytes in these cases would be larger than merely the size of the transfer itself, since it must also include any safety padding to account for delays from scheduling and executing this driver processing.