The minimum block size is 16 bytes (
MIN_BLOCK_SIZE) and the maximum block
size is 2048 bytes (
There are 8 possible block orders (
NUM_ORDERS), numbered 0...7, corresponding
to blocks of sizes 16, 32, 64, 128, 256, 512, 1024, and 2048 respectively.
All blocks are aligned on 16-byte boundaries, and addressing within the VMO
is in terms of 16-byte blocks (
offset = index * 16).
block_header consists of 16 bytes as follows:
There are currently 11 types defined in //zircon/system/ulib/inspect/include/lib/inspect/cpp/vmo/block.h:
Each type interprets the payload differently, as follows.
FREE blocks contain the index of the next free block of the same order.
Free blocks create singly linked lists of free blocks of each size for
fast allocation. The end of the list is reached when NextFreeBlock points
to a location that is either not
FREE or not of the same order.
RESERVED blocks are simply available to be changed to a different type.
It is an optional transitional state between the allocation of a block and
setting its type that is useful for correctness checking of
implementations (to ensure that blocks that are about to be used are not
treated as free).
There is one
HEADER block at the beginning of the VMO region.
It consists of a Magic Number ("INSP"), a Version (currently 0),
and the Generation Count for concurrency control.
This will always appear at index 0 in the VMO, so the index 0 will always be an invalid index regardless of what other type is expected; 0 may then serve as an invalid index for use as a sentinel value.
Values all start with the same prefix, consisting of the index of the parent for the value and the index of the name used to refer to the value.
The payload is interpreted differently depending on the type of value, as below:
OBJECT_VALUE and TOMBSTONE:
OBJECT_VALUE blocks contain a reference count that is incremented when it
is referenced as a parent and decremented when the referencing value is
Objects are anchor points for further nesting.
ParentID field of values may only refer to blocks of type
A deleted object becomes a new, special type called
TOMBSTONE, which is
only deleted when its refcount becomes 0.
VALUE blocks all contain the 64-bit numeric type inlined.
PROPERTY_VALUE blocks contain the index of the first
EXTENT block holding
the string data, and they contain the total length of the string across
EXTENT blocks contain a string payload and the index of the next
in the chain.
Strings are retrieved by reading each
EXTENT in order until Total Length
bytes are read.
NAME blocks give objects and values a human-readable identifier. They
consist of a UTF-8 payload that fits entirely within the given block.
InspectState consists of the following data:
||VMO mapped into the address space of this program.|
||Size of the VMO, in bytes.|
||Mapped vmo buffer.|
||Pointer to the first free block of the order given by the index into the array. Indices are multiplied by 16 to give the byte offset. A pointer pointing to either an invalid region or a block that is not FREE and of the right order is treated as representing the end of the list.|
||Pointer to the generation count for updates.|
MIN_SIZE = 1 << 4 // 16 bytes MAX_SIZE = 1 << 11 // 2048 bytes NUM_ORDERS = 8 BLOCK_HEADER_SIZE = 16 EXTENT_VALUE_OFFSET = 4 // Value starts at byte 4 NAME_VALUE_OFFSET = 1 // Value starts at byte 1
if (generation_counter) atomically_increment(generation_counter, acquire_ordering);
Locks the region against concurrent reads by setting the generation to an odd number. During initialization this does nothing.
if (generation_counter) atomically_increment(generation_counter, release_ordering);
Unlocks the region allowing concurrent reads by setting the generation to a new even number. During initialization this does nothing.
WithLock designates an algorithm that must occur between a single
It is up to the implementer to ensure that nested calls to
algorithms only lock and unlock once.
Block is an offset into the VMO.
Dereferencing this offset is assumed to give access to a
getters and setters for the various fields specified above.
block.size = MIN_SIZE << block.order
This denotes pseudocode describing what changes to make to a block, it is assumed the block header is set to 0 before the described changes.
Buddy(Block block): // This common buddy allocation algorithm calculates the buddy of a block of a // given size. return ((block * MIN_SIZE) XOR block.size)
IsFreeBlock(Block block, int expected_order): // Ensure the block index is within the bounds of the VMO, // that the block is free, and (optionally) that the order matches what is // expected return (block >= 0 && block < vmo_size/MIN_SIZE && block.type == FREE && block.order == expected_order)
RemoveFreeBlock(Block block): // If the block is at the head of the list, short circuit. if free_blocks[block.order] == block: free_blocks[block.order] = block.next_free return // Iterate through the list and unlink the block when it is found. current_block := free_blocks[block.order]; while (IsFreeBlock(current_block)): if current_block.next_free == block: current_block.next_free = block.next_free return
@WithLock SplitBlock(Block block): // Remove the block from its current free list. RemoveFree(block); // Reduce the order of the block, and find its new buddy. block.order -= 1 buddy := Buddy(block) // Set both blocks to free with the new order. SetBlock(block, type=FREE, order=block.order) SetBlock(buddy, type=FREE, order=block.order) // Prepend [block, buddy] onto the free list. buddy.next_free = free_blocks[buddy.order] block.next_free = buddy free_blocks[block.order] = block
@WithLock GrowVmo(size_t size): old_size := vmo_size vmo_data, vmo_size = Grow vmo to size, remap, get new size next_free_block := free_blocks[NUM_ORDERS-1] // Store all of the new blocks in the free list in ascending order. for block = vmo_size - MAX_SIZE; block >= old_size; block -= MAX_SIZE: block.SetBlock(type=FREE, order=NUM_ORDERS-1, next_free=next_free_block) next_free_block = block free_blocks[NUM_ORDERS-1] = next_free_block
@WithLock Allocate(size_t size): order := Minimum order that can contain size // No free blocks to accommodate the size, grow the VMO to create more. if not any(IsFreeBlock(free_blocks[i], i) for i in range(order, NUM_ORDERS): GrowVmo(vmo_size*2) // Find the first block with order >= the desired order block := first(free_blocks[i] for i in range(order, NUM_ORDERS) where IsFreeBlock(free_blocks[i], i)) // If the block is too big, split until it is the right size. while block.order > order: SplitBlock(block) // Remove that block from the free list and allocate it. RemoveFree(block) block.SetBlock(type=ALLOCATED, order=block.order) return block
Free (and merge if possible)
@WithLock Free(Block block): buddy := Buddy(block) // As long as the buddy of this block is free, keep merging blocks. while buddy.type == FREE && block.order < NUM_ORDERS - 1: // Ensure we only merge with the left most block. if buddy < block: tmp := buddy buddy = block block = tmp RemoveFree(buddy) block.order += 1 buddy = Buddy(block) // Free the block and put it at the head of the free list. block.SetBlock(type=FREE, order=block.order, next_free = free_blocks[block.order]) free_blocks[block.order] = block
// Initial size is a valid VMO size that is a multiple of MAX_SIZE. Initialize(size_t initial_size): for i in range(NUM_ORDERS): free_blocks[i] = 0 // Invalid for all but the maximum size block. GrowVmo(initial_size); block := Allocate(MIN_SIZE); // The first order 0 block in the region is special; it contains a region header // consisting of a magic number in the first 8 bytes and the // generation count in the second 8 bytes. // We may now use 0 as the sentinel value for all free lists. block.SetBlock(type=HEADER, magic="INSP", version=0, generation=0); generation_count = &block.generation;
// Iteratively free all extents including and following the given one. @WithLock FreeExtents(Block extent): assert(extent.type == EXTENT); cur_extent := extent while cur_extent != 0: next_extent = cur_extent.next_extent Free(cur_extent) cur_extent = next_extent
Lock-free Update Strategy
We propose a mechanism for multiple readers and single writer. We can use a global version counter so that readers can detect in-flight modifications and modifications between reads.
- spinlock until the version number is even (no concurrent write),
- copy the entire VMO buffer, and
- check that the version number from step 1 is the same as in the copy. As long as the version numbers match, the client may read their local copy to construct the shared state. If the version numbers do not match, the client may retry the whole process.
This is a simple strategy with a significant benefit: between incrementing the version number for beginning and ending a write the writer can perform any number of operations on the buffer without regard for atomicity of data updates.
The main drawback is that reads could be delayed indefinitely due to a frequently updating writer, but readers can have mitigations in place in practice.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.