Introduction to fbl intrusive containers

Intrusive containers (or intrusive data structures) are the type of containers offered by the Fuchsia Base Library (fbl::).

This document will:

Define what an intrusive container is.
Compare the differences between intrusive and non-intrusive containers.
Discuss the similarities and differences between fbl:: and std:: containers.
Introduce some basic terminology that will be used throughout this guide.

What is an intrusive container?

An intrusive container is a data structure that can be used to hold a collection of objects where the bookkeeping used to track membership in the container is stored in the objects themselves instead of holding the object alongside of the bookkeeping in a structure. To highlight the distinction, consider a structure that defines a point in a 2-dimensional coordinate system.

struct Point {
  float x, y;
};

In a traditional non-intrusive implementation of a doubly linked list, a node in the list might look something like this

struct ListNode {
  Point val;
  ListNode *next, *prev;
};

While an intrusive version of the same list would be

struct Point {
  float x, y;
  Point *next, *prev;
};

Initially, these appear to be very similar approaches. The distinction seems mostly like a question of which object is contained within the other. As time goes on, however, the differences become more apparent.

Adding a point to a non-intrusive list of points means:

Allocating a ListNode
Copying the Point you want to add to the list into the val member of the ListNode
Linking the next/prev pointers of the ListNode into the list.

Removing the point is the same process in reverse.

The pointers are unlinked.
The contents of the Point are copied out to local storage.
Finally, the ListNode is de-allocated.

When using an intrusive list, the add/remove operations skip the allocation/deallocation and copying of the Point's members. The next/prev pointers, which exist in the point object already, are simply linked into the list during the add operation and unlinked during the remove. While the intrusive form does not have to perform any allocations or copy any structures to add elements to a list, every Point in the system is must carry around the overhead of the next/prev bookkeeping, even while not in a list.

While a small structure or primitive data type might be held by value in a non-intrusive list, when objects become larger, it becomes more common to track the object using some appropriate pointer type instead of storing the object by value. For example

struct ListNode {
  std::shared_ptr<LargeObject> ptr;
  ListNode *next, *prev;
};

The "value" being stored in this list is an instance of a shared pointer, so while adding or removing a pointer to a LargeObject to a list still requires management of the allocation used to store the node state, the object itself no longer needs to be copied when being added to a list. Additionally, this approach allows a single instance of a LargeObject to exist in multiple containers at the same time as the nodes simply track references to the objects and not copies of the objects. A simple implementation of an intrusive form of this would put the next/prev pointers into LargeObject directly implying that an instance of a LargeObject could be a member of exactly 0 or 1 intrusive lists, but not more.

Intrusive vs. non-intrusive containers

The primary advantage of an intrusive container approach to tracking objects is the lack of allocations when adding/removing elements to/from a container. In some (usually specialized) environments, heap interactions can introduce a potential failure path, which would ideally not exist. Additionally, interactions with heaps usually involve interactions with locks/mutexes, which could result in involuntary blocking. This might introduce undesirable timing indeterminism, or in some cases may even not be an option if the code is running in a context in which blocking is not allowed.

For example, what happens if code needs to add a bookkeeping structure to a list during a hard interrupt handler in the kernel if the allocation for the bookkeeping structure cannot be allocated, or if allocation of the bookkeeping would require the IRQ handler to block due to lock contention (something that cannot happen in the exception handler)? By allocating the container bookkeeping as part of the object itself ahead of time, an intrusive container approach to this problem allows these issues to be avoided.

In general, intrusive containers can provide performance advantages when the design of the system allows the implementer to know at compile time all of the various containers that an object will ever exist in. This does not mean that they are always the best choice, however, as they come with limitations as well.

Finally, for many types of intrusive containers, holding a reference to an object implies that you have located that object in all of the container types it can possible exist in where the same is not true when an object is tracked by reference in multiple non-intrusive container instances. For example, an object which could exist in a balanced binary search tree as well as in a doubly linked list (both intrusive) could be located in O(log n) time in the tree, and then removed from the list in O(1) time, instead of needing to find the object in the list in O(n) time before removing it.

A short list of some pros and cons of the two approaches might look something like this:

Non-intrusive:

Pros
- Can be used to track simple objects by value, even primitive data types like int or float
- Does not require changing the definition of the object itself in order to manage it in a different container. This is especially helpful when an object O is defined in Library A, but a user wishes to create a collection of A::Os in their own program and cannot reasonably re-define O.
- Objects tracked by reference can easily exist in multiple containers independently.
Cons
- Requires independent allocation management of the bookkeeping, usually on every add or remove operation. This management frequently implies hidden heap interactions, which can introduce overhead, timing uncertainty, and additional complexity as users need to manage potential failure paths.
- Finding the location of an object in container A provides no information about the location of the object in container B. For example, finding an object contained in a std::map with an O(log n) operation does not help me to locate the same object that is simultaneously contained in a std::deque. If I want to remove the object from the std::deque, I must find it there with a separate O(n) operation first.

Intrusive:

Pros
- Minimal overhead to join or leave a container, and no chance of failure or involuntary yielding due to lock contention as the bookkeeping overhead was paid up-front during object creation.
- Finding an instance of an object implicitly finds the instance of the object in all of the containers it can exist in.
- Knowledge of whether an object is currently container in container type A is a property of the object and can always be tested (from the object) in O(1) time.
Cons
- The ability to be contained is a fundamental property of the object. In order to change the number or types of the containers an object may exist in, the definition of the object itself must be changed, and all of the code that depends on that definition must be re-compiled.
- Only objects themselves can be tracked. Primitive data types do not have any place to store extra bookkeeping information and cannot be tracked in an intrusive fashion.

`fbl::` vs. `std::`

The C++ Standard Library offers a large set of implementation agnostic non-intrusive containers. For example, std::map is an ordered associative container, however there are no requirements that an implementation of the standard library use any specific data structure in order to implement std::map, only that the implementation chosen provides certain guarantees for various container operations as specified by the standard, such as worst case algorithmic complexities, or guarantees around iterator invalidation.

If you need a non-intrusive container, this is where you should be looking.

While there have been draft proposals, the C++ standard library does not currently offer any intrusive container implementations. Similar to boost::, fbl:: attempts to fill this gap, but in a much more limited/focused fashion. The algorithms backing the types of containers types offered by fbl:: are explicit and embedded in the names of the container types themselves. The APIs are inspired by and attempt to emulate those offered by std::, however there are some differences as a result of the fundamental differences between intrusive and non-intrusive containers. There are currently 3 primary container types defined by fbl:: and one composed container type. They are:

fbl::SinglyLinkedList<>
fbl::DoublyLinkedList<>
fbl::WAVLTree<>
fbl::HashTable<>

fbl::HashTable<> is considered to be a composed container type as it offers a fixed bucket count hash table that resolves collision using chaining where the bucket type may be either fbl::SinglyLinkedList<> or fbl::DoublyLinkedList<>. Please refer to Setting up build dependencies for details on which files need to be included in order to use the various container types.

Terminology

Here are some definitions of terms used throughout this guide.

Container : An object that tracks a collection of references to objects using a specific algorithm.
Node or Node Storage : The set of data that provides the bookkeeping that allows an object to exist in a specific container type.
Listable, Containable, Mix-in : An object that can be used as a base class to allow an object to be easily stored in a container of a specific type.
Raw pointer : A non-managed T*-style pointer to an object.

Introduction to fbl intrusive containers

What is an intrusive container?

Intrusive vs. non-intrusive containers

fbl:: vs. std::

Terminology

`fbl::` vs. `std::`