RFC-0205: Vulkan Loader | |
---|---|
Status | Accepted |
Areas |
|
Description | A description of how applications load Vulkan ICDs and layers |
Issues | |
Gerrit change | |
Authors | |
Reviewers | |
Date submitted (year-month-day) | 2023-01-12 |
Date reviewed (year-month-day) | 2023-01-12 |
Summary
This RFC describes how software on Fuchsia loads Vulkan ICDs and layers to perform hardware-accelerated rendering.
The system in this document has already largely been implemented, but this document may include changes to the architecture that we intend to make in the future.
Motivation
The Vulkan API has a C-style interface which applications use for programming the GPU. Vulkan applications interface with Vulkan functions using the Vulkan loader in one of several ways as described here: Interfacing with Vulkan Functions.
In this doc, "application" is used to denote the software component that uses the Vulkan API.
The Vulkan loader is also responsible for loading Installable Client Drivers (ICDs) and Vulkan layers along with delegating access to magma or other APIs needed to execute GPU commands on behalf of an ICD.
Vulkan ICDs are vendor-specific shared libraries that are loaded into applications to enable them to render using the GPU. Applications that use Vulkan need a mechanism to identify and load the correct ICD for the hardware in the system.
Vulkan layers are shared libraries that modify or observe the behavior of the Vulkan API by augmenting the dispatch chain of Vulkan API calls. They can be used to enhance the functionality of Vulkan or to interpose the API on behalf of Vulkan debugging or profiling functionality.
Stakeholders
Facilitator:
rlb@
Reviewers:
cstout@ costan@ jhowarth@ msandy@ rosasco@ palmer@ wittrock@
Consulted:
Socialization:
The design has been reviewed by members of the Magma team. An early version of this document was shared to the Component Framework team and at Security team office hours.
Design
On Fuchsia, the Vulkan loader is split into two parts: the
libvulkan.so shared library that is loaded into the application,
and a loader service (vulkan_loader) that is responsible for
loading ICD VMOs and transferring them to libvulkan.so
. They communicate using
the fuchsia.vulkan.loader.Loader protocol.
libvulkan.so
Khronos is the standards body for the Vulkan API. They provide a loader shared library implementation that's used on Linux, Windows, macOS and most other platforms. Google wrote a separate loader for use on Android.
The Fuchsia loader is based on Khronos's implementation; the code lives at third_party/Vulkan-Loader in the Fuchsia repo, but eventually it will all be upstreamed. When an application calls vkCreateInstance or other enumeration functions, the loader reads environment variables and JSON configuration files to determine the set of ICDs and layers to use. Layers are loaded from the component's namespace, so they're generally stored inside the package. They may also be loaded from directory capabilities provided to the component, provided the loader configuration is set to use those directories.
ICD loading
On startup, libvulkan.so
connects to the
fuchsia.vulkan.loader.Loader protocol. This channel must
remain connected for the lifetime of the application. If it exits, all future
loader calls may fail.
This long-lived connection prevents the component framework from reloading or updating the loader while a client that uses Vulkan is running. This is desirable because it prevents unexpected changes to the versions of Vulkan ICDs and loader interfaces while the application is using the loader API calls. Some Vulkan entry-points for enumerating extensions or other instance properties don't take any type of context argument; as such, the implementation will have some implicit global state.
ICDs are loaded using the fuchsia.vulkan.loader.Loader
protocol. The loader
uses the fuchsia.vulkan.loader/Loader.ConnectToManifestFs
method to access a
filesystem with manifest JSON files describing all the relevant
ICDs; this filesystem looks the same as the /usr/local/share/vulkan/icd.d
filesystem on Linux; see Filesystem serving for the
details of that filesystem.
The loader will then use the fuchsia.vulkan.loader/Loader.Get
method to get
retrieve a VMO corresponding to the ICD, which it can dlopen_vmo
to load into
the process and get the ICD entrypoints from. The set of Vulkan entrypoints on
Fuchsia is the same as that on Linux, except with Fuchsia-specific extensions as
described below.
Client components may also be packaged with software ICD implementations like
SwiftShader. In the case of SwiftShader, the VK_ICD_FILENAMES
environment variable can be used to specify the path to the manifest.json
of
the ICD. The ICD shared library will be loaded from /pkg/lib
of the Vulkan
client component.
Since most ICDs are not stored in the package and are versioned separately from
application binaries, they can only make limited assumptions about the ABI of
the applications they're linking to. The exact interface they can rely on is
listed in Fuchsia System Interface, but in general they're only
allowed to use a limited list of symbols, which must all be from either
libc.so
or libzircon.so
. When building ICDs, the imported symbols are
verified against an
allowlist to
ensure that the ICD will be loadable against multiple versions of client
applications. In the future this allowlist may shrink as hermetic replacements
are created.
ICDs need to be able to connect to external protocols; in particular they must
connect to the underlying device drivers that communicate with hardware. They
may also want to read vendor-specific configuration files, as well as log
errors. libc.so
exports several symbols to perform I/O, but in practice the
underlying operations (like open
) are implemented in libfdio
. In addition,
there's no way to connect a Zircon channel using the filesystem without
additional symbols that are directly exported from libfdio
.
To allow ICDs to do limited I/O, these definitions are added to the Vulkan ICD API:
VkResult(VKAPI_PTR* PFN_vkOpenInNamespaceAddr)(const char* pName, uint32_t handle);
VKAPI_ATTR void VKAPI_CALL vk_icdInitializeOpenInNamespaceCallback(PFN_vkOpenInNamespaceAddr
open_in_namespace_addr);
The ICD should expose vk_icdInitializeOpenInNamespaceCallback
. Before any
other driver functions are called, this function will be called with an
open_in_namespace_addr
callback. The ICD can pass a file name and Zircon
channel client end to this callback to connect to filesystem nodes by name.
This function has access to the process's incoming namespace, so the ICD can
read configuration files or connect to services like fuchsia.logger.LogSink
or
fuchsia.tracing.provider.Registry
. Vulkan ICDs may contain global state, so if
the process is a runner that can host multiple child components (perhaps by
using a virtual machine or other non-process mechanism to isolate components),
the runner must ensure that services provided to an ICD are safe to use from any
child component. For example if multiple untrusted child components are
co-located in the process the runner should not route
fuchsia.tracing.provider.Registry
through a child component that the runner
doesn't trust, since the component could snoop on all ICD graphics activity.
The open_in_namespace_addr
callback special-cases access to the
/loader-gpu-devices
path. All access to that path is routed to a filesystem
provided from vulkan_loader
using the
fuchsia.vulkan.loader/Loader.ConnectToDeviceFs
method; this allows the ICD to
connect to whatever hardware-specific device driver nodes it needs. ICDs can use
zxio or raw FIDL to traverse the filesystems; see Filesystem
serving for the details of that filesystem.
Layers are generally distributed through the SDK and loaded from the same package as the application, so they can rely on the same ABI guarantees of any software that's in the SDK. Layers that are loaded through directory capabilities from external packages should be treated the same as ICDs in terms of ABI.
ICD unloading/reloading
It's currently not possible to unload shared libraries, so any ICD will remain
loaded for the lifetime of the process. To avoid memory bloat when creating a
new Vulkan instance, the loader keeps a no-expiration cache of all ICDs it has
seen (identified by shared library filename). This filename is unique for as
long as the vulkan_loader
connection is alive.
Loader execution environment
The Vulkan API doesn't have the notion of an async runloop, so function calls
must complete synchronously from an application's perspective. The loader
doesn't receive an async_dispatcher_t*
from the application, and isn't allowed
to use the default dispatcher from libasync-default.so
. It may create its own
dispatcher and threads internally.
The outgoing directory of a component is hosted by the application's code so the
loader isn't able to put entries in it. This limits how it can interact with
other components. It's also not a platform requirement that the application will
only load a single copy of the loader, though currently all applications use a
copy from libvulkan.so
, which is deduplicated at load time due to its soname.
The loader searches for config files by default in
/vulkan-loader-configuration
, falling back to /pkg/data
. These paths can be
overridden by environment variables or an override layer, the
same as on Linux.
vulkan_loader
vulkan_loader
is a service that is responsible for determining what ICDs are
available, loading them, and serving them to applications. It's hosted at
/core/vulkan_loader
, and the fuchsia.vulkan.loader.Loader
service it exposes is routed to sessions, the test framework, and several
applications. It's written in C++, the code lives at
//src/graphics/bin/vulkan_loader
, and documentation is at
/src/graphics/bin/vulkan_loader/README.md.
In the future this service may be re-written in rust to reduce security risk and take advantage of asynchronous programming features.
Identifying new devices
The vulkan_loader
service must be able to identify what ICDs are usable. This
is driven by the set of device drivers that are running. If a device driver
isn't running for the hardware, then the ICD associated with it isn't usable.
vulkan_loader
uses directory watchers on /dev/class/goldfish-pipe
and
/dev/class/gpu to determine when new graphics devices appear.
When a new graphics device appears, the loader must determine the component associated with the ICD. The exact mechanism depends on the type of device:
/dev/class/gpu
- fuchsia.gpu.magma/Device.GetIcdList is called on the device./dev/class/goldfish-pipe
: The ICD URL is hardcoded to befuchsia-pkg://fuchsia.com/libvulkan_goldfish#meta/vulkan.cm
More types of GPU hardware devices may be supported in the future. Software ICDs
may also be exposed through the loader protocol on some devices as a fallback
(as chosen by vulkan_loader
configuration). Software Vulkan ICDs (such as
SwiftShader) often have JITs and require the ability to write to executable
memory; because of that, they may not be usable on production systems where that
capability is tightly controlled for security reasons.
Filesystem serving
vulkan_loader
serves multiple filesystems to clients, including the manifest
fs and device fs. It creates these filesystems based on the contents of
multiple ICD packages and services it receives through devfs. As a result, they
must be constructed using a filesystem serving library and don't reflect
anything on-disk.
- manifest fs: All manifest JSON files describing all the relevant ICDs; this filesystem looks the same as the /usr/local/share/vulkan/icd.d on Linux, so that minimal changes to the loader are needed.
- device fs: Contains all GPU devices needed by supported Vulkan ICDs. For
/dev/<path>/<node>
device, the filesystem will contain a<path>/<node>
entry.
ICD↔︎loader interface
ICDs are made available to the loader as CFv2 components. An ICD
component must expose a contents
directory containing an arbitrary
directory tree containing a shared library, as well as a metadata
directory
containing a single metadata.json
file.
An ICD is generally contained by itself in a separate package. In that case,
the contents
directory would be the root of the package, and the metadata
directory would be the meta/metadata/
directory in the package. The loader
doesn't enforce this layout, however.
metadata.json
and manifest.json
should ideally be stored under the meta
directory in the package, since that directory is most efficient at
storing small files.
ICD shared libraries
ICD shared libraries should match the Vulkan ICD ABI. ICDs are
executable shared libraries and can be placed in most subdirectories (not
/bin
) of the package.
Component manifest
The Vulkan loader supplies an icd_runner
runner to simplify the creation of
an ICD component from a package. The ICD package must contain a component
manifest .cml
that exports the contents
and metadata
directory capabilities.
The icd_runner
automatically exports /pkg/data
and /pkg/meta/metadata
directories from the ICD package at the /pkg-data
and /pkg-metadata
paths.
These can be used by the CML to export both directory capabilities (using the
subdir
property to expose a subdirectory as a full capability).
The ICD component may also use the ELF runner, but the only service available
to it is fuchsia.logger.LogSink
.
metadata.json
metadata.json is a single JSON file that describes the ICD to the loader. Example:
{
"file_path": "lib/libvulkan_example.so",
"version": 1,
"manifest_path": "meta/icd.d/libvulkan_example.json"
}
version
must be 1 for this metadata version.file_path
is the location of the ICD shared library relative to the exposedcontents
directory.manifest_path
is the location of the Khronos ICD manifest JSON file relative to the exposedcontents
directory.
Other clients
The set of available Vulkan ICDs can change over time; when the system first boots no ICDs will be available until the hardware enumerates. After that, devices may be hotplugged and either appear or disappear.
This means that the list of devices returned by vkEnumeratePhysicalDevices
can
change at any time. Some applications that require Vulkan may want to retry
after the set of available devices changes. They can use a filesystem watcher on
the filesystem returned from fuchsia.vulkan.loader/Loader.ConnectToManifestFs
to determine when to retry.
Implementation
This design represents the current architecture of the Vulkan loader as already implemented on Fuchsia.
Performance
The Vulkan loader is most active at process startup. Once a Vulkan ICD is loaded, it either trampolines Vulkan calls to go into the ICD, or returns ICD function implementations to the application for the application to call directly. As such, its performance is only critical during process startup.
No special consideration has been given to the performance of the loader. It has to launch components to connect to ICDs, and traverse multiple filesystem paths to work out the ICD and layer configuration. At the moment it's not believed that it has any large run time performance impact.
Backwards Compatibility
Communication between libvulkan.so and vulkan_loader
uses filesystems, JSON,
and FIDL. The filesystems and JSON have been in use on Linux for several years
without backwards compatibility issues. There are natural ways of evolving them
(adding paths and keys, respectively) to maintain backwards compatibility. The
FIDL interface is small and can be evolved using FIDL versioning mechanisms.
Security considerations
Components will load shared libraries provided by the Vulkan loader. The
system's normal verified execution enforcement will ensure
that the executable shared library comes from a trustworthy location (e.g. the
filesystem). Any parent component may interpose on the
fuchsia.vulkan.loader.Loader
protocol, so there's no guarantee that the loader
service component sees is provided by the system.
The ICDs chosen to load are referenced by path in the Magma system driver
(MSD) and loaded through a resolver. The full resolver is used by default,
so that can load ephemeral packages. Loading ICDs from ephemeral packages is
useful for developers of ICDs, but shouldn't be necessary for most users.
Loading ephemeral packages can be disabled by disabling the full resolver
(setting the auto_update_packages=false
gn arg). We can also create multiple
core shards for the Vulkan loader that product owners can choose between; eng
builds could choose the shard that uses the full resolver, and user builds could
use the shard with the base resolver.
If needed for specific products, multiple instances of the vulkan_loader
service can be created, each with access to different resolvers. Their
fuchsia.vulkan.loader.Loader
implementations could be routed to client
components based on the clients' security requirements. At the moment no
products have this requirement.
Configuration of the loader may cause unexpected behavior in the application, by loading new layers, preventing the loading of other layers, or setting options on those layers. The component must opt in to taking its configuration from outside its package (by routing a directory capability from outside the package), but otherwise has complete control of loader configuration.
ICD shared libraries are executed in the client process and can execute arbitrary code within that process. The build process and conformance tests will ensure they only import allowlisted symbols, but that isn't a security guarantee and may easily be bypassed by e.g. looking at callstacks to find addresses and parsing executables in memory to find useful gadgets. Applications won't validate most values returned by Vulkan, and may be manipulated into doing arbitrary memory accesses by careful manipulation of those values.
If a runner loads multiple components it doesn't trust into a single process (perhaps by using a virtual machine or other non-process mechanism to isolate components), those components must not be able to make direct Vulkan calls, since there's no known way to validate Vulkan API calls to guarantee that applications don't perform undefined behavior in the Vulkan ICD; even the Vulkan validation layers provide only limited protection. Runner code may make Vulkan calls itself, for example using Skia or ANGLE to execute validated rendering commands on the behalf of a client. Service and device channels provided to the ICD must be from some source the runner trusts, to prevent child components from snooping on each other.
Privacy considerations
The Vulkan loader has minimal privacy effects. The only information exposed over FIDL is whether the application attempts to use Vulkan, and which devices it attempts to use.
Testing
vulkan_loader
and libvulkan.so have unit and integration tests. These tests
are hermetic and don't depend on device drivers or real ICDs installed on the
system.
In addition, there are CTF tests to ensure that the implementation of the
fuchsia.vulkan.loader.Loader
protocol is correct and that the ICDs provided by
it are compatible with old loader versions.
The Vulkan CTS and other Vulkan tests in the fuchsia tree act as end-to-end
tests, checking that the vulkan_loader
is compatible with libvulkan.so
.
These can only run on systems with Vulkan hardware and device drivers.
Documentation
We have vulkan_loader
documentation at
/src/graphics/bin/vulkan_loader/README.md. There is some user
documentation for how to use the Vulkan loader.
The upstream Vulkan Loader has documentation. We should try to add and upstream Fuchsia-specific information to that document.
Prior art and references
The Linux/Windows/MacOS loader.