OTA updates

Over-The-Air updates (OTAs) are a mechanism for operating system updates on Fuchsia. This document details how OTA updates work on Fuchsia.

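The update process is divided into three phases, each described in its own section below: checking for an update, staging an update, and verifying an update.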

Checking for an update

The two entry points for the operating system update process are the omaha-client and the system-update-checker components.

Both omaha-client and system-update-checker serve the same purpose: to find out whether an operating system update is available and, if so, to start the update.

Generally, products should use omaha-client if they want to use Omaha to determine update availability. Products should use system-update-checker if they don’t want to use Omaha and instead want to check for updates directly from package repositories.

On any given Fuchsia system, only one of these components may be running.

Update checks with omaha-client

During the boot process, omaha-client starts up and begins periodic update checks. During these checks, omaha-client polls the Omaha server to check for updates.

The benefits of using Omaha are:

  • It allows for a fractional rollout of system updates across a fleet of Fuchsia devices. For example, it can be configured that only 10% of the fleet of devices gets updated. This means that only 10% of these devices will see that there is an available update while polling Omaha. The remaining 90% of devices would not see an update available.
  • It allows for different update channels. For example, test devices can get updates from the test channel and receive the newest (possibly unstable) software, while production devices get updates from the production channel and receive the most stable software. Channel information can optionally be given to Omaha along with the product and version.
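The fractional rollout described above can be pictured with a small sketch. This is not Omaha's actual algorithm: the function name in_rollout, the use of SHA-256, and the device identifiers are all assumptions made for illustration. The sketch maps each device to a stable bucket from 0 to 99 and offers the update only to devices whose bucket falls below the rollout percentage.

```python
import hashlib


def in_rollout(device_id: str, rollout_percent: int) -> bool:
    """Hypothetical check: does this device fall inside the rollout fraction?"""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Map the device to a stable bucket in the range [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent


# Made-up fleet of 1000 device identifiers.
fleet = [f"device-{i}" for i in range(1000)]
offered = [d for d in fleet if in_rollout(d, 10)]
print(f"{len(offered)} of {len(fleet)} devices are offered the update")
```

Because the bucket depends only on the device identifier, a device that is offered the update at 10% is still offered it when the rollout widens to 50%.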

Figure: Checking for updates with omaha-client

Figure 1. A simplified version of the update check process with omaha-client. There are policies that gate whether omaha-client can check for an update or apply an update.

Once omaha-client gets the update package URL from the Omaha server, omaha-client tells the system-updater to start an update.

Update checks with system-update-checker

Devices that don’t use omaha-client use the system-update-checker. Depending on how it is configured, the system-update-checker regularly polls for an update package. If auto_update is not specified, these checks are disabled by default.

To check if an update is available, the system-update-checker checks the following conditions:

  • Is the hash of the currently running system image (located in /pkgfs/system/meta) different from the hash of the system image (found in packages.json) in the update package?
  • If the system image isn’t different, is the vbmeta that’s currently running on the system different from the vbmeta of the update package?
  • If there is no vbmeta, is the ZBI that’s currently running on the system different from the ZBI of the update package?

If the answer to any of these questions is yes, the system-update-checker knows that the update package has changed. It then triggers the system-updater to start an update using the default update package (fuchsia-pkg://fuchsia.com/update).
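The three questions above form a short decision chain, sketched below. The Images type and its field names are hypothetical, and the sketch assumes that "there is no vbmeta" refers to the update package; it is not the system-update-checker's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Images:
    """Hypothetical summary of the hashes that identify a build."""
    system_image_hash: str
    vbmeta_hash: Optional[str]  # None when the build has no vbmeta
    zbi_hash: str


def update_available(running: Images, candidate: Images) -> bool:
    # 1. A different system image always means an update.
    if running.system_image_hash != candidate.system_image_hash:
        return True
    # 2. Same system image: compare the vbmeta if the candidate has one.
    if candidate.vbmeta_hash is not None:
        return running.vbmeta_hash != candidate.vbmeta_hash
    # 3. No vbmeta: fall back to comparing the ZBI.
    return running.zbi_hash != candidate.zbi_hash
```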

Figure: Checking for updates with the system-update-checker

Figure 2. A simplified version of the update check process with the system-update-checker.

If no update is required, the update checker saves the hash of the last update package it saw on the server. On subsequent checks, the hash of the fetched update package is compared against this last known hash. If the hash of the latest update package has changed since the last check, the update checker compares the vbmeta and ZBI on the running system against the respective images in the update package. If either the vbmeta or the ZBI differs between the running image and the update package, the checker kicks off a system update.
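The "last known hash" shortcut can be sketched as a tiny cache. The class and method names here are hypothetical; the point is only that the more detailed vbmeta and ZBI comparison runs when the update package hash has changed since the previous check.

```python
from typing import Optional


class LastSeenCache:
    """Hypothetical helper that remembers the last update package hash."""

    def __init__(self) -> None:
        self.last_seen: Optional[str] = None

    def changed_since_last_check(self, fetched_hash: str) -> bool:
        # Report whether the hash differs from the previous check,
        # then remember it for the next one.
        changed = fetched_hash != self.last_seen
        self.last_seen = fetched_hash
        return changed


cache = LastSeenCache()
results = [cache.changed_since_last_check(h) for h in ["aaa", "aaa", "bbb"]]
print(results)  # [True, False, True]
```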

Monitoring

A client that wants to monitor update progress and status can implement the fuchsia.update.AttemptsMonitor protocol and pass the client end to the MonitorAllUpdateChecks() method of the fuchsia.update.Manager FIDL protocol. The fuchsia.update.AttemptsMonitor instance only receives messages when an update is started by some other means or when an update is already in progress. Calling MonitorAllUpdateChecks() does not trigger a new update.

The fuchsia.update.AttemptsMonitor instance receives an OnStart message that contains the server end of the fuchsia.update.Monitor protocol. The client can then receive and process OnState messages, which report changes in the update state.

Another option is to implement fuchsia.update.Monitor and pass the client end to the CheckNow() method of the fuchsia.update.Manager protocol. This starts an update check. The monitor only follows the update that is currently running, and the handle is closed once the update completes.

Staging an update

Regardless of whether an update was triggered by omaha-client, system-update-checker, or even a forced update check, an update needs to be written to disk.

Staging is divided into the steps described in the sections below.

Figure: Starting state diagram

Figure 3. The device is currently running hypothetical OS version 1 (on slot A) and begins to update to hypothetical OS version 2 (to slot B). Warning: this may not be how the disk is partitioned in practice.

Fetch update package

The system-updater fetches the update package, using the provided update package URL. The dynamic index is then updated to reference the new update package. A sample update package may look like this:

/board
/epoch.json
/firmware
/fuchsia.vbmeta
/packages.json
/recovery.vbmeta
/version
/zbi.signed
/zedboot.signed
/meta/contents
/meta/package

If the fetch fails because there's not enough space, the system-updater will trigger garbage collection to delete all BLOBs that aren’t referenced in either the static or dynamic indexes or the retained packages set. After garbage collection, the system-updater will retry the fetch. If the retry fails, the system-updater will replace the retained packages set with just the update package it is trying to fetch (if the update package URL included the hash, otherwise it will clear the retained package set) and then again trigger garbage collection and retry the update package fetch.
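The escalating retry sequence above can be sketched as follows. The callables fetch, gc, and replace_retained, and the OutOfSpace exception, are hypothetical stand-ins for the real pkg-resolver and pkg-cache interactions; only the order of operations is taken from the text.

```python
from typing import Callable, Optional, Set


class OutOfSpace(Exception):
    """Hypothetical error raised when a fetch runs out of blob storage."""


def fetch_update_package(
    fetch: Callable[[], str],
    gc: Callable[[], None],
    replace_retained: Callable[[Set[str]], None],
    update_pkg_hash: Optional[str],
) -> str:
    try:
        return fetch()  # first attempt
    except OutOfSpace:
        gc()  # delete blobs not protected by any index or the retained set
    try:
        return fetch()  # second attempt
    except OutOfSpace:
        # Shrink the retained set to just the update package, or clear it
        # if the update package URL carried no hash, then collect again.
        replace_retained({update_pkg_hash} if update_pkg_hash else set())
        gc()
    return fetch()  # final attempt; any error propagates to the caller
```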

Figure: Fetch update package

Figure 4. The system-updater instructs the pkg-resolver to resolve the version 2 update package. We assume the system-updater failed to fetch the update package because of inadequate space, triggered a garbage collection to evict the version 0 blobs referenced by slot B, and then retried to successfully fetch the version 2 update package.

Optionally, the update package may contain an update-mode file. This file determines whether the system update happens in Normal or ForceRecovery mode. If the update-mode file is not present, the system-updater defaults to the Normal mode.

When the mode is ForceRecovery, the system-updater writes an image to recovery, marks slots A and B as unbootable, then boots to recovery. For more information, see the implementation of ForceRecovery.

Verify board matches

The current running system has a board file located in /config/build-info/board. The system-updater verifies that the board file on the system matches the board file in the update package.

Figure: Verify board matches

Figure 5. The system-updater verifies the board in the update package matches the board on slot A.

Verify epoch is supported

The update package contains an epoch file (epoch.json). If the epoch of the update package (the target epoch) is less than the epoch of the system-updater (the source epoch), the OTA fails. For additional context, see RFC-0071.
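The epoch comparison can be sketched as below. The layout of epoch.json used here (a JSON object with an integer "epoch" field) and the function name are assumptions made for illustration; see RFC-0071 for the authoritative format.

```python
import json


def epoch_supported(source_epoch: int, epoch_json_text: str) -> bool:
    """Return False when the update package's epoch is older than ours."""
    target_epoch = int(json.loads(epoch_json_text)["epoch"])  # assumed field name
    # The OTA fails only when the target epoch is less than the source epoch.
    return target_epoch >= source_epoch


print(epoch_supported(2, '{"version": "1", "epoch": 3}'))  # True
print(epoch_supported(2, '{"version": "1", "epoch": 1}'))  # False
```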

Figure: Verify epoch is supported

Figure 6. The system-updater verifies the epoch in the update package is supported by comparing it to the epoch of the current OS.

Replace retained packages set

The system-updater replaces the retained packages set with the current update package and all of the packages that will be fetched later in the OTA process.

The retained packages set is a set of packages that are protected from garbage collection, in addition to the packages in the static and dynamic indexes. It prevents garbage collection from deleting BLOBs needed by the current update process. For example, consider a device that fetched some of the packages needed for an update and then rebooted for unrelated reasons. When the device starts the OTA again, it still needs the packages it fetched before rebooting, but those packages are not protected by the dynamic index (which, like the retained packages set, is cleared on reboot). By adding those packages to the retained packages set, the system-updater can then trigger garbage collection (for example, to remove BLOBs used by a previous system version) without undoing past work.

Trigger garbage collection

Garbage collection is triggered to delete all BLOBs exclusive to the old system. This step frees up additional space for any new packages.
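As a rough model of what garbage collection may delete, the sketch below treats each index as a mapping from package name to the BLOBs it references. The data layout, function name, and blob names are hypothetical; the rule it encodes (a BLOB is collectable only if neither index nor the retained packages set references it) comes from the text above.

```python
from typing import Dict, Set

Index = Dict[str, Set[str]]  # package name -> blob hashes it references


def collectable_blobs(
    stored: Set[str], static: Index, dynamic: Index, retained: Index
) -> Set[str]:
    protected: Set[str] = set()
    for index in (static, dynamic, retained):
        for blobs in index.values():
            protected |= blobs
    # Anything stored but not protected may be deleted.
    return stored - protected


stored = {"v1-only", "shared", "v2-partial", "old-update-pkg-blob"}
static: Index = {"system-image-v1": {"v1-only", "shared"}}
dynamic: Index = {}  # cleared on reboot
retained: Index = {"update-v2": {"v2-partial", "shared"}}
print(sorted(collectable_blobs(stored, static, dynamic, retained)))
# ['old-update-pkg-blob']
```

Note that the partially fetched version 2 BLOB survives only because the retained packages set references it.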

Figure: Garbage collection

Figure 7. The system-updater instructs pkg-cache to garbage collect all BLOBs exclusive to the old system. In this example, it means pkg-cache will evict BLOBs exclusively referenced by the version 1 update package.

Fetch remaining packages

The system-updater parses the packages.json file in the update package. The packages.json looks like the following:

{
  "version": "1",
  "content": [
    "fuchsia-pkg://fuchsia.com/sshd-host/0?hash=123..abc",
    "fuchsia-pkg://fuchsia.com/system-image/0?hash=456..def"
    ...
  ]
}

The system-updater instructs the pkg-resolver to resolve all the package URLs. When resolving packages, the package management system only fetches BLOBs that are required for an update, i.e. only those BLOBs that aren't already present. The package management system fetches entire BLOBs, as opposed to a diff of whatever might currently be on the system.
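To make the "fetch only what is missing" behavior concrete, here is a minimal sketch that parses a packages.json document and computes which BLOBs still need to be downloaded. The hash values are placeholders, and the function names and the flat blob sets are assumptions for illustration, not the pkg-resolver's real data model.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Placeholder document in the shape shown above; the hashes are made up.
PACKAGES_JSON = """
{
  "version": "1",
  "content": [
    "fuchsia-pkg://fuchsia.com/sshd-host/0?hash=aaaa",
    "fuchsia-pkg://fuchsia.com/system-image/0?hash=bbbb"
  ]
}
"""


def parse_packages(text: str) -> dict:
    """Return {package path: hash} for each URL listed in packages.json."""
    result = {}
    for url in json.loads(text)["content"]:
        parts = urlsplit(url)
        result[parts.path.lstrip("/")] = parse_qs(parts.query)["hash"][0]
    return result


def blobs_to_fetch(required: set, present: set) -> set:
    # Whole BLOBs are fetched, but only the ones not already on the device.
    return required - present


packages = parse_packages(PACKAGES_JSON)
print(packages)  # {'sshd-host/0': 'aaaa', 'system-image/0': 'bbbb'}
print(sorted(blobs_to_fetch({"b1", "b2", "b3"}, {"b2"})))  # ['b1', 'b3']
```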

Once all packages have been fetched, a BlobFS sync is triggered to flush the BLOBs to persistent storage. This process ensures that all the necessary BLOBs for the system update are available in BlobFS.

Figure: Fetch remaining packages

Figure 8. The system-updater instructs the pkg-resolver to resolve the version 2 packages referenced in packages.json.

Write images to block device

The system-updater determines which images need to be written to the block device. There are two kinds of images: assets and firmware.

Then, the system-updater instructs the paver to write the bootloader and firmware. The final location of these images does not depend on whether the device supports ABR. To prevent flash wear, the image is only written to a partition if the image is different from the image that already exists on the block device.

Then, the system-updater instructs the paver to write the Fuchsia ZBI and its vbmeta. The final location of these images depends on whether the device supports ABR. If the device supports ABR, the paver writes the Fuchsia ZBI and its vbmeta to the slot that’s not currently booted (the alternate slot). Otherwise, the paver writes them to both the A and B partitions (if a B partition exists).

Finally, the system-updater instructs the paver to write the recovery ZBI and its vbmeta. Like the bootloader and firmware, the final location does not depend on whether the device supports ABR.
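The placement rules above can be summarized in a short sketch. The function names and the dictionary standing in for the block device are hypothetical; the real work is done by the paver.

```python
from typing import Dict, List


def zbi_target_slots(supports_abr: bool, current_slot: str, has_b: bool) -> List[str]:
    """Pick the slot(s) that receive the Fuchsia ZBI and its vbmeta."""
    if supports_abr:
        # Write only to the slot that is not currently booted.
        return ["B" if current_slot == "A" else "A"]
    # Without ABR, write to A and also to B when a B partition exists.
    return ["A", "B"] if has_b else ["A"]


def write_if_changed(device: Dict[str, bytes], partition: str, image: bytes) -> bool:
    """Skip the write when the partition already holds this image."""
    if device.get(partition) == image:
        return False  # avoids unnecessary flash wear
    device[partition] = image
    return True


print(zbi_target_slots(True, "A", True))    # ['B']
print(zbi_target_slots(False, "A", False))  # ['A']
```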

Figure: Write images to block device

Figure 9. The system-updater writes the version 2 images to slot B via the paver.

Set alternate partition as active

If the device supports ABR, the system-updater uses the paver to set the alternate partition as active. That way, the device boots into the alternate partition on the next boot.

There are several ways to refer to the slot state. For example, the internal paver uses Successful while the FIDL service uses Healthy, and other contexts may use Active, Inactive, Bootable, Unbootable, Current, Alternate, and so on.

What matters is the metadata stored for each kernel slot: three pieces of information that together determine the state of the slot. For example, before slot B is marked as active, the metadata might look like:

Metadata         Slot A  Slot B
Priority         15      0
Tries Remaining  0       0
Healthy*         1       0

After slot B is marked as active, the metadata would look like:

Metadata         Slot A  Slot B
Priority         14      15**
Tries Remaining  0       7**
Healthy          1       0

If the device doesn’t support ABR, this step is skipped because there is no alternate partition. Instead, there is a single active partition that is written to for every update.
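The metadata transition shown in the two tables above can be reproduced with a small model. The Slot type, the set_active function, and the constants (15 and 7, taken from the example tables) are illustrative assumptions, not the paver's actual implementation.

```python
from dataclasses import dataclass

MAX_PRIORITY = 15  # value taken from the example tables
MAX_TRIES = 7      # value taken from the example tables


@dataclass
class Slot:
    priority: int
    tries_remaining: int
    healthy: bool


def set_active(target: Slot, other: Slot) -> None:
    # Demote the other slot so the target has the strictly highest priority.
    if other.priority >= MAX_PRIORITY:
        other.priority = MAX_PRIORITY - 1
    target.priority = MAX_PRIORITY
    target.tries_remaining = MAX_TRIES
    target.healthy = False  # not yet verified


slot_a = Slot(priority=15, tries_remaining=0, healthy=True)
slot_b = Slot(priority=0, tries_remaining=0, healthy=False)
set_active(slot_b, slot_a)
print(slot_a)  # Slot(priority=14, tries_remaining=0, healthy=True)
print(slot_b)  # Slot(priority=15, tries_remaining=7, healthy=False)
```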

Figure: Set alternate partition as active

Figure 10. The system-updater sets slot B to Active, so that the device boots into slot B on the next boot.

Reboot

Depending on the update configuration, the device may or may not reboot right away. When the device does reboot, it boots into the new slot.

Figure: Reboot

Figure 11. The device reboots into slot B and begins running version 2.

Verifying an update

The system commits an update once that update is verified by the system.

The system verifies the update in the following way:

Rebooting into the updated version

On the next boot, the bootloader needs to determine which slot to boot into. In this example, the bootloader chooses slot B because slot B has a higher priority and more than 0 tries remaining (see Set alternate partition as active). The bootloader then verifies that the ZBI of slot B matches the vbmeta of slot B, and finally boots into slot B.
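A minimal sketch of this slot selection follows. It assumes that a slot is bootable when its priority is above zero and it is either healthy or has tries remaining; that rule, the Slot type, and the function names are assumptions for illustration and may differ from a real bootloader.

```python
from typing import Dict, NamedTuple, Optional


class Slot(NamedTuple):
    priority: int
    tries_remaining: int
    healthy: bool


def is_bootable(slot: Slot) -> bool:
    # Assumed rule: needs a non-zero priority and either a healthy mark
    # or at least one boot attempt left.
    return slot.priority > 0 and (slot.healthy or slot.tries_remaining > 0)


def choose_slot(slots: Dict[str, Slot]) -> Optional[str]:
    candidates = [name for name, slot in slots.items() if is_bootable(slot)]
    if not candidates:
        return None  # nothing bootable in this model
    return max(candidates, key=lambda name: slots[name].priority)


slots = {"A": Slot(14, 0, True), "B": Slot(15, 7, False)}
print(choose_slot(slots))  # B
```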

After early boot, fshost launches pkgfs using the new system-image package. This is the system image package that is referenced in the packages.json while staging the update. The system-image package has a static_packages file in it that lists the base packages for the new system. For example:

pkg-resolver/0 = new-version-hash-pkg-resolver
foo/0 = new-version-hash-foo
bar/0 = new-version-hash-bar
...
// Note the system-image package is not referenced in static_packages
// because it's impossible for it to refer to its own hash.

pkgfs then loads all these packages as base packages. The packages appear in /pkgfs/{packages, versions}, which indicates that the packages are installed or activated. Then, the system starts the pkg-resolver, pkg-cache, netstack, and so on.
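Reading a static_packages listing like the example above amounts to splitting each line at the equals sign. The sketch below assumes the "name/variant = hash" line shape shown in the example and skips blank lines and the explanatory "//" and "..." lines; the function name is hypothetical.

```python
SAMPLE = """\
pkg-resolver/0 = new-version-hash-pkg-resolver
foo/0 = new-version-hash-foo
bar/0 = new-version-hash-bar
"""


def parse_static_packages(text: str) -> dict:
    """Return {package name/variant: hash} for each listed base package."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and anything without a key/value pair.
        if not line or line.startswith("//") or "=" not in line:
            continue
        name, merkle = (part.strip() for part in line.split("=", 1))
        packages[name] = merkle
    return packages


base_packages = parse_static_packages(SAMPLE)
print(base_packages["foo/0"])  # new-version-hash-foo
```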

Committing the update

The system-update-committer component runs various checks to verify that the new update was successful. For example, it instructs BlobFs to arbitrarily read 1MiB of data. If the system is already committed on boot, these checks are skipped. If a check fails, the system-update-committer may trigger a reboot, depending on how the system is configured.

After the update is verified, the current partition (slot B) is marked as Healthy. Using the example described in Set alternate partition as active, the boot metadata may now look like:

Metadata         Slot A  Slot B
Priority         14      15
Tries Remaining  7       0
Healthy          0       1

Then, the alternate partition (slot A) is marked as unbootable. Now, the boot metadata may look like:

Metadata         Slot A  Slot B
Priority         0       15
Tries Remaining  0       0
Healthy          0       1

After this, the update is considered committed. This means:

  • The system always boots into slot B until the next system update.
  • The system gives up booting into slot A until the next system update overwrites slot A.
  • The BLOBs referenced by slot A are now able to be garbage collected.
  • Subsequent system updates are now allowed. When the update checker discovers a new update, the whole update process starts again.
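The two metadata transitions in the commit tables above can be reproduced with a small model. The Slot type, the function names, and the constant 7 (taken from the example tables) are illustrative assumptions; in particular, resetting the other slot's tries when the current slot is marked healthy simply mirrors what the first table shows.

```python
from dataclasses import dataclass

MAX_TRIES = 7  # value taken from the example tables


@dataclass
class Slot:
    priority: int
    tries_remaining: int
    healthy: bool


def mark_healthy(current: Slot, alternate: Slot) -> None:
    current.healthy = True
    current.tries_remaining = 0
    # As in the first table, the other slot loses its healthy mark
    # and gets its tries back.
    if alternate.healthy:
        alternate.healthy = False
        alternate.tries_remaining = MAX_TRIES


def mark_unbootable(slot: Slot) -> None:
    slot.priority = 0
    slot.tries_remaining = 0
    slot.healthy = False


slot_a = Slot(priority=14, tries_remaining=0, healthy=True)
slot_b = Slot(priority=15, tries_remaining=7, healthy=False)
mark_healthy(slot_b, slot_a)
after_healthy = (Slot(**vars(slot_a)), Slot(**vars(slot_b)))  # snapshot
mark_unbootable(slot_a)
print(slot_a)  # Slot(priority=0, tries_remaining=0, healthy=False)
print(slot_b)  # Slot(priority=15, tries_remaining=0, healthy=True)
```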