Zircon scheduler

Overview

The Zircon scheduler is a hybrid scheduler that supports both Fair and Deadline scheduling disciplines. It runs independently on each logical CPU, managing its own set of run queues and coordinating with other CPUs via Inter-Processor Interrupts (IPIs).

The primary goal of the scheduler is to share CPU bandwidth among competing threads while guaranteeing the timing requirements of critical workloads (such as audio, graphics, and high-frequency sensors).

Scheduling disciplines

Fair scheduling

Fair scheduling is the primary scheduling discipline for general-purpose workloads in the system. It divides CPU bandwidth between competing threads such that each receives a weighted proportion of the CPU over time.

  • Weight-Based Allocation: Each thread is assigned a weight. A thread with twice the weight of another receives approximately twice the CPU time.
  • Virtual Timeline: Ready threads are ordered in a balanced binary tree (WAVL tree) based on a virtual finish time. Higher-weighted threads are ordered closer to the front, but lower-weighted threads are guaranteed timely execution to prevent starvation and unbounded priority inversion.
  • Scheduling Period: The scheduling period adapts to the number of competing threads. When enough threads compete that the default period can no longer give each at least a Minimum Granularity slice, the period stretches so that every thread still receives one, trading scheduling latency for throughput.
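
The weight and period rules above can be sketched as follows. This is an illustrative model with made-up names and constants, not the kernel's fixed-point implementation:

```python
# Illustrative sketch of fair-scheduler bookkeeping; function names and
# constants here are hypothetical, not taken from the Zircon sources.

def virtual_finish_time(vstart_ns, timeslice_ns, weight):
    # Virtual time advances more slowly for heavier threads, so a thread
    # with twice the weight is selected roughly twice as often.
    return vstart_ns + timeslice_ns / weight

def scheduling_period(num_threads, default_period_ns, min_granularity_ns):
    # Stretch the period once the default can no longer give every
    # competing thread at least a Minimum Granularity slice.
    return max(default_period_ns, num_threads * min_granularity_ns)

heavy = virtual_finish_time(0, 10_000_000, weight=2.0)  # finishes "earlier"
light = virtual_finish_time(0, 10_000_000, weight=1.0)
period = scheduling_period(100, 10_000_000, 750_000)    # stretched period
```

Because `heavy` sorts before `light` in the virtual timeline, the heavier thread reaches the front of the run queue more often and accumulates roughly twice the CPU time.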

For a detailed look at the Fair Scheduling algorithm math and ordering criteria, see Fair Scheduler Mechanics.

Deadline scheduling

Deadline scheduling serves latency-critical tasks that require firm timing guarantees and must not be subject to the proportional slowdown that fair tasks experience during overload conditions.

  • Guarantees: A deadline task specifies a Capacity (CPU time) and a Deadline (period). The scheduler guarantees that a task will receive its capacity within its deadline period if the overall deadline demand is feasible.
  • Precedence: Deadline tasks always take precedence over eligible fair tasks.
  • Overload Response: To ensure that all deadline tasks can be scheduled within their expected periods, the total deadline demand must not exceed the processor's capacity. If the total deadline demand on a processor exceeds 100%, stochastic deadline misses are expected and the system behaves as though the periods of some or all tasks were multiplied by the overload factor.
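
The feasibility rule can be expressed as a simple utilization check. This is a hypothetical sketch of the accounting, not the kernel's code:

```python
# Hypothetical admission/overload check for deadline tasks.

def total_demand(tasks):
    # tasks: iterable of (capacity_ns, deadline_ns) pairs.
    # Each task demands capacity/deadline of the CPU's bandwidth.
    return sum(capacity / deadline for capacity, deadline in tasks)

def overload_factor(tasks):
    # At or below 100% demand the schedule is feasible; above it,
    # periods behave as if multiplied by this factor.
    return max(1.0, total_demand(tasks))

tasks = [(2_000_000, 10_000_000),   # 20% of the CPU
         (5_000_000, 20_000_000)]   # 25% of the CPU
```

With 45% total demand the set is feasible; a set demanding 140% would behave as though each period were stretched by a factor of 1.4.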

Run queues and selection

Each CPU maintains two primary run queues, implemented as augmented WAVL trees that provide O(log n) insertion, removal, and lookup:

  1. Deadline Run Queue: Contains runnable deadline tasks.
  2. Fair Run Queue: Contains runnable fair tasks.

Thread selection

When choosing the next thread to run, the scheduler evaluates queues in the following order:

  1. Deadline Task: Picks the eligible deadline task with the earliest finish time.
  2. Fair Task: Picks the eligible fair task with the earliest virtual finish time.
  3. Idle/Power Thread: Runs when no other threads are eligible.
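
The selection order can be sketched with two priority queues standing in for the WAVL trees. This is an illustrative model, not the kernel's data structures:

```python
import heapq

def pick_next(deadline_queue, fair_queue):
    # Deadline tasks always win; fair tasks run next; otherwise idle.
    if deadline_queue:
        return heapq.heappop(deadline_queue)[1]  # earliest finish time
    if fair_queue:
        return heapq.heappop(fair_queue)[1]      # earliest virtual finish time
    return "idle"

dq = [(8_000_000, "audio"), (12_000_000, "sensor")]  # (finish_time_ns, name)
fq = [(5_000_000, "worker")]
heapq.heapify(dq)
heapq.heapify(fq)
# "audio" is picked first even though "worker" has a smaller key:
# deadline tasks take precedence over fair tasks.
```

Only after the deadline queue drains does the fair task run, and only after both queues drain does the idle/power thread run.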

Preemption and timeslices

The scheduler sets a CPU-local preemption timer based on the selected thread's timeslice or the arrival of a deadline task. When the timer fires, the current thread is preempted and the scheduler re-evaluates which thread is the most suitable to run next.
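
A minimal sketch of the timer decision, as a hypothetical helper that ignores kernel details such as timer coalescing:

```python
def next_preemption_time(now_ns, timeslice_ns, next_deadline_arrival_ns=None):
    # The preemption timer fires when the current timeslice expires, or
    # earlier if a deadline task becomes eligible before then.
    expiry_ns = now_ns + timeslice_ns
    if next_deadline_arrival_ns is not None:
        expiry_ns = min(expiry_ns, next_deadline_arrival_ns)
    return expiry_ns
```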

CPU placement and migration

The scheduler performs load balancing and task placement based on several factors to optimize performance and thermal efficiency.

Placement criteria

When a thread wakes up or is unblocked, the scheduler selects a target CPU based on:

  1. Affinity: The thread's user-defined CPU affinity mask is a hard constraint and takes priority over the other criteria.
  2. Last CPU: The CPU the thread last ran on is preferred, since it preserves cache warmth (intra-cluster affinity).
  3. Idle States: Idle CPUs are strongly preferred, since they avoid the context-switch and preemption overhead of placing the thread on a busy CPU.
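
One way to picture these criteria is as a cost function over candidate CPUs. The field names and cost values below are illustrative assumptions, not the kernel's:

```python
def placement_cost(cpu, thread):
    # Affinity is a hard constraint; the other criteria are soft costs.
    if cpu["id"] not in thread["affinity_mask"]:
        return float("inf")
    cost = 0
    if cpu["id"] != thread["last_cpu"]:
        cost += 10  # cold caches
    if not cpu["idle"]:
        cost += 20  # preemption/context-switch overhead on a busy CPU
    return cost

def select_cpu(cpus, thread):
    return min(cpus, key=lambda cpu: placement_cost(cpu, thread))["id"]

cpus = [{"id": 0, "idle": False}, {"id": 1, "idle": True}]
thread = {"affinity_mask": {0, 1}, "last_cpu": 0}
# CPU 1 wins here: being idle outweighs the cache warmth of CPU 0.
```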

Work stealing

To maintain load balance, a CPU with no eligible work will attempt to steal work from other busy CPUs.

  • Cluster Awareness: The scheduler is aware of the CPU topology. Work stealing prioritizes CPUs in the local cluster over distant clusters to benefit from shared caches and to reduce migration penalties.
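
A sketch of the cluster-aware steal ordering, with hypothetical fields and queue length standing in for the kernel's load estimate:

```python
def steal_candidates(cpus, thief):
    # Prefer busy CPUs in the thief's own cluster over remote clusters;
    # within a cluster, prefer the most loaded CPU.
    busy = [c for c in cpus if c["queue_len"] > 0 and c["id"] != thief["id"]]
    return sorted(busy,
                  key=lambda c: (c["cluster"] != thief["cluster"],
                                 -c["queue_len"]))

cpus = [{"id": 0, "cluster": 0, "queue_len": 0},
        {"id": 1, "cluster": 0, "queue_len": 3},
        {"id": 2, "cluster": 1, "queue_len": 5}]
# CPU 0 tries CPU 1 first, despite CPU 2 being busier, because
# CPU 1 shares CPU 0's cluster (and therefore its caches).
```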

Power and energy awareness

The scheduler uses local energy models and power level controls to optimize power consumption and thermal performance.

  • Performance Scales: Per-CPU performance multipliers are factored into timeslice and capacity calculations, so budgets translate correctly between faster and slower cores.
  • Dynamic Voltage and Frequency Scaling (DVFS): The scheduler adjusts processor power levels within user-defined limits while striving to meet deadline schedules.
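
As a hypothetical illustration of factoring a performance multiplier into a time budget (the scaling direction and units here are assumptions, not the kernel's formula):

```python
def wall_clock_budget_ns(normalized_budget_ns, performance_scale):
    # A CPU with performance_scale 0.5 needs twice the wall-clock time
    # to retire the same normalized amount of work.
    return int(normalized_budget_ns / performance_scale)
```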