Quality of service scheduling in SDV

The SDV quality of service (QoS) scheduling framework provides deterministic CPU resource allocation for service bundles managed by the Lifecycle Manager (LM). By abstracting low-level Linux scheduling attributes (Policy, Priority, Nice) into logical presets, the platform enables a clean separation between service development and vehicle-wide system tuning. This helps provide the necessary compute bandwidth for safety-critical and time-sensitive workloads while the system constrains background tasks to prevent instability.

Roles and responsibilities

The SDV scheduling model delegates responsibility across three roles so that services remain portable while the OEM tunes the system.

Role | Responsibility | Key deliverable
SDV platform (Google) | Defines the sdv_service_bundles_scheduling.proto schema, implements the LM enforcement logic, and provides the sdv_service_bundles_scheduling client library. | Protobuf definitions and platform libraries
OEM (system integrator) | Defines the concrete values for each preset, such as what ELEVATED means on specific hardware, and controls the system-wide configuration. | /product/etc/lifecycle_config.textproto
Service developer | Selects the appropriate logical preset name and registers the scheduling_config.textproto path in the service bundle manifest. | scheduling_config.textproto and sdv_service_bundles_manifest.textproto

Technical workflow

  1. Platform integration: The OEM defines the vehicle-specific lifecycle_config.textproto. This file establishes the system-wide scheduling profiles (presets) and maps them to concrete Linux scheduling attributes based on the target hardware.
  2. Service bundle development: The developer bundles a scheduling_config.textproto within their APEX package, recommending a logical preset (for example, ELEVATED) and defining any internal thread names.
  3. Service bundle integration: During the vehicle integration phase, the OEM reviews the developer's recommended preset. The OEM can keep the recommended profile or override it, such as downgrading an ELEVATED service to NORMAL, to improve overall system stability before signing the APEX.
  4. Runtime attribute resolution: When a service is started, the LM identifies the authorized preset from the service's manifest and retrieves the corresponding Linux attributes from the OEM's system configuration.
  5. Startup synchronization and enforcement: The LM applies the resolved attributes to the service process before signaling it to continue execution using the signal-and-continue protocol.
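
The lookup in steps 3 and 4 can be pictured as an override check followed by a table lookup. The following Python sketch is illustrative only: the Preset class, the OEM_PRESETS and OEM_OVERRIDES tables, the MediaService name, and resolve_preset are hypothetical, and modeling the OEM override as a separate table is an assumption rather than the LM's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    policy: str
    nice: int
    is_privileged: bool = False


# Hypothetical stand-in for the presets defined in lifecycle_config.textproto.
OEM_PRESETS = {
    "NORMAL": Preset("SCHED_OTHER", 0),
    "ELEVATED": Preset("SCHED_OTHER", -10),
    "IDLE": Preset("SCHED_IDLE", 19),
}

# Hypothetical integration-time overrides, keyed by service bundle name.
OEM_OVERRIDES = {"MediaService": "NORMAL"}


def resolve_preset(bundle_name, recommended):
    """Return (preset name, attributes) after applying any OEM override."""
    name = OEM_OVERRIDES.get(bundle_name, recommended)
    if name not in OEM_PRESETS:
        raise KeyError(f"unknown scheduling preset: {name}")
    return name, OEM_PRESETS[name]
```

In this sketch, a bundle without an override keeps the developer's recommendation, while MediaService is downgraded from ELEVATED to NORMAL.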

Core concepts

The following concepts are central to the SDV scheduling framework.

Scheduling presets

Scheduling presets are the primary mechanism for managing QoS in SDV. Instead of defining raw Linux scheduling parameters, developers reference predefined presets by name. At run time, the LM maps these logical names to concrete Linux scheduling attributes.

The following presets are configured as an example in lifecycle_config.textproto:

Preset name | Scheduling policy | Typical nice value | Typical use case
NORMAL | SCHED_OTHER | 0 | Standard services (HVAC, media, settings)
ELEVATED | SCHED_OTHER | -10 | Core system components and infrastructure
IDLE | SCHED_IDLE | 19 | Background analytics and non-critical logging
CUSTOM | User defined | -20 (initial) | Services requiring real-time policies or affinity

When applying a preset, the LM uses the setpriority system call to set the process-wide nice value. This affects the relative CPU share the process receives during contention under the default Linux fair scheduler (the Completely Fair Scheduler, CFS, or EEVDF on kernel 6.6 and later).
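
As a rough illustration of the setpriority call, Python exposes the same system call through os.setpriority. This is a minimal sketch, not the LM's implementation, and apply_nice is a hypothetical helper name.

```python
import os


def apply_nice(pid, nice_value):
    """Set the nice value of pid (0 means the caller) and return the value read back."""
    os.setpriority(os.PRIO_PROCESS, pid, nice_value)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Raising the nice value (lowering priority) needs no special privileges. Lowering it, as the ELEVATED example value of -10 requires, typically fails with a permission error unless the caller holds CAP_SYS_NICE, which is why the LM applies the value on the service's behalf.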

Privilege and security

To prevent unauthorized priority escalation, the SDV platform employs a tiered security model based on SELinux domains and Linux capabilities.

Security domains

  • untrusted_service_bundle: The default domain for standard presets (NORMAL, ELEVATED, IDLE). The kernel restricts processes in this domain, preventing them from changing their own scheduling parameters.
  • priority_service_bundle: Granted to any service bundle using a preset where is_privileged: true is set in the system-wide lifecycle_config.textproto. Transitioning to this domain grants the process the CAP_SYS_NICE capability, which lets it manage its own resource allocation.

CAP_SYS_NICE capability

The platform grants the CAP_SYS_NICE capability to processes in the priority_service_bundle domain. This permission allows a process to do the following:

  • Raise its own priority by lowering its nice value below its initial assignment.
  • Change its scheduling policy to real-time classes like SCHED_FIFO or SCHED_RR.
  • Set CPU affinity using sched_setaffinity.
  • Configure specialized deadline parameters for SCHED_DEADLINE.

Restricting this capability to a dedicated domain protects the system-wide scheduling balance by limiting disruptions to explicitly authorized services.
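
When debugging, it can be useful to check whether a process actually holds CAP_SYS_NICE. The following sketch reads the effective capability mask from /proc, which Linux exposes as the CapEff field of the status file. The helper names are hypothetical; the bit index 23 comes from the Linux capability headers.

```python
CAP_SYS_NICE = 23  # Bit index of CAP_SYS_NICE in <linux/capability.h>


def capeff_has(capeff_hex, cap_bit=CAP_SYS_NICE):
    """Return True if the hexadecimal CapEff mask has the given capability bit set."""
    return bool(int(capeff_hex, 16) & (1 << cap_bit))


def process_has_cap_sys_nice(pid="self"):
    """Check the effective capability set of a process through /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return capeff_has(line.split()[1])
    return False
```

Under the model described above, a service in the untrusted_service_bundle domain would be expected to return False here, although SELinux policy can deny an operation even when the capability bit is set.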

Configuration guide

This section provides configuration guidance for both OEMs and service developers.

OEM guide: System-wide presets

The OEM defines scheduling profiles in /product/etc/lifecycle_config.textproto. The platform provides an example at lifecycle_management/config/lifecycle_config.textproto that OEMs are expected to adjust based on vehicle hardware.

Example: Define a real-time preset

An OEM might define a CRITICAL preset for safety-related services that must never be preempted by standard applications:

# lifecycle_config.textproto
scheduling_presets {
  name: "CRITICAL"
  is_privileged: true
  thread_scheduling_configuration {
    policy: 1      # SCHED_FIFO
    priority: 80   # High real-time priority
  }
}
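
Before shipping a preset like this, an integrator might want to confirm that the priority is valid for the chosen policy on the target kernel. The sketch below is one way to do that from Python; priority_in_range is a hypothetical helper, not part of the platform. On Linux, os.SCHED_FIFO has the numeric value 1, matching the policy field in the example.

```python
import os


def priority_in_range(policy, priority):
    """Return True if priority is valid for the given scheduling policy."""
    low = os.sched_get_priority_min(policy)
    high = os.sched_get_priority_max(policy)
    return low <= priority <= high
```

On Linux, SCHED_FIFO and SCHED_RR accept static priorities from 1 to 99, while SCHED_OTHER accepts only 0, so a priority of 80 is valid for the CRITICAL preset.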

Developer guide: Service bundle configuration

Service developers recommend a preset and optionally define logical thread names in a scheduling_config.textproto file. To be discovered by the platform, the path to this file must be registered in sdv_service_bundles_manifest.textproto.

Example: Manifest registration

# sdv_service_bundles_manifest.textproto
service_bundle_entries {
  name: "SensorService"
  scheduling_config_path: "configs/scheduling_config.textproto"
}

Example: Use a standard preset

For most services, referencing a preset name in the scheduling_config.textproto is sufficient:

# configs/scheduling_config.textproto
scheduling_preset_name: "ELEVATED"

Example: Scheduling for service-spawned worker threads

A service developer might create additional worker threads in their code to handle specialized tasks, such as a low-latency sensor data loop. The developer can define named scheduling attributes for these internal threads in the configuration file.

The platform propagates these attributes as metadata and does not apply them automatically. Only privileged services can use this metadata to set properties for their internal worker threads manually:

# configs/scheduling_config.textproto
scheduling_preset_name: "CUSTOM"

# Map of logical thread names to their attributes (propagated as metadata)
thread_scheduling_configuration {
  key: "sensor-processing-thread"
  value {
    policy: 1        # SCHED_FIFO
    priority: 50
    cpu_affinity_ids: [2, 3]  # Pin to specific cores
  }
}

System behavior and enforcement

The SDV platform employs a strict enforcement model to activate scheduling configurations before running any service-specific code.

Startup synchronization

To prevent a service from running at an incorrect priority (even during its initialization phase), the LM and Service Bundle Runner (SBR) use a signal-and-continue synchronization protocol:

  1. Process creation: The LM forks the SBR process. At this point, the SBR child process is running but immediately enters a blocked state, waiting for a signal on its stdin.
  2. Attribute application: While the child is blocked, the LM uses the setpriority system call to apply the requested nice value and configures the scheduling policy (for example, SCHED_IDLE).
  3. Security transition: The LM performs the SELinux domain transition, moving the child into either the untrusted_service_bundle or priority_service_bundle domain.
  4. Signal: Only after all parameters are successfully applied does the LM send a single-byte signal to the child's stdin.
  5. Execution: The SBR receives the signal and begins loading the service libraries and calling lifecycle methods.
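
The protocol above can be approximated in a few lines. This is a simplified sketch under several assumptions: subprocess.Popen stands in for the LM's fork, a Python one-liner stands in for the SBR, the SELinux transition in step 3 is omitted, and launch_with_nice is a hypothetical name.

```python
import os
import subprocess
import sys

# Stand-in for the SBR: block on one byte from stdin, then report the nice value.
CHILD_CODE = (
    "import os, sys\n"
    "sys.stdin.buffer.read(1)\n"
    "print(os.getpriority(os.PRIO_PROCESS, 0))\n"
)


def launch_with_nice(nice_value):
    """Start a blocked child, set its nice value, release it, and return what it saw."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        # Apply attributes while the child is still waiting on stdin.
        os.setpriority(os.PRIO_PROCESS, child.pid, nice_value)
    except OSError:
        child.kill()
        child.wait()
        raise
    # Send the single-byte signal only after the attributes are applied.
    out, _ = child.communicate(input=b"\x01")
    return int(out.decode().strip())
```

Because the child does no work until it receives the byte, the nice value it reports is the one the parent applied, which mirrors the guarantee that no service code runs at an incorrect priority.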

Impact on main thread and lifecycle

By synchronizing before loading the service libraries, the platform maintains the requested priority for the main execution thread across all lifecycle callbacks:

  • onCreate: All dependency injection and initial resource allocation occur at the correct priority.
  • onStart: The preset governs the transition to the active state and any initial work loops.
  • onStop and onDestroy: The platform performs cleanup operations at the same priority to prevent starving critical system activities during shutdown.

Binder thread pool and priority inheritance

Threads in the platform-managed Binder thread pool don't execute work with the priority of the thread that created the pool. Instead, for synchronous transactions, the server thread inherits the priority of the caller. The Binder kernel driver controls this priority inheritance mechanism.

Thread-level fine-tuning

Service bundles requiring granular control (for example, real-time processing loops) must manually apply configurations to their worker threads. Use the following steps to apply thread-level configurations:

  1. Request a privileged preset (where is_privileged: true is set).
  2. Use the get_scheduling_configuration function from the sdv_service_bundles_scheduling library to read the thread_scheduling_configuration.
  3. Apply attributes to worker threads using sched_setattr.
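
The steps above can be sketched as follows. Python has no binding for sched_setattr, so this example substitutes os.sched_setscheduler and os.sched_setaffinity, which cover the policy, priority, and affinity fields. THREAD_CONFIG is a hypothetical stand-in for the data read through get_scheduling_configuration, whose real signature is not shown here, and the helper names are also hypothetical.

```python
import os
import threading

# Hypothetical stand-in for the thread_scheduling_configuration metadata.
THREAD_CONFIG = {
    "sensor-processing-thread": {
        "policy": os.SCHED_FIFO,
        "priority": 50,
        "cpus": {2, 3},
    },
}


def apply_thread_config(cfg):
    """Apply cfg to the calling thread and return a summary of what took effect."""
    # Pin only to requested cores that this thread is allowed to use.
    cpus = cfg["cpus"] & os.sched_getaffinity(0)
    if cpus:
        os.sched_setaffinity(0, cpus)
    try:
        os.sched_setscheduler(0, cfg["policy"], os.sched_param(cfg["priority"]))
        realtime = True
    except PermissionError:
        # Expected when the process lacks CAP_SYS_NICE.
        realtime = False
    return {"cpus": os.sched_getaffinity(0), "realtime": realtime}


def run_worker(name):
    """Start a named worker thread that configures itself before doing work."""
    result = {}

    def body():
        result.update(apply_thread_config(THREAD_CONFIG[name]))
        # The low-latency processing loop would run here.

    worker = threading.Thread(target=body, name=name)
    worker.start()
    worker.join()
    return result
```

On Linux, passing 0 to these calls targets the calling thread, so each worker configures itself and the main thread keeps the attributes applied by the LM.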

SELinux and capabilities enforcement

The platform uses SELinux and the CAP_SYS_NICE Linux capability to enforce scheduling presets:

  • For services using presets like NORMAL or IDLE, the LM configures scheduling during startup. These services don't have the CAP_SYS_NICE capability and cannot change their priority or policy.
  • Services using a privileged preset (is_privileged: true) are granted CAP_SYS_NICE capability. This allows them to manually control scheduling for their internal threads, as required for real-time tasks.

Verification and samples

Verifying scheduling behavior requires a combination of log analysis and system-level performance measurement. The SDV platform provides a dedicated sample to demonstrate these techniques.

The QoS scheduling sample

Located in samples/qos_scheduling, this sample includes several performance-testing services designed to run in parallel and report their progress.

  • PerformanceTesterNormal: Runs with the NORMAL preset.
  • PerformanceTesterElevated: Runs with the ELEVATED preset.
  • PerformanceTesterRealtime: Uses the CUSTOM preset to apply SCHED_FIFO to its internal worker threads.
  • PolicyOffender: A diagnostic service that attempts to set a real-time priority while using a NORMAL preset. This is used to verify that the platform successfully blocks unauthorized escalation.
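
The kind of check that PolicyOffender performs might look like the following sketch. It is not the sample's source code, and try_realtime_escalation is a hypothetical name. The function attempts to switch the calling thread to SCHED_FIFO and reports the error name if the kernel refuses.

```python
import errno
import os


def try_realtime_escalation(priority=10):
    """Attempt SCHED_FIFO on the caller; return None on success or the errno name."""
    previous_policy = os.sched_getscheduler(0)
    previous_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError as err:
        return errno.errorcode.get(err.errno, str(err.errno))
    # The attempt succeeded, so restore the original policy before returning.
    os.sched_setscheduler(0, previous_policy, previous_param)
    return None
```

A service running under a non-privileged preset would be expected to get an error such as EPERM, while a None result would indicate that the escalation was not blocked.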

Run the verification suite

  1. Enable the QoS-specific orchestration configuration:

    adb shell setprop persist.sdv.orchestrator_config_path /etc/orch/vm_qos_scheduling_orch_config.textproto

  2. Reboot the device to start the services in their respective scheduling domains:

    adb reboot

  3. Filter the logs to see the comparative performance results:

    adb logcat | grep sdv_sample_qos_common

    Expect the PerformanceTesterElevated service to report significantly more completed work units than PerformanceTesterNormal, and expect EPERM or SELinux denial errors in the logs for the PolicyOffender service.

Backward compatibility

  • For backward compatibility, any legacy service bundle that specifies a scheduling_config_path in its manifest but does not specify a scheduling_preset_name in the configuration file is automatically treated as a privileged service. This preserves the necessary capabilities (for example, CAP_SYS_NICE) for older services that rely on manual thread tuning.
  • The root configuration message DeadlineSchedulingConfiguration is scheduled to be renamed in a future release. The current name is historical and no longer accurately reflects that the message handles all scheduling types, not just deadline scheduling.
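
The legacy rule in the first item can be expressed as a small decision function. This is a sketch of the rule as described, not the LM's code: is_privileged_bundle and its parameters are hypothetical, and treating a bundle with no scheduling_config_path as unprivileged is an assumption.

```python
def is_privileged_bundle(has_scheduling_config_path, preset_name, privileged_presets):
    """Decide whether a bundle runs as a privileged service."""
    if not has_scheduling_config_path:
        # Assumption: bundles without a scheduling config are not privileged.
        return False
    if not preset_name:
        # Legacy bundle: a config path without a preset name implies privileged.
        return True
    return preset_name in privileged_presets
```

For example, a legacy bundle with a config path and an empty preset name is treated as privileged, whereas the same bundle with the preset name NORMAL is not.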