This page provides tips to improve boot time.
Strip debug symbols from modules
Similar to how debug symbols are stripped from the kernel on a production device, make sure you also strip the debug symbols from modules. Stripping debug symbols from modules helps boot time by reducing the following:
- The time it takes to read the binaries from flash.
- The time it takes to decompress the ramdisk.
- The time it takes to load the modules.
Stripping debug symbols from modules may save several seconds during boot.
Symbol stripping is enabled by default in the Android platform build, but to control it explicitly, set BOARD_DO_NOT_STRIP_VENDOR_RAMDISK_MODULES in your device-specific config under device/vendor/device.
Use LZ4 compression for kernel and ramdisk
Gzip generates a smaller compressed output compared to LZ4, but LZ4 decompresses faster than Gzip. For the kernel and modules, the absolute storage size reduction from using Gzip isn't that significant compared to the decompression time benefit of LZ4.
Support for LZ4 ramdisk compression has been added to the Android platform build through BOARD_RAMDISK_USE_LZ4. You can set this option in your device-specific config. Kernel compression can be set through kernel defconfig.
Switching to LZ4 should give 500ms to 1000ms faster boot time.
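As a quick sanity check, the following minimal sketch reports which kernel image compression a defconfig selects. It assumes the upstream Kconfig naming CONFIG_KERNEL_<ALGO>=y (for example, CONFIG_KERNEL_LZ4=y), whose availability depends on the architecture. The function name and the file path in the usage note are hypothetical.

```shell
#!/bin/sh
# Print the kernel image compression selected in a defconfig or .config file.
# Assumption: the option follows the upstream CONFIG_KERNEL_<ALGO>=y naming.
kernel_compression() {
    # $1: path to the defconfig or .config file to inspect
    grep -E '^CONFIG_KERNEL_(GZIP|LZ4|LZO|XZ|LZMA|ZSTD|BZIP2)=y$' "$1" |
        sed -e 's/^CONFIG_KERNEL_//' -e 's/=y$//'
}
```

For example, `kernel_compression arch/arm64/configs/my_defconfig` prints LZ4 when the file contains CONFIG_KERNEL_LZ4=y, and prints nothing when no compression option is set.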
Avoid excessive logging in your drivers
In ARM64 and ARM32, function calls that are more than a specific distance from the call site need a jump table (called a procedure linkage table, or PLT) to be able to encode the full jump address. Since modules are loaded dynamically, these jump tables need to be fixed up during module load. The calls that need relocation are recorded in the ELF format as relocation entries with explicit addends (RELA entries, for short).
The Linux kernel does some memory size optimization (such as cache hit optimization) when allocating the PLT. With this upstream commit, the optimization scheme has an O(N^2) complexity, where N is the number of RELAs of type R_AARCH64_JUMP26 or R_AARCH64_CALL26. So having fewer RELAs of these types is helpful in reducing the module load time.
One common coding pattern that increases the number of R_AARCH64_CALL26 or R_AARCH64_JUMP26 RELAs is excessive logging in a driver. Each call to printk() or any other logging scheme typically adds a CALL26/JUMP26 RELA entry. In the commit text of the upstream commit, notice that even with the optimization, the six modules take about 250ms to load. That is because those six modules were the six modules with the most logging.
Reducing logging can save about 100 - 300ms on boot times, depending on how excessive the existing logging is.
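To see which modules carry the most of these relocations, you can count them in the relocation dump of each module. The following is a minimal sketch that reads `readelf -r` output on stdin. The function name and the module file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Count R_AARCH64_CALL26 and R_AARCH64_JUMP26 relocation entries in
# `readelf -r` output read from stdin. Prints a single number.
count_branch_relas() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c -E 'R_AARCH64_(CALL|JUMP)26' || true
}
```

For example, `readelf -rW my_driver.ko | count_branch_relas` prints the number of such entries in my_driver.ko. Running this before and after trimming log statements gives a rough measure of the reduction.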
Enable asynchronous probing, selectively
When a module is loaded, if the device that it supports has already been populated from the DT (devicetree) and added to driver core, then the device probe is done in the context of the module_init() call. When a device probe is done in the context of module_init(), the module can't finish loading until the probe completes. Since module loading is mostly serialized, a device that takes a relatively long time to probe slows the boot time.
To avoid slower boot times, enable asynchronous probing for modules that take a while to probe their devices. Enabling asynchronous probing for all modules might not be beneficial as the time it takes to fork a thread and kick off the probe might be as high as the time it takes to probe the device.
Devices that are connected through a slow bus such as I2C, devices that do firmware loading in their probe function, and devices that do a lot of hardware initialization can lead to the timing issue. The best way to identify when this happens is to collect the probe time for every driver and sort it.
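One way to collect and sort probe times is to parse the kernel log. The following minimal sketch assumes the log contains lines of the form "probe of <device> returned <code> after <N> usecs", which the driver core is assumed to print when the kernel is booted with the initcall_debug parameter. The function name is hypothetical.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<usecs> <device>" for every probe
# timing line, slowest first.
# Assumed line format: "probe of <device> returned <code> after <N> usecs"
slowest_probes() {
    awk '{
        for (i = 1; i + 6 <= NF; i++) {
            if ($i == "probe" && $(i+1) == "of" &&
                $(i+3) == "returned" && $(i+5) == "after") {
                print $(i+6), $(i+2)   # duration in usecs, then device name
                break
            }
        }
    }' | sort -k1,1nr
}
```

For example, `dmesg | slowest_probes | head -n 20` lists the 20 slowest probes, which are the first candidates for asynchronous probing.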
To enable asynchronous probing for a module, it isn't sufficient to only set the PROBE_PREFER_ASYNCHRONOUS flag in the driver code. For modules, you also need to add module_name.async_probe=1 in the kernel command line or pass async_probe=1 as a module parameter when loading the module using modprobe or insmod.
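If you enable asynchronous probing for several modules, generating the command-line arguments from a list helps keep them consistent. The following is a minimal sketch. The function name and the module names in the usage note are hypothetical.

```shell
#!/bin/sh
# Print one "<module>.async_probe=1" kernel command-line argument per module
# name given, separated by spaces.
async_probe_args() {
    out=""
    for mod in "$@"; do
        # Prefix a space only when out already holds an argument.
        out="${out:+$out }${mod}.async_probe=1"
    done
    printf '%s\n' "$out"
}
```

For example, `async_probe_args touch_drv wifi_drv` prints `touch_drv.async_probe=1 wifi_drv.async_probe=1`, which you could append to the kernel command line. For a single module loaded by hand, the equivalent is `modprobe touch_drv async_probe=1`.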
Enabling asynchronous probing can save about 100 - 500ms on boot times depending on your hardware/drivers.
Probe your CPUfreq driver as early as possible
The earlier your CPUfreq driver probes, the sooner you can scale the CPU
frequency to maximum (or some thermally limited maximum) during boot. The
faster the CPU, the faster the boot. This guideline also applies to devfreq
drivers that control the DRAM, memory, and interconnect frequency.
With modules, the load ordering can depend on the initcall level and the compile or link order of the drivers. Use MODULE_SOFTDEP() to make sure the cpufreq driver is among the first few modules to load.
Apart from loading the module early, you also need to make sure all the dependencies to probe the CPUfreq driver have also probed. For example, if you need a clock or regulator handle to control the frequency of your CPU, make sure they are probed first. Or you might need thermal drivers to be loaded before the CPUfreq driver if it is possible for your CPUs to get too hot during boot up. So, do what you can to make sure the CPUfreq and relevant devfreq drivers probe as early as possible.
The savings from probing your CPUfreq driver early can be very small to very large depending on how early you can get these to probe and at what frequency the bootloader leaves the CPUs in.
Move modules to second stage init, vendor or vendor_dlkm partition
Because the first stage init process is serialized, there aren't many opportunities to parallelize the boot process. If a module isn't needed for first stage init to finish, move the module to second stage init by placing it in the vendor or vendor_dlkm partition.
First stage init doesn't require probing several devices to get to second stage init. Only console and flash storage capabilities are needed for a normal boot flow.
Load the following essential drivers:
- watchdog
- reset
- cpufreq
For recovery and user space fastbootd mode, first stage init requires more devices to probe (such as USB and display). Keep a copy of these modules in the first stage ramdisk and in the vendor or vendor_dlkm partition. This lets them be loaded in first stage init for the recovery or fastbootd boot flow. However, don't load the recovery mode modules in first stage init during normal boot flow. Recovery mode modules can be deferred to second stage init to decrease the boot time. All other modules that aren't needed in first stage init should be moved to the vendor or vendor_dlkm partition.
Given a list of leaf devices (for example, the UFS or serial), the dev needs.sh script finds all drivers, devices, and modules needed for dependencies or suppliers (for example, clocks, regulators, or gpio) to probe.
Moving modules to second stage init decreases boot times in the following ways:
- Ramdisk size reduction.
  - This yields faster flash reads when the bootloader loads the ramdisk (serialized boot step).
  - This yields faster decompression speeds when the kernel decompresses the ramdisk (serialized boot step).
- Second stage init works in parallel, which hides the module's loading time with the work being done in second stage init.
Moving modules to second stage can save 500 - 1000ms on boot times depending on how many modules you're able to move to second stage init.
Module loading logistics
The latest Android build features board configurations that control which modules copy over to each stage, and which modules load. This section focuses on the following subset:
- BOARD_VENDOR_RAMDISK_KERNEL_MODULES. The list of modules to be copied into the ramdisk.
- BOARD_VENDOR_RAMDISK_KERNEL_MODULES_LOAD. The list of modules to be loaded in first stage init.
- BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD. The list of modules to be loaded when recovery or fastbootd is selected from the ramdisk.
- BOARD_VENDOR_KERNEL_MODULES. The list of modules to be copied into the vendor or vendor_dlkm partition in the /vendor/lib/modules/ directory.
- BOARD_VENDOR_KERNEL_MODULES_LOAD. The list of modules to be loaded in second stage init.
The boot and recovery modules in ramdisk must also be copied to the vendor or vendor_dlkm partition at /vendor/lib/modules. Copying these modules to the vendor partition ensures the modules aren't invisible during second stage init, which is useful for debugging and collecting modinfo for bugreports.
The duplication should cost minimal space on the vendor or vendor_dlkm partition as long as the boot module set is minimized. Make sure that the vendor's modules.list file has a filtered list of modules in /vendor/lib/modules. The filtered list ensures boot times aren't affected by the modules loading again (which is an expensive process).
Ensure that recovery mode modules load as a group. Loading recovery mode modules can be done either in recovery mode, or at the beginning of the second stage init in each boot flow.
You can use the device BoardConfig.mk files to perform these actions, as seen in the following example:
# All kernel modules
KERNEL_MODULES := $(wildcard $(KERNEL_MODULE_DIR)/*.ko)
KERNEL_MODULES_LOAD := $(strip $(shell cat $(KERNEL_MODULE_DIR)/modules.load))
# First stage ramdisk modules
BOOT_KERNEL_MODULES_FILTER := $(foreach m,$(BOOT_KERNEL_MODULES),%/$(m))
# Recovery ramdisk modules
RECOVERY_KERNEL_MODULES_FILTER := $(foreach m,$(RECOVERY_KERNEL_MODULES),%/$(m))
BOARD_VENDOR_RAMDISK_KERNEL_MODULES += \
    $(filter $(BOOT_KERNEL_MODULES_FILTER) \
        $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES))
# ALL modules land in /vendor/lib/modules so they could be rmmod/insmod'd,
# and modules.list actually limits us to the ones we intend to load.
BOARD_VENDOR_KERNEL_MODULES := $(KERNEL_MODULES)
# To limit /vendor/lib/modules to just the ones loaded, use:
# BOARD_VENDOR_KERNEL_MODULES := $(filter-out \
#     $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES))
# Group set of /vendor/lib/modules loading order to recovery modules first,
# then remainder, subtracting both recovery and boot modules which are loaded
# already.
BOARD_VENDOR_KERNEL_MODULES_LOAD := \
    $(filter-out $(BOOT_KERNEL_MODULES_FILTER), \
        $(filter $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD)))
BOARD_VENDOR_KERNEL_MODULES_LOAD += \
    $(filter-out $(BOOT_KERNEL_MODULES_FILTER) \
        $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))
# NB: Load order governed by modules.load and not by $(BOOT_KERNEL_MODULES)
BOARD_VENDOR_RAMDISK_KERNEL_MODULES_LOAD := \
    $(filter $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))
# Group set of /vendor/lib/modules loading order to boot modules first,
# then the remainder of recovery modules.
BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD := \
    $(filter $(BOOT_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD))
BOARD_VENDOR_RAMDISK_RECOVERY_KERNEL_MODULES_LOAD += \
    $(filter-out $(BOOT_KERNEL_MODULES_FILTER), \
        $(filter $(RECOVERY_KERNEL_MODULES_FILTER),$(KERNEL_MODULES_LOAD)))
This example showcases an easier-to-manage subset of BOOT_KERNEL_MODULES and RECOVERY_KERNEL_MODULES to be specified locally in the board configuration files. The preceding script finds and fills each of the subset modules from the selected available kernel modules, leaving the remaining modules for second stage init.
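To preview how a modules.load file would be split before running a full build, you can approximate the filter and filter-out steps in shell. The following minimal sketch matches entries by file name, which approximates the %/<module> patterns in the Makefile example. The function name and the module names in the usage note are hypothetical.

```shell
#!/bin/sh
# Split a modules.load-style list (stdin, one path per line) by file name.
# $1: "keep" prints entries whose file name is in the list (like $(filter ...)),
#     "drop" prints all other entries (like $(filter-out ...)).
# $2: space-separated module file names, for example "clk-foo.ko usb-bar.ko".
# Input order is preserved, so the load order still follows modules.load.
filter_modules() {
    awk -v mode="$1" -v names="$2" '
        BEGIN {
            n = split(names, a, " ")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        {
            base = $0
            sub(/.*\//, "", base)          # strip the directory part
            hit = (base in want)
            if ((mode == "keep" && hit) || (mode == "drop" && !hit)) print
        }'
}
```

For example, `filter_modules keep "clk-foo.ko" < modules.load` previews the first stage load list, and `filter_modules drop "clk-foo.ko" < modules.load` previews what remains for second stage init.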
For second stage init, we recommend running the module loading as a service so it doesn't block boot flow. Use a shell script to manage the module loading so that other logistics, such as error handling and mitigation, or module load completion, can be reported back (or ignored) if necessary.
You can ignore the load failure of a debug module that isn't present on user builds. To ignore this failure, set the vendor.device.modules.ready property so that later stages of the init rc scripts continue the boot flow to the launch screen. The following example script shows this approach, assuming the code is in /vendor/etc/init.insmod.sh:
#!/vendor/bin/sh
. . .
if [ $# -eq 1 ]; then
  cfg_file=$1
else
  # Set property even if there is no insmod config
  # to unblock early-boot trigger
  setprop vendor.common.modules.ready 1
  setprop vendor.device.modules.ready 1
  exit 1
fi
if [ -f "$cfg_file" ]; then
  # Each config line has the form action|arg. $arg is left unquoted on
  # purpose so that one line can carry several space-separated arguments.
  while IFS="|" read -r action arg
  do
    case $action in
      "insmod") insmod $arg ;;
      "setprop") setprop $arg 1 ;;
      "enable") echo 1 > $arg ;;
      "modprobe") modprobe -a -d /vendor/lib/modules $arg ;;
      . . .
    esac
  done < "$cfg_file"
fi
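The script reads its config file as action|arg pairs, one per line. To check a config file without touching the device, a dry-run variant of the same loop can print the commands instead of running them. The following is a minimal sketch. The function name is hypothetical, and skipping blank lines and # comments is an assumption rather than documented behavior of the original script.

```shell
#!/bin/sh
# Dry-run parser for an "action|arg" config read from stdin.
# Prints the command each line would run instead of executing it.
plan_module_actions() {
    while IFS="|" read -r action arg; do
        case "$action" in
            insmod)   echo "insmod $arg" ;;
            setprop)  echo "setprop $arg 1" ;;
            enable)   echo "echo 1 > $arg" ;;
            modprobe) echo "modprobe -a -d /vendor/lib/modules $arg" ;;
            ""|"#"*)  ;;  # assumption: ignore blank lines and comments
            *)        echo "unknown action: $action" >&2 ;;
        esac
    done
}
```

For example, feeding it the two lines `modprobe|foo.ko bar.ko` and `setprop|vendor.device.modules.ready` (foo.ko and bar.ko are placeholder names) prints the modprobe command followed by the setprop command, in that order.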
In the hardware rc file, the oneshot service could be specified with:
service insmod-sh /vendor/etc/init.insmod.sh /vendor/etc/init.insmod.<hw>.cfg
    class main
    user root
    group root system
    disabled
    oneshot
Additional optimizations can be made after modules move from the first to second stage. You can use the modprobe blocklist feature to split up the second stage boot flow to include deferred module loading of nonessential modules. Loading of modules used exclusively by a specific HAL can be deferred to load the modules only when the HAL is started.
To improve apparent boot times, you can specifically choose modules in the module loading service that are more conducive to loading after the launch screen. For example, you can explicitly late load the modules for video decoder or Wi-Fi after the init boot flow has been cleared (signaled by the sys.boot_completed Android property, for example). Make sure the HALs for the late loading modules block long enough when the kernel drivers aren't present.
Alternatively, you can use init's wait <file> [<timeout>] command in the boot flow rc scripting to wait for select sysfs entries to show that driver modules have completed the probe operations. An example of this is waiting for the display driver to complete loading in the background of recovery or fastbootd, before presenting menu graphics.
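If the waiting has to happen inside a module loading shell script rather than in rc scripting, a small polling loop can play a similar role. The following minimal sketch is not init's implementation. The function name, the one-second polling interval, and the five-second default timeout are assumptions chosen for illustration.

```shell
#!/bin/sh
# Poll until a path exists or the timeout expires.
# $1: path to wait for (for example, a sysfs entry created by a driver probe)
# $2: timeout in whole seconds (assumed default: 5)
# Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    path=$1
    remaining=${2:-5}
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller can then decide whether a timeout is fatal or can be ignored, for example by logging the failure and continuing the boot flow.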
Initialize the CPU frequency to a reasonable value in the bootloader
Not all SoCs or products can boot the CPU at the highest frequency because of thermal or power concerns during boot loop tests. However, make sure the bootloader sets the frequency of all online CPUs as high as is safely possible for the SoC or product. This is very important because, with a fully modular kernel, the init ramdisk decompression takes place before the CPUfreq driver can be loaded. So, if the bootloader leaves the CPU at the lower end of its frequency range, ramdisk decompression can take longer than with a statically compiled kernel (after adjusting for the ramdisk size difference), because the CPU runs at a very low frequency while doing CPU-intensive work (decompression). The same applies to memory and interconnect frequency.
Initialize CPU frequency of big CPUs in the bootloader
Before the CPUfreq driver is loaded, the kernel is unaware of the CPU frequencies and doesn't scale the CPU sched capacity for their current frequency. The kernel might migrate threads to the big CPU if the load is sufficiently high on the little CPU.
Make sure the big CPUs are at least as performant as the little CPUs at the frequencies the bootloader leaves them at. For example, if the big CPU is 2x as performant as the little CPU at the same frequency, but the bootloader sets the little CPU's frequency to 1.5 GHz and the big CPU's frequency to 300 MHz, then boot performance drops if the kernel moves a thread to the big CPU. In this example, if it is safe to boot the big CPU at 750 MHz, you should do so even if you don't plan to explicitly use it.
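The arithmetic behind this example can be sketched as follows, under the simplifying assumption that throughput scales linearly with frequency. The function name is hypothetical, and the unit is "little-core MHz equivalents".

```shell
#!/bin/sh
# Relative throughput = per-MHz performance factor * frequency in MHz.
# Assumption: performance scales linearly with frequency.
# $1: performance factor relative to the little core at the same frequency
# $2: frequency in MHz
relative_perf() {
    echo $(( $1 * $2 ))
}

relative_perf 1 1500   # little core at 1.5 GHz
relative_perf 2 300    # big core at 300 MHz
relative_perf 2 750    # big core at 750 MHz
```

Under this model, the little core at 1.5 GHz scores 1500 and the big core at 300 MHz scores only 600, so a thread migrated to the big core runs slower. At 750 MHz the big core also scores 1500, which is the break-even point.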
Drivers shouldn't load firmware in first stage init
There might be some unavoidable cases where firmware needs to be loaded in first stage init. But in general, drivers shouldn't load any firmware in first stage init, especially in device probe context. Loading firmware in first stage init causes the entire boot process to stall if the firmware isn't available in the first stage ramdisk. And even if the firmware is present in the first stage ramdisk, it still causes an unnecessary delay.