
initramfs: Move UFS and SDHCI storage drivers to initramfs#2286

Open
Kavinaya99 wants to merge 6 commits into qualcomm-linux:master from Kavinaya99:configs

Conversation

@Kavinaya99
Contributor

Move UFS and SDHCI storage drivers from static kernel build to initramfs modules to improve boot initialization timing

Contributor

@lumag lumag left a comment


probe ordering issues and race conditions with other subsystems on some platforms.

Which probe issues? Which race conditions? Which platforms are affected and how? Please be exact.

Also drop the template or AI prompt, describing the changes. It's pretty obvious from the commit itself. Focus on something which is not obvious - reasons, errors, affected devices.

Comment thread recipes-kernel/linux/linux-qcom-6.18/configs/bsp-additions.cfg
initramfs-module-udev \
kernel-module-governor-simpleondemand \
kernel-module-ufshcd-core \
kernel-module-ufshcd-pltfrm \
Contributor


Aren't those pulled in by module dependencies?

Contributor Author


Without adding these modules to the recipe, we are getting "unable to mount root fs" errors.

kernel-module-ufshcd-core \
kernel-module-ufshcd-pltfrm \
kernel-module-ufs-qcom \
kernel-module-sdhci-msm \
Contributor


All of the kernel modules should go into the variable MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS, which is more appropriate for the purpose. In its current form, we will get an error if any of the kernel modules are built-in.

Contributor Author


Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules, but it does not guarantee that they will be loaded during boot.
I have tried using MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS and saw bootup failures as modules were not loaded.
The error: root '/dev/disk/by-partlabel/rootfs' doesn't exist or does not contain a /dev.

Contributor


Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules

Well, no. Packages listed in MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS don't get included in the initramfs-rootfs-image. The recipe overrides PACKAGE_INSTALL (on purpose), so packagegroup-core-boot doesn't get included in the image. That's why you've observed errors.

At the same time, no, you can't list modules here. The image should be generic. Also, it should not fail to build if one changes the kernel config. If you need modules, you need to have a packagegroup which would recommend necessary packages. I'd have said that you should resurrect packagegroup-qcom-boot and initramfs-qcom-image, see commit 05b73a1 ("initramfs-qcom-image: remove the recipe and packagegroup").

Also, this commit should be the first one. Otherwise the image would fail to boot between the commits that convert the drivers to modules and this one, which breaks git bisect (a bad idea).
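
The difference between a hard install list and a recommendation, as described in this thread, can be sketched with a toy resolver. This is only an illustrative model, not how BitBake or the package manager is implemented: the function name `resolve_image_packages` and the contents of `available` are assumptions made for the example. It shows why a hard entry for a module package fails once that driver is built in (no package is produced), while a recommendation is simply skipped.

```python
def resolve_image_packages(required, recommended, available):
    """Toy model of hard vs. soft package dependencies.

    required    -- names that must be installable (like a PACKAGE_INSTALL entry)
    recommended -- names installed only if they exist (like an RRECOMMENDS entry)
    available   -- names of the packages the build actually produced
    """
    missing = [name for name in required if name not in available]
    if missing:
        # A hard requirement on a package that was never built fails the image.
        raise LookupError("nothing provides: " + ", ".join(missing))
    # Recommendations that do not exist are dropped without an error.
    optional = [name for name in recommended if name in available]
    return list(required) + optional


# Hypothetical build in which the UFS driver is built in (=y),
# so no kernel-module-ufs-qcom package exists.
available = {"initramfs-module-udev", "kernel-module-sdhci-msm"}
modules = ["kernel-module-ufs-qcom", "kernel-module-sdhci-msm"]

# Recommended: the missing module package is skipped and the image still builds.
print(resolve_image_packages(["initramfs-module-udev"], modules, available))
```

Passing `modules` as `required` instead raises `LookupError` in this model, which mirrors the build failure described above when a listed module turns out to be built in.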

Convert boot-critical UFS and SDHCI storage drivers from
built-in (=y) to modules (=m) to resolve the device
initialization race conditions on QCS8300 platforms.

On QCS8300 (Monaco), the kernel experiences a race condition
during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
Clock Controller) all initialize at the same device_initcall
level (level 6). This creates a dependency chain where UFS
requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
finish initialization before SMMU times out waiting for it.

The race condition manifests as:
gcc-qcs8300 100000.clock-controller: sync_state() pending
due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
error -110
Kernel panic in iommu_domain_free() at gmu_core_iommu_init()

When all three components probe at the same initialization
level, there is no guaranteed ordering. SMMU's 15-second
deferred probe timeout expires before GPUCC completes,
causing SMMU probe failure, which cascades to UFS failure
and eventual kernel panic during GPU initialization.

Converting UFS-QCOM and SDHCI-MSM drivers to modules moves
their initialization from device_initcall (level 6) to module
loading time, which occurs after all built-in drivers have
completed initialization. This ensures GPUCC and SMMU are
fully initialized and ready before the storage drivers attempt
to probe, eliminating the race condition.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
Add driver configurations as modules (=m) to align with
linux-qcom and linux-qcom-next kernel configurations.

All kernel variants (linux-yocto, linux-yocto-dev, linux-qcom,
linux-qcom-next) share the same ramdisk image. The UFS and
SDHCI drivers have been converted to modules in linux-qcom/
linux-qcom-next to resolve SMMU probe ordering issues on
QCS8300 platforms.

When the storage driver modules are added to PACKAGE_INSTALL
in the ramdisk recipe to support linux-qcom/linux-qcom-next,
the build fails for linux-yocto-dev because these kernels do
not have the corresponding configurations enabled. The package
manager cannot find the module packages that the ramdisk
recipe expects.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
Add kernel-module-ufs-qcom, kernel-module-ufshcd-core,
kernel-module-ufshcd-pltfrm, kernel-module-sdhci-msm,
kernel-module-governor-simpleondemand to ramdisk PACKAGE_INSTALL
to support booting from UFS/SDHCI storage with modularized
storage drivers.

UFS-QCOM and SDHCI-MSM drivers have been converted from built-in
(=y) to modules (=m) in kernel configs to resolve SMMU probe
ordering race conditions on QCS8300 platforms where GPUCC, SMMU
and storage drivers all initialize at the same device_initcall
level.

With storage drivers now built as modules instead of being
compiled into the kernel, they must be present in the ramdisk to
enable the system to access and mount the root filesystem from
UFS or SDHCI storage devices during early boot.

Without these modules in the ramdisk, the kernel cannot probe
storage devices, resulting in boot failure with "unable to
mount root fs" errors.

These modules are loaded during ramdisk initialization, ensuring
storage devices are available before attempting to mount the root
filesystem, maintaining boot functionality while providing proper
driver initialization sequencing.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
@Kavinaya99
Contributor Author

probe ordering issues and race conditions with other subsystems on some platforms.

Which probe issues? Which race conditions? Which platforms are affected and how? Please be exact.

Also drop the template or AI prompt, describing the changes. It's pretty obvious from the commit itself. Focus on something which is not obvious - reasons, errors, affected devices.

Updated the commit message

Contributor

@lumag lumag left a comment


On QCS8300 (Monaco), the kernel experiences a race condition
during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
Clock Controller) all initialize at the same device_initcall
level (level 6). This creates a dependency chain where UFS
requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
finish initialization before SMMU times out waiting for it.

Hmm, no. The GPU SMMU and the UFS SMMU are two different SMMU instances. So, the fact that 3da0000.iommu has not probed should not affect probing of 15000000.iommu and the UFS.

Please continue and find the actual root cause. What exactly is preventing UFS from probing?

The race condition manifests as:
gcc-qcs8300 100000.clock-controller: sync_state() pending
due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
error -110
Kernel panic in iommu_domain_free() at gmu_core_iommu_init()
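
One way to check which SMMU instance actually failed is to pull the failing device names out of the boot log. The following is a minimal sketch, assuming the log lines quoted in the commit message with their line wrapping undone. The regular expression and the helper name `failed_probes` are illustrative choices, not part of any kernel tooling.

```python
import re

# Log lines quoted in the commit message, with the 72-column wrapping undone.
LOG = """\
gcc-qcs8300 100000.clock-controller: sync_state() pending due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with error -110
"""

# Matches "<driver> <device>: probe with driver <driver> failed with error <errno>".
PROBE_FAILED = re.compile(
    r"^(?P<driver>\S+) (?P<device>\S+): "
    r"probe with driver \S+ failed with error (?P<errno>-?\d+)$"
)


def failed_probes(log_text):
    """Return a (device, errno) tuple for every 'probe ... failed' line."""
    results = []
    for line in log_text.splitlines():
        match = PROBE_FAILED.match(line.strip())
        if match:
            results.append((match.group("device"), int(match.group("errno"))))
    return results


failures = failed_probes(LOG)
print(failures)
# Is the SMMU instance that serves UFS among the failed devices?
print("15000000.iommu" in {device for device, _ in failures})
```

For the quoted log, only 3da0000.iommu shows up as a failed probe, so these lines alone do not show the UFS SMMU instance (15000000.iommu) failing.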

@shashim-quic

shashim-quic commented May 29, 2026

On QCS8300 (Monaco), the kernel experiences a race condition
during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
Clock Controller) all initialize at the same device_initcall
level (level 6). This creates a dependency chain where UFS
requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
finish initialization before SMMU times out waiting for it.

Hmm, no. The GPU SMMU and the UFS SMMU are two different SMMU instances. So, the fact that 3da0000.iommu has not probed should not affect probing of 15000000.iommu and the UFS.

Please continue and find the actual root cause. What exactly is preventing UFS from probing?

The commit message needs to be rewritten. The issue is not the GPU SMMU blocking UFS. Rather, a few drivers are built into the static image while their dependencies are configured as modules. This causes repeated probe deferrals, which sometimes delay UFS bring-up and in other cases exceed the deferred probe timeout, which blocks further re-probes.

A few cases identified during the last debug session:

  • SMMU configured as 'y' while GPUCC is 'm', leading to repeated probe deferral of the Adreno SMMU
  • USB controller configured as 'y' while the USB PHY is 'm'

I prefer that such modules (dependencies of drivers that are part of the static kernel image) be moved to the initramfs so that boot delays (and the deferred probe timeout) can be better managed.

The race condition manifests as:
gcc-qcs8300 100000.clock-controller: sync_state() pending
due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
error -110
Kernel panic in iommu_domain_free() at gmu_core_iommu_init()
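
The built-in/module mismatches listed in the comment above could be spotted with a small script over the kernel config. This is a minimal sketch under stated assumptions: the `CONFIG_EXAMPLE_*` symbol names are placeholders invented for illustration (not the real Kconfig symbols), and the `DEPENDS_ON` map is hand-written, because runtime supplier relationships come from the devicetree rather than from the `.config` file.

```python
def parse_config(text):
    """Parse 'CONFIG_FOO=y' style lines into a {symbol: value} dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and '# CONFIG_FOO is not set' comments
        name, _, value = line.partition("=")
        values[name] = value
    return values


def builtin_with_modular_deps(config, depends_on):
    """List (consumer, provider) pairs where the consumer is =y and the provider is =m."""
    mismatches = []
    for consumer, providers in depends_on.items():
        if config.get(consumer) != "y":
            continue
        for provider in providers:
            if config.get(provider) == "m":
                mismatches.append((consumer, provider))
    return mismatches


# Placeholder symbols mirroring the two cases from the comment (hypothetical names).
CONFIG_TEXT = """
CONFIG_EXAMPLE_SMMU=y
CONFIG_EXAMPLE_GPUCC=m
CONFIG_EXAMPLE_USB_CONTROLLER=y
CONFIG_EXAMPLE_USB_PHY=m
"""

# Hand-maintained map of consumer -> providers (an assumption of this sketch).
DEPENDS_ON = {
    "CONFIG_EXAMPLE_SMMU": ["CONFIG_EXAMPLE_GPUCC"],
    "CONFIG_EXAMPLE_USB_CONTROLLER": ["CONFIG_EXAMPLE_USB_PHY"],
}

mismatches = builtin_with_modular_deps(parse_config(CONFIG_TEXT), DEPENDS_ON)
for consumer, provider in mismatches:
    print(f"{consumer}=y depends on {provider}=m")
```

Each reported pair is a candidate for either making both sides built-in or shipping the provider module in the initramfs, as suggested above.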


4 participants