initramfs: Move UFS and SDHCI storage drivers to initramfs #2286
Kavinaya99 wants to merge 6 commits into
lumag
left a comment
probe ordering issues and race conditions with other subsystems on some platforms.
Which probe issues? Which race conditions? Which platforms are affected and how? Please be exact.
Also drop the template (or AI-prompt) text describing the changes; what changed is obvious from the commit itself. Focus on what is not obvious: reasons, errors, affected devices.
    initramfs-module-udev \
    kernel-module-governor-simpleondemand \
    kernel-module-ufshcd-core \
    kernel-module-ufshcd-pltfrm \
There was a problem hiding this comment.
Aren't those pulled in by module dependencies?
There was a problem hiding this comment.
Without adding these modules to the recipe, we are getting "unable to mount root fs" errors.
    kernel-module-ufshcd-core \
    kernel-module-ufshcd-pltfrm \
    kernel-module-ufs-qcom \
    kernel-module-sdhci-msm \
There was a problem hiding this comment.
All of the kernel modules should go into the variable MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS, which is more appropriate for the purpose. In its current form, we will get an error if any of the kernel modules are built-in.
There was a problem hiding this comment.
Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules, but it does not guarantee that they will be loaded during boot.
I have tried using MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS and saw bootup failures as modules were not loaded.
The error: root '/dev/disk/by-partlabel/rootfs' doesn't exist or does not contain a /dev.
There was a problem hiding this comment.
Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules
Well, no. Packages listed in MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS don't get included in the initramfs-rootfs-image. The recipe overrides PACKAGE_INSTALL (on purpose), so packagegroup-core-boot doesn't get included in the image. That's why you've observed errors.
At the same time, no, you can't list modules here. The image should be generic. Also, it should not fail to build if one changes the kernel config. If you need modules, you need to have a packagegroup which would recommend necessary packages. I'd have said that you should resurrect packagegroup-qcom-boot and initramfs-qcom-image, see commit 05b73a1 ("initramfs-qcom-image: remove the recipe and packagegroup").
Also, this commit should be the first one. Otherwise booting the image would be broken between the commits that move the drivers to modules and this one, which breaks git bisect (a bad idea).
Convert boot-critical UFS and SDHCI storage drivers from built-in (=y) to modules (=m) to resolve the device initialization race conditions on QCS8300 platforms.

On QCS8300 (Monaco), the kernel experiences a race condition during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU Clock Controller) all initialize at the same device_initcall level (level 6). This creates a dependency chain where UFS requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not finish initialization before SMMU times out waiting for it.

The race condition manifests as:

    gcc-qcs8300 100000.clock-controller: sync_state() pending due to 3d90000.clock-controller
    arm-smmu 3da0000.iommu: deferred probe timeout, ignoring dependency
    arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with error -110
    Kernel panic in iommu_domain_free() at gmu_core_iommu_init()

When all three components probe at the same initialization level, there is no guaranteed ordering. SMMU's 15-second deferred probe timeout expires before GPUCC completes, causing SMMU probe failure, which cascades to UFS failure and eventual kernel panic during GPU initialization.

Converting UFS-QCOM and SDHCI-MSM drivers to modules moves their initialization from device_initcall (level 6) to module loading time, which occurs after all built-in drivers have completed initialization. This ensures GPUCC and SMMU are fully initialized and ready before the storage drivers attempt to probe, eliminating the race condition.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
Add driver configurations as modules (=m) to align with linux-qcom and linux-qcom-next kernel configurations.

All kernel variants (linux-yocto, linux-yocto-dev, linux-qcom, linux-qcom-next) share the same ramdisk image. The UFS and SDHCI drivers have been converted to modules in linux-qcom/linux-qcom-next to resolve SMMU probe ordering issues on QCS8300 platforms.

When the storage driver modules are added to PACKAGE_INSTALL in the ramdisk recipe to support linux-qcom/linux-qcom-next, the build fails for linux-yocto-dev because these kernels do not have the corresponding configurations enabled. The package manager cannot find the module packages that the ramdisk recipe expects.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
Add kernel-module-ufs-qcom, kernel-module-ufshcd-core, kernel-module-ufshcd-pltfrm, kernel-module-sdhci-msm, kernel-module-governor-simpleondemand to ramdisk PACKAGE_INSTALL to support booting from UFS/SDHCI storage with modularized storage drivers.

UFS-QCOM and SDHCI-MSM drivers have been converted from built-in (=y) to modules (=m) in kernel configs to resolve SMMU probe ordering race conditions on QCS8300 platforms where GPUCC, SMMU and storage drivers all initialize at the same device_initcall level.

With storage drivers now built as modules instead of being compiled into the kernel, they must be present in the ramdisk to enable the system to access and mount the root filesystem from UFS or SDHCI storage devices during early boot. Without these modules in the ramdisk, the kernel cannot probe storage devices, resulting in boot failure with "unable to mount root fs" errors.

These modules are loaded during ramdisk initialization, ensuring storage devices are available before attempting to mount the root filesystem, maintaining boot functionality while providing proper driver initialization sequencing.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>
Updated the commit message.
lumag
left a comment
On QCS8300 (Monaco), the kernel experiences a race condition
during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
Clock Controller) all initialize at the same device_initcall
level (level 6). This creates a dependency chain where UFS
requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
finish initialization before SMMU times out waiting for it.
Hmm, no. The GPU SMMU and the UFS SMMU are two different SMMU instances. So, the fact that 3da0000.iommu has not probed should not affect probing of 15000000.iommu and the UFS.
Please continue and find the actual root cause. What exactly is causing UFS not to probe?
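One way to narrow this down is to group the probe messages in the boot log by device, so that it becomes visible whether 15000000.iommu or the UFS controller ever report a deferral or failure of their own. The following is only a minimal sketch in Python, assuming log lines have the `driver device: message` shape quoted in the commit message. The function name `triage` and the event labels are made up for illustration and are not part of any kernel tooling.

```python
import re
from collections import defaultdict

# Matches "<driver> <device>: <message>", optionally preceded by a
# "[   12.345678]" timestamp, e.g.
# "arm-smmu 3da0000.iommu: deferred probe timeout, ignoring dependency".
LINE_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?"
    r"(?P<driver>\S+) (?P<device>[0-9a-f]+\.\S+?): (?P<msg>.*)$"
)


def triage(log_text):
    """Group probe-related messages by device name (labels are illustrative)."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        device, msg = m.group("device"), m.group("msg")
        if "deferred probe timeout" in msg:
            events[device].append("deferred-timeout")
        elif "failed with error" in msg:
            code = msg.rsplit("error", 1)[1].strip()
            events[device].append("probe-failed(" + code + ")")
        elif "sync_state() pending" in msg:
            events[device].append("sync-state-pending")
    return dict(events)


# Sample taken from the log lines quoted in the commit message.
SAMPLE = """\
gcc-qcs8300 100000.clock-controller: sync_state() pending due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with error -110
"""

if __name__ == "__main__":
    for device, kinds in triage(SAMPLE).items():
        print(device, kinds)
```

On the quoted log, only 100000.clock-controller and 3da0000.iommu show up. Nothing in it mentions 15000000.iommu or a UFS device, which is consistent with the point that the quoted messages alone do not explain the UFS failure.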
The race condition manifests as:
gcc-qcs8300 100000.clock-controller: sync_state() pending
due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
error -110
Kernel panic in iommu_domain_free() at gmu_core_iommu_init()
The commit message needs to be rewritten. The issue is not the GPU SMMU blocking UFS, but rather a few configs that are part of the static image while their dependencies are configured as modules. This causes repeated probe deferrals, sometimes delaying UFS bring-up and in other cases exceeding the probe deferral timeout, which blocks other re-probes. A few cases identified during the last debug session are below:
I would prefer that such modules (the dependencies of drivers that are part of the static kernel image) be moved to the initramfs, so that boot delays (and the deferred probe timeout) can be better managed.
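The mismatch described here (a driver built with =y whose dependency is built with =m) can be checked mechanically against a kernel .config. The following is a minimal sketch in Python under stated assumptions: the dependency map is maintained by hand, since the relevant supplier links come from the devicetree and not from Kconfig, and every symbol name below is hypothetical and not taken from this PR or its debug results.

```python
def parse_config(text):
    """Return {symbol: value} for CONFIG_* lines set to y or m."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            symbol, value = line.split("=", 1)
            if value in ("y", "m"):
                values[symbol] = value
    return values


def builtin_with_modular_deps(config, depends_on):
    """List (consumer, supplier) pairs where the consumer is =y but the supplier is =m."""
    mismatches = []
    for consumer, suppliers in depends_on.items():
        if config.get(consumer) != "y":
            continue
        for supplier in suppliers:
            if config.get(supplier) == "m":
                mismatches.append((consumer, supplier))
    return mismatches


# Hypothetical symbols, for illustration only.
EXAMPLE_CONFIG = """\
CONFIG_EXAMPLE_STORAGE_HOST=y
CONFIG_EXAMPLE_PHY=m
CONFIG_EXAMPLE_CLOCK=y
# CONFIG_EXAMPLE_UNUSED is not set
"""

# Hand-written map: consumer symbol -> symbols of the drivers it waits for.
EXAMPLE_DEPENDS = {
    "CONFIG_EXAMPLE_STORAGE_HOST": ["CONFIG_EXAMPLE_PHY", "CONFIG_EXAMPLE_CLOCK"],
}

if __name__ == "__main__":
    config = parse_config(EXAMPLE_CONFIG)
    for consumer, supplier in builtin_with_modular_deps(config, EXAMPLE_DEPENDS):
        print(consumer + "=y waits for " + supplier + "=m")
```

Each reported pair is a candidate for repeated probe deferral: the built-in consumer cannot finish probing until the module providing its supplier has been loaded from the initramfs or the root filesystem.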
Move UFS and SDHCI storage drivers from static kernel build to initramfs modules to improve boot initialization timing