feat: kernel-dependent BTRFS UUID collision resolution (temp_fsuid on >=6.7) #674

Open
bfjelds wants to merge 28 commits into user/bfjelds/mjolnir/acl-cosi-combined from user/bfjelds/mjolnir/acl-cosi-temp-fsuid

bfjelds commented Jun 4, 2026

Summary

Adds kernel-version-dependent mount strategy for ACL BTRFS UUID collisions during A/B updates:

  • Kernel >=6.7: Mount the staging device with -o temp_fsuid, which assigns a temporary in-memory UUID and bypasses the BTRFS global UUID registry. This is the preferred solution as it mounts real staging content without needing verity hash verification.
  • Kernel <6.7 (e.g. 6.6.x): Fall back to the existing bind-mount from active /usr, which requires verity hash matching to prove content is identical.

Note: The temp_fsuid code path is aspirational. We believe it will work, but it remains untested in production until trident A/B update and ACL run on a kernel >=6.7.

Changes

crates/osutils/src/uname.rs

  • Added KernelVersion struct with parse(), running(), and supports_btrfs_temp_fsuid()
  • Added 6 unit tests covering the Azure Linux version format, 6.7+ versions, and garbage input
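
As a rough illustration of the parsing described above, here is a minimal sketch. The field names, the exact parsing rules, and the sample release strings are assumptions made for illustration; the PR's actual `KernelVersion` may differ, and `running()` (which would invoke `uname`) is left out.

```rust
/// Hypothetical sketch of a kernel version type; field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct KernelVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

impl KernelVersion {
    /// Parses the leading `major.minor[.patch]` of a `uname -r` style string,
    /// ignoring any trailing suffix. Returns `None` for unparseable input.
    pub fn parse(release: &str) -> Option<Self> {
        // Keep only the leading run of ASCII digits and dots.
        let end = release
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(release.len());
        let mut parts = release[..end].split('.');
        let major: u32 = parts.next()?.parse().ok()?;
        let minor: u32 = parts.next()?.parse().ok()?;
        // The patch component is optional and defaults to 0 when absent.
        let patch: u32 = match parts.next() {
            Some(p) if !p.is_empty() => p.parse().ok()?,
            _ => 0,
        };
        Some(Self { major, minor, patch })
    }

    /// True for 6.7 and later, the threshold named in this PR.
    pub fn supports_btrfs_temp_fsuid(&self) -> bool {
        (self.major, self.minor) >= (6, 7)
    }
}
```

With this sketch, a string such as `6.6.57.1-2.azl3` (an illustrative release string, not taken from the PR) parses to 6.6.57 and reports no temp_fsuid support, while `6.7.0` does.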

crates/trident/src/engine/newroot.rs

  • Split monolithic detect_acl_btrfs_uuid_collision into three focused functions:
    • detect_acl_btrfs_uuid_collision - pure UUID collision detection
    • verify_acl_bind_mount_safety - verity hash check (bind-mount path only)
    • resolve_acl_btrfs_uuid_collision - orchestrator that picks strategy based on kernel version
  • New AclBtrfsCollisionResolution enum: TempFsuid vs BindMountActiveUsr
  • Unknown/unparseable kernel version falls back to bind-mount (safe default)
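
The strategy selection listed above might look roughly like the following sketch. The enum variants are named in this PR, but the `uuid` payload, the `choose_resolution` helper, and the `(major, minor)` tuple input are assumptions for illustration only.

```rust
/// Hypothetical shape of the enum named in this PR; the `uuid` payload is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AclBtrfsCollisionResolution {
    /// Mount the staging device itself using the temp_fsuid strategy.
    TempFsuid { uuid: String },
    /// Bind-mount the active /usr (requires the verity hash check).
    BindMountActiveUsr { uuid: String },
}

/// Chooses a strategy from an optional `(major, minor)` kernel version.
/// `None` stands for an unknown or unparseable version.
pub fn choose_resolution(kernel: Option<(u32, u32)>, uuid: &str) -> AclBtrfsCollisionResolution {
    match kernel {
        // Tuples compare lexicographically, so (6, 10) and (7, 0) both qualify.
        Some(version) if version >= (6, 7) => AclBtrfsCollisionResolution::TempFsuid {
            uuid: uuid.to_owned(),
        },
        // Older or unknown kernels take the bind-mount path (the safe default).
        _ => AclBtrfsCollisionResolution::BindMountActiveUsr {
            uuid: uuid.to_owned(),
        },
    }
}
```

Routing the unknown case through the catch-all arm keeps the fallback in one place, so a new failure mode cannot accidentally select the untested path.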

Testing

  • All 6 KernelVersion unit tests pass
  • All 7 ACL duplicate UUID validation tests pass
  • cargo build and cargo fmt --check clean on Linux

bfjelds and others added 28 commits June 1, 2026 14:17
ACL images ship with PARTUUID-based verity addons — templates for both
A and B slots stored in acl/uki-addons/ on the ESP, with slot A active
by default. During an A/B update, trident must swap the active addon
to match the target slot so the new UKI boots with the correct verity
partition identity.

Add activate_verity_addon_for_target_volume() which:
- Checks for ACL verity addon templates on the image ESP
- Copies the correct slot template into the staged addon directory
- Is a silent no-op for non-ACL images (no template dir)
- Errors if template dir exists but the selected slot is missing

Called from copy_file_artifacts() after stage_uki_on_esp(), gated on
ctx.image_distro().is_acl() to ensure only ACL images are affected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
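
The three behaviors listed in the commit above (no-op without a template directory, error on a missing slot template, copy otherwise) can be sketched as follows. The function name, its signature, and the file names `verity-<slot>.addon.efi` and `verity.addon.efi` are hypothetical; the real `activate_verity_addon_for_target_volume()` is not shown in this PR page.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copies the addon template for `slot` ("a" or "b") into `staged_dir`.
/// Returns Ok(false) when no template directory exists (non-ACL image).
/// The file naming used here is a hypothetical convention.
pub fn activate_verity_addon(
    template_dir: &Path,
    staged_dir: &Path,
    slot: &str,
) -> io::Result<bool> {
    // Silent no-op: images without templates are not affected.
    if !template_dir.is_dir() {
        return Ok(false);
    }
    let template = template_dir.join(format!("verity-{slot}.addon.efi"));
    // Templates exist but the requested slot is missing: surface an error.
    if !template.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("missing verity addon template: {}", template.display()),
        ));
    }
    fs::create_dir_all(staged_dir)?;
    fs::copy(&template, staged_dir.join("verity.addon.efi"))?;
    Ok(true)
}
```

Returning `Ok(false)` for the no-template case lets a caller distinguish "nothing to do" from "addon activated" without treating either as a failure.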
ACL uses identical FS UUIDs across A/B slots by design — partitions
are distinguished by PARTUUID instead. The within-image uniqueness
check is unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Scan each UKI's .extra.d/ directory for *.addon.efi files and extract
their .cmdline PE sections. Addons are stored as a new field on the
boot entry so the COSI metadata captures the full effective cmdline
(main UKI + addons).

Both Go (mkcosi) and Rust (metadata deserialization) updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With PARTUUID-based verity addons, usrhash= moved from the main UKI
cmdline to the verity addon cmdline. Update extractUsrhashFromUKIEntries
to also search addon cmdlines when looking for the root hash.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When staging an A/B update on ACL (Azure Container Linux) UKI images,
the COSI image may share BTRFS filesystem UUIDs with the active OS.
BTRFS maintains a kernel-global UUID registry and refuses to mount a
filesystem whose UUID is already registered by another mounted device,
causing the staging verity device mount to fail.

This change detects the UUID collision by checking the well-known ACL
USR-A/USR-B partition UUIDs (by PARTUUID) before the mount loop. When
a collision is detected, it bind-mounts the active /usr into the
newroot instead of attempting to mount the staging verity device. This
is safe because:

- USR is verity-protected and read-only
- Matching UUIDs means identical filesystem content
- The chroot only reads from /usr during provisioning
- After reboot, initramfs sets up the correct verity device normally

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When the bind-mount workaround activates for ACL BTRFS UUID collisions,
compare the staging USR verity root hash (from COSI metadata) against
the active USR root hash (from /proc/cmdline usrhash= parameter) to
cryptographically prove the filesystems are byte-identical.

If the staging hash is available but the active hash cannot be read or
does not match, the bind-mount is refused and the normal mount path
proceeds (which will fail with the BTRFS UUID error, as expected for
genuinely different content).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
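
A minimal sketch of the hash comparison described in the commit above, assuming the active hash is taken from a command-line string passed in by the caller. The helper names are hypothetical, and the case-insensitive comparison is an assumption about how the hashes are normalized.

```rust
/// Extracts the value of `usrhash=` from a kernel command line string.
/// Returns `None` when the parameter is absent or empty.
pub fn usrhash_from_cmdline(cmdline: &str) -> Option<&str> {
    cmdline
        .split_whitespace()
        .find_map(|arg| arg.strip_prefix("usrhash="))
        .filter(|hash| !hash.is_empty())
}

/// The bind-mount is allowed only when the active hash can be read
/// and equals the staging hash (compared case-insensitively).
pub fn bind_mount_allowed(staging_hash: &str, active_cmdline: &str) -> bool {
    match usrhash_from_cmdline(active_cmdline) {
        Some(active) => !staging_hash.is_empty() && active.eq_ignore_ascii_case(staging_hash),
        // Unreadable active hash: refuse, so the normal mount path proceeds.
        None => false,
    }
}
```

In real use the command line would come from reading `/proc/cmdline`; taking it as a parameter keeps the sketch testable without filesystem access.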
When internalParams.forceAbUpdate is true, trident will proceed with
an A/B update even when the old and new OS image SHA384 hashes match.
This is useful for testing A/B update flows repeatedly with the same
COSI file.

Usage in trident-config.yaml:
  internalParams:
    forceAbUpdate: true

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the blanket ACL skip in validate_filesystem_uniqueness() with
proper validation. When a duplicate FS UUID is found during A/B update
on ACL, the update is only allowed if:

1. The duplicate is on the /usr mount point
2. The staging COSI has a verity root hash
3. The active system has a usrhash= in /proc/cmdline
4. The normalized hashes match (merkle tree proof of identical content)

If COSI partition metadata is available, also validates that the staging
USR partition has a known ACL PARTUUID.

Extracts ACL constants and read_active_usr_roothash() into a shared
engine::acl module used by both osimage.rs and newroot.rs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DiscoverablePartitionType does not have is_acl_usr() — that method
lives on the HC PartitionType enum. Since we already check for known
ACL USR PARTUUIDs, the part_type check was redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ESP (128 MB) can overflow when multiple UKIs accumulate across A/B
updates. Before staging a new UKI, remove old UKIs for the target slot:

1. Trident-managed UKIs matching the target slot (all install indices)
2. Non-trident-managed (original install) UKIs, but only when trident
   already manages the other slot (proving it owns boot management)

The other slot's UKI is always preserved as the active/rollback path.

Also extract UKI_SLOT_A/UKI_SLOT_B constants to replace string literals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In multi-OS configurations, the ESP has UKI pairs per OS instance
(azla0/azlb0, azla1/azlb1, etc.). Cleanup must only remove UKIs for
the specific slot+os-index being updated, not all UKIs for the same
slot letter across different OS instances.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
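
The slot-plus-index matching described above could be sketched like this. The file naming convention assumed here (an optional prefix, a `-` separator, then `azl<slot><index>.efi`) and both helper names are hypothetical; only the `azla0`/`azlb0` style suffixes come from the commit message.

```rust
/// Returns the slot suffix of a UKI file name, e.g. "azla0" for
/// "vmlinuz-azla0.efi". The naming convention is an assumption.
fn uki_suffix(file_name: &str) -> Option<&str> {
    let stem = file_name.strip_suffix(".efi")?;
    // Everything after the last '-', or the whole stem when there is none.
    Some(stem.rsplit('-').next().unwrap_or(stem))
}

/// True only when the file belongs to exactly this slot and OS index.
pub fn matches_target(file_name: &str, slot: char, os_index: u32) -> bool {
    let target = format!("azl{slot}{os_index}");
    // Exact equality: "azla0" must not match "azla01" or "azla1".
    uki_suffix(file_name) == Some(target.as_str())
}
```

Comparing the whole suffix for equality, instead of testing for a substring, is what keeps OS instance 0 from claiming files that belong to instance 1 or to a two-digit index.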
In multiboot configurations, the original UKI has OS 0's partition
references baked in. OS 1+ instances never depend on it, but only
OS 0 should remove it since it's the owner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move /proc/cmdline read out of validate_acl_duplicate_uuid into its
caller (validate_filesystem_uniqueness). The function now accepts
active_usr_roothash as Option<String>, making it fully testable in
unit tests without filesystem access.

Add 7 unit tests covering all validation paths:
- matching hash (success)
- case-insensitive matching (success)
- wrong mount point (reject)
- no staging verity hash (reject)
- mismatched hashes (reject)
- no active hash / None (reject)
- empty active hash (reject)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
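
To show why passing the active hash as `Option<String>` makes the checks testable, here is a minimal sketch of the validation paths listed above. The `DuplicateUuid` struct, its fields, the error strings, and the trim plus case-insensitive normalization are assumptions; only the function name and the `Option<String>` parameter come from the commit message.

```rust
/// Hypothetical input describing one duplicated filesystem UUID.
pub struct DuplicateUuid<'a> {
    pub mount_point: &'a str,
    pub staging_roothash: Option<&'a str>,
}

/// Allows the duplicate only for /usr with matching, non-empty verity hashes.
pub fn validate_acl_duplicate_uuid(
    dup: &DuplicateUuid<'_>,
    active_usr_roothash: Option<String>,
) -> Result<(), String> {
    if dup.mount_point != "/usr" {
        return Err(format!("duplicate UUID on {} is not allowed", dup.mount_point));
    }
    // Missing or empty hashes are rejected before any comparison.
    let staging = dup
        .staging_roothash
        .map(str::trim)
        .filter(|h| !h.is_empty())
        .ok_or("staging image has no verity root hash")?;
    let active = active_usr_roothash
        .as_deref()
        .map(str::trim)
        .filter(|h| !h.is_empty())
        .ok_or("active system has no usrhash")?;
    if !staging.eq_ignore_ascii_case(active) {
        return Err("verity root hashes differ".to_owned());
    }
    Ok(())
}
```

Because the function never touches `/proc/cmdline` itself, each rejection path can be exercised with plain values.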
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DR-001 (High): Replace if-let with let-else for missing staging hash in
detect_acl_btrfs_uuid_collision - None now logs a warning and refuses
the bind-mount instead of silently proceeding unverified.

DR-002 (High): Replace suffix.contains() with exact suffix equality in
cleanup_ukis_before_staging - prevents azla0 from matching azla01.efi
in multiboot with 10+ OS instances.

DR-003 (Medium): Extract verity_hashes_match() into engine::acl module,
replacing duplicated normalize+compare logic in newroot.rs and osimage.rs.
Rejects empty hashes so "" == "" cannot incorrectly pass.

DR-004 (Medium): Document pre-staging cleanup ordering rationale in
esp.rs - explains the crash-safety trade-off (active slot UKI preserved
as A/B fallback).

DR-005 (Medium): Make remove_uki_and_addons idempotent by treating
NotFound as success - prevents orphaned addon dirs if UKI was already
removed by a prior partial cleanup.

DR-006 (Medium): Document that cleanup_ukis_before_staging is
intentionally universal (not ACL-gated) - ESP space constraints apply
to all UKI-based A/B updates.

DR-007 (Medium): Replace byte-index hash slicing with char-safe
hash_preview() using chars().take(16) - prevents panics on non-ASCII
input (defense in depth for hex hashes).

Adds unit tests for verity_hashes_match(), hash_preview(),
cleanup_ukis_before_staging (exact suffix matching, multi-index cleanup),
and remove_uki_and_addons (idempotency, addon directory cleanup).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
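
The two helpers named in DR-003 and DR-007 can be sketched as below. The function names come from the commit message; the bodies are a guess at the behavior it describes, and the whitespace trimming in particular is an assumption about what "normalize" means here.

```rust
/// Compares two verity root hashes after normalization.
/// Empty hashes never match, so "" == "" cannot pass.
pub fn verity_hashes_match(a: &str, b: &str) -> bool {
    let (a, b) = (a.trim(), b.trim());
    !a.is_empty() && !b.is_empty() && a.eq_ignore_ascii_case(b)
}

/// Returns at most the first 16 characters of a hash for logging.
/// Iterating over chars avoids slicing in the middle of a multi-byte
/// character, which byte-index slicing could panic on.
pub fn hash_preview(hash: &str) -> String {
    hash.chars().take(16).collect()
}
```

For well-formed hex hashes the character-based preview and a byte slice give the same result; the difference only matters for unexpected non-ASCII input.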
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On kernel >=6.7, use mount -o temp_fsuid to mount the staging device
directly, bypassing the BTRFS global UUID registry. This is the
preferred solution as it mounts real staging content without needing
verity hash verification.

On kernel <6.7 (e.g. 6.6.x), fall back to the existing bind-mount
strategy which requires verity hash matching to prove the active and
staging content are identical.

Changes:
- Add KernelVersion parser to osutils/uname.rs with unit tests
- Split detect_acl_btrfs_uuid_collision into collision detection and
  resolution strategy (AclBtrfsCollisionResolution enum)
- Add verify_acl_bind_mount_safety for the bind-mount path
- Mount handler selects strategy based on kernel version

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The temp_fsuid mount path (kernel >=6.7) is aspirational and untested in
production. Gate it behind the enableAzl4 internal parameter so it only
activates when explicitly opted in. When the flag is absent, the
bind-mount fallback is used. A temp_fsuid mount failure gets no special
warning or fallback: mount errors propagate as-is so that issues surface.

The enableAzl4 flag is intentionally broad: it will gate additional
Azure Linux 4 behaviors as they are added.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
DR-002: Move BTRFS temp_fsuid domain knowledge out of osutils. Remove
supports_btrfs_temp_fsuid() from KernelVersion (generic layer) and
define BTRFS_TEMP_FSUID_MIN_KERNEL constant in the consumer (newroot.rs).
KernelVersion now relies on derived Ord for version comparisons.

DR-003: Distinguish uname execution failure from parse failure. The
match on KernelVersion::running() now logs different warnings for Err
(uname command failed) vs Ok(None) (output not parseable).

DR-004: Add doc comment explaining why verity hash verification is
intentionally skipped for the temp_fsuid path (it mounts real staging
content, not a bind-mount of active, so no identity assumption to verify).

DR-005: Eliminate double pattern match on AclBtrfsCollisionResolution in
the mount loop. Add collision_uuid() accessor method so the UUID is
extracted once, then dispatch on the resolution variant in a single match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
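
Taken together, the review items above and the opt-in flag from the preceding commit suggest a shape like the following sketch. The constant, the enum, and `collision_uuid()` are named in the commit messages; the field names, the `resolve` helper, its `Result<Option<_>, String>` input, the `enable_azl4` boolean, and the warning strings are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct KernelVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Lives with the consumer, not in the generic version type.
pub const BTRFS_TEMP_FSUID_MIN_KERNEL: KernelVersion =
    KernelVersion { major: 6, minor: 7, patch: 0 };

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AclBtrfsCollisionResolution {
    TempFsuid { uuid: String },
    BindMountActiveUsr { uuid: String },
}

impl AclBtrfsCollisionResolution {
    /// Extracts the UUID once, so callers need only one match on the variant.
    pub fn collision_uuid(&self) -> &str {
        match self {
            Self::TempFsuid { uuid } | Self::BindMountActiveUsr { uuid } => uuid,
        }
    }
}

/// Returns the chosen resolution plus an optional warning to log.
/// `running` mimics a lookup that can fail (Err) or be unparseable (Ok(None)).
pub fn resolve(
    running: Result<Option<KernelVersion>, String>,
    enable_azl4: bool,
    uuid: &str,
) -> (AclBtrfsCollisionResolution, Option<&'static str>) {
    let bind = || AclBtrfsCollisionResolution::BindMountActiveUsr { uuid: uuid.to_owned() };
    match running {
        // Derived Ord compares major, then minor, then patch.
        Ok(Some(v)) if enable_azl4 && v >= BTRFS_TEMP_FSUID_MIN_KERNEL => (
            AclBtrfsCollisionResolution::TempFsuid { uuid: uuid.to_owned() },
            None,
        ),
        Ok(Some(_)) => (bind(), None),
        Ok(None) => (bind(), Some("kernel version not parseable; using bind-mount")),
        Err(_) => (bind(), Some("uname failed; using bind-mount")),
    }
}
```

Keeping the two failure cases in separate arms is what allows distinct log messages, even though both end in the same bind-mount fallback.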
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bfjelds force-pushed the user/bfjelds/mjolnir/acl-cosi-temp-fsuid branch from eed9f51 to e4eaf3f June 5, 2026 19:36

bfjelds commented Jun 5, 2026

/azp run [GITHUB]-trident-pr

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

bfjelds marked this pull request as ready for review June 8, 2026 18:16
bfjelds requested a review from a team as a code owner June 8, 2026 18:16
bfjelds force-pushed the user/bfjelds/mjolnir/acl-cosi-combined branch from b5f1ff1 to 1522587 June 11, 2026 19:06