Skip to content

Manually construct flake archive, update tar#317

Open
cole-h wants to merge 1 commit into
mainfrom
manually-construct-archive
Open

Manually construct flake archive, update tar#317
cole-h wants to merge 1 commit into
mainfrom
manually-construct-archive

Conversation

@cole-h
Copy link
Copy Markdown
Member

@cole-h cole-h commented May 26, 2026

Summary by CodeRabbit

  • Chores
    • Updated dependencies to recent stable releases and added improved filesystem traversal support.
  • Bug Fixes
    • Archive creation now streams compressed output directly, produces deterministic metadata, preserves symlinks, and skips unsupported special file types for more reliable, reproducible archives.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Walkthrough

Tarball creation now uses crates.io tar and walkdir to stream deterministic tar entries directly into a gzip encoder while walking the source directory, removing intermediate in-memory tar buffering.

Changes

Streaming Tarball Creation

Layer / File(s) Summary
Dependency updates
Cargo.toml
tar dependency migrates from a git branch override to crates.io release 0.4.46 with xattr feature. walkdir 2.5.0 is added for filesystem traversal during tarball generation.
Streaming tarball implementation
src/flake_info.rs
FlakeMetadata::flake_tarball refactored to stream tar entries directly into a gzip encoder via tar::Builder while walking the source directory. Deterministic headers are computed per entry with mtime from lastModified and fixed uid/gid of 0. Symlink following disabled; directories, regular files, and symlinks are appended without intermediate in-memory buffering. std::io::Write import removed.
sequenceDiagram
  participant FlakeMetadata
  participant BufWriter
  participant GzEncoder
  participant TarBuilder
  participant Walkdir
  participant Filesystem
  FlakeMetadata->>BufWriter: create wrapper
  BufWriter->>GzEncoder: create encoder
  GzEncoder->>TarBuilder: create builder
  FlakeMetadata->>Walkdir: traverse source_dir
  Walkdir->>Filesystem: walk entries
  Walkdir-->>FlakeMetadata: each entry
  FlakeMetadata->>TarBuilder: compute deterministic header (mtime, uid, gid)
  TarBuilder->>GzEncoder: append file/dir/symlink
  FlakeMetadata->>TarBuilder: finish builder
  TarBuilder->>GzEncoder: flush tar data
  GzEncoder->>BufWriter: compress to bytes
  BufWriter-->>FlakeMetadata: tarball complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through files with a pleasant stride,
Replaced the old heap where tar was tied,
Walkdir led the way, step by step,
Streams of gzip now neatly kept,
Our flake rolls out tidy and wide! 📦

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'Manually construct flake archive, update tar' directly reflects the main changes: refactoring the flake tarball generation to manually construct the archive using a streaming pipeline and updating the tar dependency from a Git branch override to crates.io release 0.4.46.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch manually-construct-archive

Comment @coderabbitai help to get the list of available commands and usage tips.

@cole-h cole-h force-pushed the manually-construct-archive branch from be0f438 to c95f6b0 Compare May 26, 2026 20:30
@cole-h cole-h changed the title WIP: replace force-mtime branch with manually constructing the archive Manually construct flake archive, update tar May 26, 2026
@cole-h cole-h marked this pull request as ready for review May 26, 2026 20:30
@cole-h cole-h requested review from colemickens and grahamc May 26, 2026 20:30
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/flake_info.rs (1)

227-227: 💤 Low value

Redundant set_cksum() call.

The append_data and append_link methods internally recompute and set the header checksum before writing, making this explicit call unnecessary.

🧹 Remove redundant call
             header.set_uid(0);
             header.set_gid(0);
-            header.set_cksum();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flake_info.rs` at line 227, Remove the redundant header.set_cksum() call
at the indicated spot: append_data() and append_link() already recompute and set
the header checksum internally, so delete the explicit header.set_cksum()
invocation to avoid duplicate work and potential confusion; locate the call to
header.set_cksum() near the surrounding append_data / append_link usage and
remove that single line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/flake_info.rs`:
- Around line 245-249: The call to GzEncoder::finish() currently uses unwrap()
which can panic on I/O errors; replace the unwrap with proper error
propagation—invoke finish() and propagate its Result using ? while adding
context (e.g., change .finish().unwrap() to .finish().wrap_err("Finishing
gzip")? ), so the failing finalization is returned as an error instead of
panicking (refer to the compressed_tarball assignment and the
tarball.into_inner().wrap_err("Creating gzip")? chain).

---

Nitpick comments:
In `@src/flake_info.rs`:
- Line 227: Remove the redundant header.set_cksum() call at the indicated spot:
append_data() and append_link() already recompute and set the header checksum
internally, so delete the explicit header.set_cksum() invocation to avoid
duplicate work and potential confusion; locate the call to header.set_cksum()
near the surrounding append_data / append_link usage and remove that single
line.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e16cacf3-1d48-49c6-9757-5a1809ba393f

📥 Commits

Reviewing files that changed from the base of the PR and between bfc2781 and c95f6b0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • Cargo.toml
  • src/flake_info.rs

Comment thread src/flake_info.rs Outdated
@cole-h cole-h force-pushed the manually-construct-archive branch from c95f6b0 to abfae1d Compare May 26, 2026 20:39
@cole-h
Copy link
Copy Markdown
Member Author

cole-h commented May 26, 2026

$ tar -tvf old.tar.gz
dr-xr-xr-x 0/0               0 2026-04-15 10:08 834g92m2pmxdn14cvvpdv77fjjrswyla-source/
-r--r--r-- 0/0            5060 2026-04-15 10:08 834g92m2pmxdn14cvvpdv77fjjrswyla-source/flake.nix
dr-xr-xr-x 0/0               0 2026-04-15 10:08 834g92m2pmxdn14cvvpdv77fjjrswyla-source/.github
dr-xr-xr-x 0/0               0 2026-04-15 10:08 834g92m2pmxdn14cvvpdv77fjjrswyla-source/.github/workflows
-r--r--r-- 0/0             736 2026-04-15 10:08 834g92m2pmxdn14cvvpdv77fjjrswyla-source/.github/workflows/publish.yml

$ tar -tvf new.tar.gz
drwxr-xr-x 0/0               0 2026-05-26 12:56 c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source
drwxr-xr-x 0/0               0 2026-05-26 12:56 c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source/.github
drwxr-xr-x 0/0               0 2026-05-26 12:56 c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source/.github/workflows
-rw-r--r-- 0/0             698 2026-05-26 12:56 c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source/.github/workflows/publish.yml
-rw-r--r-- 0/0            5060 2026-05-26 12:56 c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source/flake.nix

Technically, this results in uploading tarballs with "wider" permissions than we used to. However, when extracted to the Nix store, Nix still enforces its permissions:

$ exa /nix/store/c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source -Tal
dr-xr-xr-x    - root 31 Dec  1969 /nix/store/c6i6nfcm7fkisih3wp9b0zk5zl22msfg-source
dr-xr-xr-x    - root 31 Dec  1969 ├── .github
dr-xr-xr-x    - root 31 Dec  1969 │   └── workflows
.r--r--r--  698 root 31 Dec  1969 │       └── publish.yml
.r--r--r-- 5.1k root 31 Dec  1969 └── flake.nix

This is probably fine?

@cole-h cole-h force-pushed the manually-construct-archive branch from abfae1d to f94590a Compare May 26, 2026 20:41
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/flake_info.rs (1)

215-241: 💤 Low value

The set_cksum() call is redundant.

The append_data and append_link methods internally recalculate and set the checksum before writing the header. The manual set_cksum() on line 227 is harmless but unnecessary.

♻️ Optional: Remove redundant checksum call
             header.set_mtime(last_modified);
             header.set_uid(0);
             header.set_gid(0);
-            header.set_cksum();

             if metadata.is_dir() {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flake_info.rs` around lines 215 - 241, Remove the redundant manual
checksum call: the tar::Header named header is having set_cksum() invoked before
calls to tarball_builder.append_data and append_link, but those methods
recompute the checksum themselves; delete the header.set_cksum() invocation so
the header is prepared via header.set_metadata_in_mode / set_mtime / set_uid /
set_gid only and let append_data/append_link handle checksum calculation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/flake_info.rs`:
- Around line 215-241: Remove the redundant manual checksum call: the
tar::Header named header is having set_cksum() invoked before calls to
tarball_builder.append_data and append_link, but those methods recompute the
checksum themselves; delete the header.set_cksum() invocation so the header is
prepared via header.set_metadata_in_mode / set_mtime / set_uid / set_gid only
and let append_data/append_link handle checksum calculation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7ba4056b-c9fb-47f9-8229-0c1f27499f37

📥 Commits

Reviewing files that changed from the base of the PR and between c95f6b0 and abfae1d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • Cargo.toml
  • src/flake_info.rs

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/flake_info.rs`:
- Around line 223-226: The tar header currently uses
header.set_metadata_in_mode(&metadata, tar::HeaderMode::Deterministic) which
normalizes mode bits; change it to use tar::HeaderMode::Complete so the mode
from metadata is preserved while still explicitly setting mtime/uid/gid via
header.set_mtime(last_modified), header.set_uid(0), and header.set_gid(0);
update the call to set_metadata_in_mode accordingly to retain original
permission bits from metadata.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4d598e5-ad69-410c-b5ca-fd504f7105cf

📥 Commits

Reviewing files that changed from the base of the PR and between abfae1d and f94590a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • Cargo.toml
  • src/flake_info.rs

Comment thread src/flake_info.rs
Comment on lines +223 to +226
header.set_metadata_in_mode(&metadata, tar::HeaderMode::Deterministic);
header.set_mtime(last_modified);
header.set_uid(0);
header.set_gid(0);
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

tar crate 0.4.46 HeaderMode Deterministic Complete set_metadata_in_mode preserves mode bits 0o755 0o644

💡 Result:

In the tar crate (version 0.4.46), the HeaderMode::Deterministic variant is designed to produce reproducible archive headers by excluding metadata that changes between builds, such as specific ownership (UID/GID) and timestamps [1][2]. When set_metadata_in_mode is called with HeaderMode::Deterministic, the crate handles file mode bits explicitly to maintain a standard, deterministic state [2]: 1. Ownership: UID and GID are set to 0 [2]. 2. Timestamp: The mtime is set to a constant DETERMINISTIC_TIMESTAMP to avoid issues with zero-value timestamps in some tools [3][2]. 3. Mode Bits: The crate does not simply copy the raw mode bits from the source filesystem [2]. Instead, it applies logic to normalize them [2]: - It checks if the file is a directory or if the user-execute bit (0o100) is set on the original file [2]. - If either condition is true, it sets the mode to 0o755 [2]. - Otherwise, it sets the mode to 0o644 [2]. This mechanism ensures that the resulting tar headers have predictable, consistent permissions (0o755 or 0o644) regardless of the specific umask or file permissions present on the system where the archive is being created [2]. In contrast, HeaderMode::Complete preserves the original metadata, including full ownership and mode bits, from the filesystem [2].

Citations:


Preserve original permission bits in tar headers

tar::HeaderMode::Deterministic normalizes archived mode bits (directories/executables -> 0o755, others -> 0o644) instead of preserving the filesystem/store modes, which can widen permissions versus what you started with. Since Lines 224-226 already overwrite mtime/uid/gid, switch to HeaderMode::Complete to preserve the metadata-provided mode bits while keeping those fields controlled.

Suggested change
-            header.set_metadata_in_mode(&metadata, tar::HeaderMode::Deterministic);
+            header.set_metadata_in_mode(&metadata, tar::HeaderMode::Complete);
             header.set_mtime(last_modified);
             header.set_uid(0);
             header.set_gid(0);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
header.set_metadata_in_mode(&metadata, tar::HeaderMode::Deterministic);
header.set_mtime(last_modified);
header.set_uid(0);
header.set_gid(0);
header.set_metadata_in_mode(&metadata, tar::HeaderMode::Complete);
header.set_mtime(last_modified);
header.set_uid(0);
header.set_gid(0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/flake_info.rs` around lines 223 - 226, The tar header currently uses
header.set_metadata_in_mode(&metadata, tar::HeaderMode::Deterministic) which
normalizes mode bits; change it to use tar::HeaderMode::Complete so the mode
from metadata is preserved while still explicitly setting mtime/uid/gid via
header.set_mtime(last_modified), header.set_uid(0), and header.set_gid(0);
update the call to set_metadata_in_mode accordingly to retain original
permission bits from metadata.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. Honestly, I'd rather it do the equivalent of chmod ugo-w on everything but Nix already does that for us when it extracts the flake to the store so...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cole-h, understood — since Nix enforces read-only store permissions at extraction time regardless of what's in the tarball, the wider 0o755/0o644 modes from HeaderMode::Deterministic are harmless in practice. Makes sense to leave it as-is.

(ᵔᴥᵔ)


✏️ Learnings added
Learnt from: cole-h
Repo: DeterminateSystems/flakehub-push PR: 317
File: src/flake_info.rs:223-226
Timestamp: 2026-05-26T20:56:40.994Z
Learning: In `src/flake_info.rs` (Rust), `FlakeMetadata::flake_tarball` intentionally uses `tar::HeaderMode::Deterministic` when building the flake tarball. This normalizes mode bits to `0o755` for directories/executables and `0o644` for other files, which is "wider" than Nix store permissions. This is acceptable because Nix enforces the correct read-only permissions (dr-xr-xr-x / -r--r--r--) when it extracts the tarball into the Nix store, so the tarball's wider permissions are harmless. Do not suggest switching to `HeaderMode::Complete` to preserve original mode bits.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant