Add PFS-MS v1.0 reference implementation with compression support #8
Merged
Implement the PCF File System Multi-Session profile (specs/PFS-MS-spec-v1.0.txt) as a new crate at reference/PFS-MS-v1.0, built entirely on the pcf reference crate's public primitives. PFS-MS stores an append-only, multi-session tree of files and directories inside a single conforming PCF v1.0 file: file content in RAW partitions, node metadata in PFS_NODE partitions, and one PFS_SESSION per session, with backward-linked Table Blocks and a single in-place header-pointer rewrite at commit.

The crate provides:
- Node/Session record codecs with exact spec byte layouts (Sections 7, 8)
- An append-only session writer following the S1..S7 commit protocol
- A backward-chain reader with inter-session hash-chain verification, liveness, cycle/collision handling, and DIRECT/DELTA/EMPTY/INHERIT reconstruction
- VCDIFF (RFC 3284) deltas via the pure-Rust oxidelta crate; UUIDv7 identities
- A demo CLI (mkfs/mkdir/put/mv/rm/ls/cat/get/log/verify)
- A byte-exact Section 17 reference vector plus roundtrip/coverage/spec tests
- A dedicated CI workflow mirroring the PCF crate's gates

PCF change (additive, backward-compatible): expose a read-only per-block walker (Container::read_block_at / BlockView) so the PFS reader can reuse PCF block iteration and access each block's table_hash. No on-disk layout or writer behavior changes; PCF retains 100% function / 95%+ line coverage.
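The inter-session hash-chain verification mentioned above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `SessionRecord` shape, the `append` and `verify_chain` helpers, and the use of std's `DefaultHasher` as a stand-in digest. The real PFS_SESSION layout and hash are defined by the spec and are not reproduced.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified session record (not the Section 8 layout).
#[derive(Debug, Clone)]
pub struct SessionRecord {
    pub seq: u64,
    pub payload: Vec<u8>,
    /// Digest of the previous session; 0 for the first session (assumption).
    pub prev_digest: u64,
}

/// Stand-in digest built on std's DefaultHasher, NOT the hash the spec uses.
pub fn digest(rec: &SessionRecord) -> u64 {
    let mut h = DefaultHasher::new();
    rec.seq.hash(&mut h);
    rec.payload.hash(&mut h);
    rec.prev_digest.hash(&mut h);
    h.finish()
}

/// Appends a session whose back-link covers the current tail of the chain.
pub fn append(sessions: &mut Vec<SessionRecord>, payload: &[u8]) {
    let prev_digest = sessions.last().map(digest).unwrap_or(0);
    let seq = sessions.len() as u64 + 1;
    sessions.push(SessionRecord {
        seq,
        payload: payload.to_vec(),
        prev_digest,
    });
}

/// Walks the chain newest to oldest and checks every back-link.
/// `sessions` is ordered oldest first; on failure, returns the seq of the
/// session whose back-link does not match its predecessor.
pub fn verify_chain(sessions: &[SessionRecord]) -> Result<(), u64> {
    for pair in sessions.windows(2).rev() {
        let (older, newer) = (&pair[0], &pair[1]);
        if newer.prev_digest != digest(older) {
            return Err(newer.seq);
        }
    }
    match sessions.first() {
        Some(first) if first.prev_digest != 0 => Err(first.seq),
        _ => Ok(()),
    }
}
```

Because each digest covers the previous digest, altering any earlier session invalidates the back-link of the session that follows it, which is what lets a reader detect tampering while walking backward from the newest session.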
Extend the (pre-publication) PFS-MS v1.0 format and reference implementation so
file content can be compressed, breaking compatibility with earlier drafts.
Compression is a PFS-level concern about file content only, so PCF is untouched.
Format changes (specs/PFS-MS-spec-v1.0.txt):
- New Compression Algorithm Registry (Section 9.5): 0 = none, 1 = DEFLATE
  (RFC 1951, required), 2 = zstd / 3 = brotli reserved. An unknown id makes a
  file unreadable but not the container malformed (same rule as patch_algo_id).
- DIRECT and DELTA content sections gain a compression_algo_id byte (DIRECT
  90->91, DELTA 164->165); full_size/full_hash now describe the decompressed
  content while the PCF data_hash protects the stored (compressed) bytes.
- Reconstruction (Section 9.3) decompresses RAW bytes before use. Updated
  Appendices A/B, the intro, and the Section 17 narrative; profile stays 1.0.
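As a rough illustration of the registry rule above, the sketch below maps a compression_algo_id byte to an algorithm and treats anything it cannot decode as a per-file error rather than a container-level failure. The names `CompressionAlgo`, `UnsupportedAlgo`, and `classify` are hypothetical and are not taken from the crate; treating the reserved ids 2 and 3 as unsupported is also an assumption.

```rust
/// Algorithms this sketch can decode: ids 0 (none) and 1 (DEFLATE).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionAlgo {
    None = 0,
    Deflate = 1,
}

/// An id this reader cannot decode. It makes one file unreadable;
/// it does not make the whole container malformed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnsupportedAlgo(pub u8);

impl CompressionAlgo {
    /// Decodes the registry byte. Reserved ids (2 = zstd, 3 = brotli) and
    /// unknown ids are both reported as unsupported in this sketch.
    pub fn from_id(id: u8) -> Result<Self, UnsupportedAlgo> {
        match id {
            0 => Ok(CompressionAlgo::None),
            1 => Ok(CompressionAlgo::Deflate),
            other => Err(UnsupportedAlgo(other)),
        }
    }

    /// Returns the registry byte for this algorithm.
    pub fn id(self) -> u8 {
        self as u8
    }
}

/// Classifies each (file name, algo id) pair independently, so a single
/// unreadable file does not prevent the others from being read.
pub fn classify<'a>(
    files: &[(&'a str, u8)],
) -> Vec<(&'a str, Result<CompressionAlgo, UnsupportedAlgo>)> {
    files
        .iter()
        .map(|&(name, id)| (name, CompressionAlgo::from_id(id)))
        .collect()
}
```

Returning a per-file `Result` instead of aborting keeps the error local to the file, in the spirit of the "unreadable but not malformed" rule.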
Implementation (reference/PFS-MS-v1.0):
- New compress.rs (DEFLATE via pure-Rust flate2/miniz_oxide), mirroring delta.rs.
- ContentSection::{Direct,Delta} carry compression_algo; node codec updated.
- Writer DEFLATEs content/patches and stores the compressed form only when
  smaller; FsWriter::set_compression and a `pfs put --store` flag disable it.
- Reader decompresses per compression_algo_id during materialize.
- Regenerated the byte-exact reference vector (now exercises a DEFLATE DIRECT)
  and re-pinned its length/SHA-256; added compression round-trip, store-disabled,
  and unknown-algo tests; updated layout-constant assertions and the README.
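The "store the compressed form only when smaller" rule can be sketched as follows. To keep the example dependency-free, a toy run-length codec stands in for DEFLATE; the codec, the `Stored` struct, the `choose_stored_form` function, and the id `0xFF` are all made up for illustration and do not reflect the crate's actual code.

```rust
/// Toy stand-in compressor: run-length encoding as (count, byte) pairs.
/// This is NOT DEFLATE; it only gives the selection logic something to call.
pub fn rle_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1usize;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Inverse of `rle_compress`.
pub fn rle_decompress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

pub const ALGO_NONE: u8 = 0;
/// Made-up id for the toy codec; in the PR, id 1 means DEFLATE.
pub const ALGO_TOY_RLE: u8 = 0xFF;

/// What would be written: the algorithm id, the stored bytes, and the
/// size of the original (decompressed) content.
pub struct Stored {
    pub algo_id: u8,
    pub bytes: Vec<u8>,
    pub full_size: u64,
}

/// Compresses when enabled, but keeps the compressed form only if it is
/// strictly smaller than the original; otherwise stores the raw bytes.
pub fn choose_stored_form(content: &[u8], compression_enabled: bool) -> Stored {
    let full_size = content.len() as u64;
    if compression_enabled {
        let packed = rle_compress(content);
        if packed.len() < content.len() {
            return Stored { algo_id: ALGO_TOY_RLE, bytes: packed, full_size };
        }
    }
    Stored { algo_id: ALGO_NONE, bytes: content.to_vec(), full_size }
}
```

Passing `compression_enabled = false` plays the role that the commit message assigns to FsWriter::set_compression and the `--store` flag: the content is always stored uncompressed with id 0.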
Add whole-directory tooling on top of the existing reader/writer, committing each create/update as a single session (one "burn") rather than one per file.

Library:
- FsWriter::commit_changes(&[Change]) applies a batch of Mkdir/PutFile/Remove in one session: resolves parents (including dirs created earlier in the same batch), reuses the DIRECT/DELTA + compression selection, skips no-op updates and existing dirs, enforces one record per node per session, and commits nothing when nothing changed.
- New dirsync module: create_archive / update_archive / extract_archive plus a session_at_time helper. Recursively imports a host directory (skipping symlinks/special files), captures POSIX mode + mtime, mirrors deletions under --delete, and restores mode/mtime on extract (via the filetime crate).

CLI (pfs):
- create <archive> <dir> [--store] [--no-metadata]
- update <archive> <dir> [--delete] [--store] [--no-metadata]
- extract <archive> <dir> [--at <seq>] [--at-time <unix_ms>] [--no-metadata]
(--no-metadata disables metadata capture on import and restore on extract independently; point-in-time selects historical state via tree_as_of).

Tests/docs: new tests/dirsync.rs covering create->extract round-trip (incl. an empty dir and nested files), update add/modify, --delete mirror, point-in-time extract, unix mode preservation, and no-op-update-commits-no-session; README documents the new commands. Library coverage stays above the 90/90 floor; PCF is untouched.
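The batch semantics described for commit_changes can be modelled with an in-memory toy. This is only a sketch under stated assumptions: `ToyFs`, `Node`, `BatchError`, the string paths, and the tuple-style `Change` variants are hypothetical and simplify away records, sessions on disk, and the DIRECT/DELTA selection. The real `Change` type and error handling may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    Dir,
    File(Vec<u8>),
}

/// Hypothetical shape of a batch entry, keyed by '/'-separated path.
#[derive(Debug, Clone)]
pub enum Change {
    Mkdir(String),
    PutFile(String, Vec<u8>),
    Remove(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum BatchError {
    MissingParent(String),
    DuplicateRecord(String),
}

#[derive(Default)]
pub struct ToyFs {
    pub nodes: BTreeMap<String, Node>,
    pub sessions: u64,
}

fn parent(path: &str) -> Option<&str> {
    path.rsplit_once('/').map(|(p, _)| p)
}

impl ToyFs {
    /// Applies the whole batch as one session, or commits nothing.
    /// Returns the new session number, or None when nothing changed.
    pub fn commit_changes(&mut self, changes: &[Change]) -> Result<Option<u64>, BatchError> {
        let mut staged = self.nodes.clone(); // work on a copy; commit at the end
        let mut touched = BTreeSet::new();
        for change in changes {
            let path = match change {
                Change::Mkdir(p) | Change::PutFile(p, _) | Change::Remove(p) => p,
            };
            // Parents may pre-exist or have been created earlier in this batch.
            if let Some(par) = parent(path) {
                if staged.get(par) != Some(&Node::Dir) {
                    return Err(BatchError::MissingParent(path.clone()));
                }
            }
            let new = match change {
                Change::Mkdir(_) => Some(Node::Dir),
                Change::PutFile(_, data) => Some(Node::File(data.clone())),
                Change::Remove(_) => None,
            };
            if staged.get(path) == new.as_ref() {
                continue; // no-op: existing dir, identical content, or already absent
            }
            // At most one record per node per session.
            if !touched.insert(path.clone()) {
                return Err(BatchError::DuplicateRecord(path.clone()));
            }
            match new {
                Some(node) => {
                    staged.insert(path.clone(), node);
                }
                None => {
                    // Removing a directory also drops its descendants in this toy.
                    let prefix = format!("{}/", path);
                    staged.retain(|k, _| k != path && !k.starts_with(prefix.as_str()));
                }
            }
        }
        if touched.is_empty() {
            return Ok(None);
        }
        self.nodes = staged;
        self.sessions += 1;
        Ok(Some(self.sessions))
    }
}
```

Staging into a copy and swapping it in only at the end mirrors the all-or-nothing property described above: an error or an all-no-op batch leaves the existing state and session count untouched.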
Bring the PCF ports into parity with the Rust reference, which added an
additive, read-only per-block walker in this branch: Container.read_block_at
returning a BlockView { offset, header, entries }. It exposes one table block
at a time (including its table_hash and next_table_offset) so code layered on
PCF can group blocks and follow arbitrary chains, unlike entries() which
flattens the whole chain.
- C#: new BlockView.cs + Container.ReadBlockAt(ulong).
- PHP: new BlockView.php + Container::readBlockAt(int).
- TS: new BlockView interface + Container.readBlockAt(number), exported from
index.ts.
Each gets a test mirroring the reference's: force an overflow chain (first
block capacity 2, three partitions), walk it block-by-block via read_block_at,
and assert each block's offset, entry count, and that the stored table_hash
matches a recomputation. PHP (86) and TS (82) suites pass; the C# change
mirrors the existing Entries()/ReadBlock() patterns (no .NET toolchain in this
environment to run its suite).
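The block-by-block walk exercised by these tests can be sketched against an in-memory model. The sketch assumes a flattened `BlockView` (the real one is described as { offset, header, entries }), a `next_table_offset` of 0 as the end-of-chain marker, a fixed 0x100 spacing between blocks, and std's `DefaultHasher` in place of the real table hash. `ToyContainer`, `toy_table_hash`, and `walk_chain` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Simplified view of one table block (header fields flattened in).
#[derive(Debug, Clone)]
pub struct BlockView {
    pub offset: u64,
    pub table_hash: u64,
    /// Assumption: 0 marks the end of the chain.
    pub next_table_offset: u64,
    pub entries: Vec<String>,
}

/// Stand-in for the real table hash.
pub fn toy_table_hash(entries: &[String]) -> u64 {
    let mut h = DefaultHasher::new();
    entries.hash(&mut h);
    h.finish()
}

pub struct ToyContainer {
    blocks: HashMap<u64, BlockView>,
}

impl ToyContainer {
    /// Splits `names` into blocks of `capacity` entries (capacity must be
    /// non-zero), chaining each block to the next one 0x100 bytes later.
    pub fn build(first_offset: u64, capacity: usize, names: &[&str]) -> Self {
        let chunks: Vec<&[&str]> = names.chunks(capacity).collect();
        let mut blocks = HashMap::new();
        for (i, chunk) in chunks.iter().enumerate() {
            let offset = first_offset + (i as u64) * 0x100;
            let next_table_offset = if i + 1 < chunks.len() { offset + 0x100 } else { 0 };
            let entries: Vec<String> = chunk.iter().map(|s| s.to_string()).collect();
            let table_hash = toy_table_hash(&entries);
            blocks.insert(offset, BlockView { offset, table_hash, next_table_offset, entries });
        }
        ToyContainer { blocks }
    }

    /// Returns exactly one block, without flattening the chain.
    pub fn read_block_at(&self, offset: u64) -> Option<&BlockView> {
        self.blocks.get(&offset)
    }
}

/// Follows next_table_offset from `start`, guarding against cycles.
pub fn walk_chain(c: &ToyContainer, start: u64) -> Result<Vec<&BlockView>, String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut offset = start;
    while offset != 0 {
        if !seen.insert(offset) {
            return Err(format!("cycle at offset {}", offset));
        }
        let block = c
            .read_block_at(offset)
            .ok_or_else(|| format!("no block at offset {}", offset))?;
        out.push(block);
        offset = block.next_table_offset;
    }
    Ok(out)
}
```

With a first-block capacity of 2 and three entries, the walk yields two blocks holding 2 and 1 entries, matching the overflow-chain scenario the port tests describe.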
PR #7 added pcf-debug with a PFS-MS decoder plugin, but its DIRECT/DELTA content-section parsing predates the compression extension. Bring it in sync with the revised spec (Section 7.3 / 9.5):
- Decode the new compression_algo_id byte (0=none, 1=DEFLATE, 2=zstd, 3=brotli) in both DIRECT (now 91 bytes) and DELTA (now 165 bytes) content sections, and shift all subsequent field offsets accordingly.
- Update the test fixture (pfs_node_direct) to the new 91-byte DIRECT layout and assert the decoded compression_algo_id field and its byte range.

Also add reference/PFS-MS-v1.0 to the workspace members introduced by PR #7, so the crate builds within the workspace (it otherwise errors as a non-member).

Verified end to end: a real DEFLATE-compressed archive built with `pfs put` now shows `compression_algo_id = 1 (DEFLATE)` under `pcf-debug decode`. Whole workspace passes fmt, clippy -D warnings, and all tests (pcf, pfs-ms, pcf-debug).
- reader.rs: use sort_by_key(Reverse(..)) for the descending history sort; clippy 1.96 on CI flags the sort_by/cmp form under unnecessary_sort_by (local clippy 1.94 did not).
- ci-pfs.yml: the reference vector grew to 2986 bytes with the compression field (the pinned test was updated but this workflow size check was not).
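For readers unfamiliar with the lint, the following sketch shows the `sort_by_key(Reverse(..))` form for a descending sort. The `HistoryEntry` struct and its `session_seq` key are hypothetical; the actual key used in reader.rs is not shown in this PR text.

```rust
use std::cmp::Reverse;

/// Hypothetical history item; only the sort key matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HistoryEntry {
    pub session_seq: u64,
    pub label: &'static str,
}

/// Sorts newest first. Equivalent to
/// `history.sort_by(|a, b| b.session_seq.cmp(&a.session_seq))`,
/// which is the shape the unnecessary_sort_by lint reports.
pub fn sort_newest_first(history: &mut [HistoryEntry]) {
    history.sort_by_key(|e| Reverse(e.session_seq));
}
```

Both forms are stable sorts, so entries with equal keys keep their relative order either way.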
Summary
This PR introduces the complete PFS-MS v1.0 (PCF File System, Multi-Session Profile) reference implementation in Rust, along with updates to the specification and supporting infrastructure across multiple language implementations.
Key Changes
Core PFS-MS Implementation (reference/PFS-MS-v1.0/)
- src/writer.rs: Append-only multi-session filesystem writer supporting declarative changes (mkdir, put_file, remove, mv), with session commit and backward-linked table blocks
- src/reader.rs, src/fs.rs: High-level filesystem reader with session chain verification, node view reconstruction, and history queries
- src/node.rs: Serialization/deserialization of file/directory metadata with support for four content kinds (Empty, Direct, Delta, Inherit)
- src/session.rs: Session metadata with inter-session hash chaining and member block digests
- src/tree.rs: Filesystem tree reconstruction with liveness checking, cycle detection, and path resolution
- src/delta.rs: VCDIFF (RFC 3284) deltas via the oxidelta crate
- src/compress.rs: DEFLATE compression for file content with a per-file compression_algo_id field
- src/dirsync.rs: Bidirectional archive ↔ filesystem tooling (create, update, extract)
- src/bin/pfs.rs: Complete command-line interface for filesystem operations and archive management
- src/vector.rs: Canonical deterministic test vector for the Section 17 scenario
Specification Updates
- specs/PFS-MS-spec-v1.0.txt: Added Section 9.5 documenting optional per-file content compression, with a compression_algo_id field in DIRECT and DELTA content sections (each one byte longer than in earlier drafts)
Test Coverage
- tests/spec_compliance.rs: Tests for all normative requirements (R1-R8, W1-W7) and Appendix A field layout constants
- tests/roundtrip.rs: End-to-end scenarios including multi-session history queries
- tests/dirsync.rs: Archive creation, update, and extraction workflows
- tests/coverage.rs: Error paths, edge cases, and record parsing validation
PCF Reference Updates
- BlockView struct in PCF to support mid-commit state snapshots and block-level introspection
- read_block_at functionality across Rust, TypeScript, PHP, and .NET implementations
Build & CI
- .github/workflows/ci-pfs.yml: Comprehensive testing with rustfmt, clippy, and a test matrix across platforms
Notable Implementation Details
https://claude.ai/code/session_01LuisvYRFWg6cWyf8LvBfxs