
Add PFS-MS v1.0 reference implementation with compression support #8

Merged
kduma merged 8 commits into master from claude/cool-dijkstra-1cU6x on Jun 2, 2026

@kduma kduma commented Jun 2, 2026

Summary

This PR introduces the complete PFS-MS v1.0 (PCF File System, Multi-Session Profile) reference implementation in Rust, along with updates to the specification and supporting infrastructure across multiple language implementations.

Key Changes

Core PFS-MS Implementation (reference/PFS-MS-v1.0/)

  • Writer (src/writer.rs): Append-only multi-session filesystem writer supporting declarative changes (mkdir, put_file, remove, mv), with session commitment and backward-linked table blocks
  • Reader (src/reader.rs, src/fs.rs): High-level filesystem reader with session chain verification, node view reconstruction, and history queries
  • Node Records (src/node.rs): Serialization/deserialization of file/directory metadata with support for four content kinds (Empty, Direct, Delta, Inherit)
  • Session Records (src/session.rs): Session metadata with inter-session hash chaining and member block digests
  • Tree Semantics (src/tree.rs): Filesystem tree reconstruction with liveness checking, cycle detection, and path resolution
  • Content Handling:
    • Delta encoding via VCDIFF (src/delta.rs)
    • Compression support (src/compress.rs): DEFLATE compression for file content with per-file compression_algo_id field
  • Directory Sync (src/dirsync.rs): Bidirectional archive ↔ filesystem tooling (create, update, extract)
  • CLI Tool (src/bin/pfs.rs): Complete command-line interface for filesystem operations and archive management
  • Reference Vector (src/vector.rs): Canonical deterministic test vector for Section 17 scenario
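As a rough illustration of the cycle detection mentioned for src/tree.rs, the following sketch walks parent links and reports a cycle when a node is visited twice. It assumes each node records a single parent; `NodeId`, `has_cycle`, and the `parents` map are hypothetical names for this example and are not taken from the crate, which uses UUIDv7 identities and its own record types.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical node identifier; the real crate uses UUIDv7 identities.
type NodeId = u32;

/// Returns true if following parent links from `start` revisits a node.
/// `parents` maps each node to its parent; the root has no entry.
fn has_cycle(parents: &HashMap<NodeId, NodeId>, start: NodeId) -> bool {
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        if !seen.insert(current) {
            return true; // visited twice: the chain loops back on itself
        }
        match parents.get(&current) {
            Some(&parent) => current = parent,
            None => return false, // reached a node with no parent (the root)
        }
    }
}
```

A reader could run a check like this for each node before resolving paths, so that a malformed parent chain is rejected instead of causing an endless walk.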

Specification Updates

  • PFS-MS Spec (specs/PFS-MS-spec-v1.0.txt): Added Section 9.5 documenting optional per-file content compression via a compression_algo_id field in the DIRECT and DELTA content sections, which makes each section one byte longer than in earlier drafts (DIRECT 90 to 91 bytes, DELTA 164 to 165 bytes)

Test Coverage

  • Spec Compliance (tests/spec_compliance.rs): Tests for all normative requirements (R1-R8, W1-W7) and Appendix A field layout constants
  • Roundtrip Tests (tests/roundtrip.rs): End-to-end scenarios including multi-session history queries
  • Directory Sync Tests (tests/dirsync.rs): Archive creation, updates, and extraction workflows
  • Coverage Tests (tests/coverage.rs): Error paths, edge cases, and record parsing validation

PCF Reference Updates

  • BlockView Type: Exposed BlockView struct in PCF to support mid-commit state snapshots and block-level introspection
  • compute_table_hash Export: Made hash computation function public for PFS-MS writer use
  • Test Updates: Added roundtrip tests for read_block_at functionality across Rust, TypeScript, PHP, and .NET implementations

Build & CI

  • Workspace Integration: Added PFS-MS crate to workspace members
  • CI Pipeline (.github/workflows/ci-pfs.yml): Comprehensive testing with rustfmt, clippy, and test matrix across platforms
  • Dependencies: Added flate2 (DEFLATE), oxidelta (VCDIFF), uuid (UUIDv7), and filetime for metadata handling

Notable Implementation Details

  • Compression is optional: Files are compressed only if the compressed form is smaller; compression_algo_id = 0 means uncompressed
  • Compatibility break: The compression field addition is intentionally incompatible with earlier drafts; the profile version remains 1.0 because the format has not been published
  • Pure serialization: Writer uses PCF's serialization primitives directly rather than the in-place Container writer, enabling backward-linked blocks and atomic header rewrites
  • Auditability over performance: Implementation prioritizes clarity and spec compliance for reference purposes
  • Cross-language support: Updates to .NET, PHP, and TypeScript implementations to expose BlockView and support PFS-MS reading
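The "compress only if smaller" rule can be sketched as follows. This is a minimal illustration, not the crate's writer: `choose_stored` is a hypothetical helper, and `toy_rle` is a toy run-length encoder that stands in for a real DEFLATE encoder (such as the one flate2 provides) so the example stays dependency-free.

```rust
/// Compression ids from the PR text: 0 = none, 1 = DEFLATE.
const ALGO_NONE: u8 = 0;
const ALGO_DEFLATE: u8 = 1;

/// Hypothetical helper: returns the algorithm id and the bytes to store.
/// `compress` stands in for a real DEFLATE encoder.
fn choose_stored<F>(content: &[u8], compress: F) -> (u8, Vec<u8>)
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let packed = compress(content);
    if packed.len() < content.len() {
        (ALGO_DEFLATE, packed) // compressed form is strictly smaller
    } else {
        (ALGO_NONE, content.to_vec()) // keep the original bytes
    }
}

/// Toy run-length encoder used only to exercise the sketch; NOT DEFLATE.
/// Emits (run length, byte) pairs with runs capped at 255.
fn toy_rle(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1usize;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}
```

With this stand-in, 100 identical bytes are stored in compressed form, while a short input such as `b"abc"` grows under encoding and is therefore stored as-is with id 0.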

https://claude.ai/code/session_01LuisvYRFWg6cWyf8LvBfxs

claude added 8 commits June 2, 2026 11:12
Implement the PCF File System Multi-Session profile (specs/PFS-MS-spec-v1.0.txt)
as a new crate at reference/PFS-MS-v1.0, built entirely on the pcf reference
crate's public primitives.

PFS-MS stores an append-only, multi-session tree of files and directories inside
a single conforming PCF v1.0 file: file content in RAW partitions, node metadata
in PFS_NODE partitions, and one PFS_SESSION per session, with backward-linked
Table Blocks and a single in-place header-pointer rewrite at commit.

The crate provides:
- Node/Session record codecs with exact spec byte layouts (Sections 7, 8)
- An append-only session writer following the S1..S7 commit protocol
- A backward-chain reader with inter-session hash-chain verification, liveness,
  cycle/collision handling, and DIRECT/DELTA/EMPTY/INHERIT reconstruction
- VCDIFF (RFC 3284) deltas via the pure-Rust oxidelta crate; UUIDv7 identities
- A demo CLI (mkfs/mkdir/put/mv/rm/ls/cat/get/log/verify)
- A byte-exact Section 17 reference vector plus roundtrip/coverage/spec tests
- A dedicated CI workflow mirroring the PCF crate's gates

PCF change (additive, backward-compatible): expose a read-only per-block walker
(Container::read_block_at / BlockView) so the PFS reader can reuse PCF block
iteration and access each block's table_hash. No on-disk layout or writer
behavior changes; PCF retains 100% function / 95%+ line coverage.

Extend the (pre-publication) PFS-MS v1.0 format and reference implementation so
file content can be compressed, breaking compatibility with earlier drafts.
Compression is a PFS-level concern about file content only, so PCF is untouched.

Format changes (specs/PFS-MS-spec-v1.0.txt):
- New Compression Algorithm Registry (Section 9.5): 0 = none, 1 = DEFLATE
  (RFC 1951, required), 2 = zstd / 3 = brotli reserved. An unknown id makes a
  file unreadable but not the container malformed (same rule as patch_algo_id).
- DIRECT and DELTA content sections gain a compression_algo_id byte (DIRECT
  90->91, DELTA 164->165); full_size/full_hash now describe the decompressed
  content while the PCF data_hash protects the stored (compressed) bytes.
- Reconstruction (Section 9.3) decompresses RAW bytes before use. Updated
  Appendices A/B, the intro, and the Section 17 narrative; profile stays 1.0.
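
The registry rule above (an unknown id makes one file unreadable without making the container malformed) might look like this in a decoder. This is a sketch under assumptions: `CompressionAlgo`, `ContentError`, and `parse_compression_algo` are illustrative names, and treating the reserved ids 2 and 3 as unsupported is a choice made for this example, not something the PR text specifies.

```rust
/// Compression Algorithm Registry as listed in the PR text (Section 9.5).
#[derive(Debug, PartialEq, Clone, Copy)]
enum CompressionAlgo {
    None,    // id 0
    Deflate, // id 1 (RFC 1951, required)
}

/// Hypothetical per-file error: the file cannot be read, but the
/// container itself is still considered well-formed.
#[derive(Debug, PartialEq)]
enum ContentError {
    UnsupportedCompression(u8),
}

fn parse_compression_algo(id: u8) -> Result<CompressionAlgo, ContentError> {
    match id {
        0 => Ok(CompressionAlgo::None),
        1 => Ok(CompressionAlgo::Deflate),
        // Assumption: reserved ids (2 = zstd, 3 = brotli) and any other
        // value are reported per file, not as a container-level failure.
        other => Err(ContentError::UnsupportedCompression(other)),
    }
}

/// Content-section sizes after adding the one-byte field (from the PR text).
const DIRECT_SECTION_LEN: usize = 91; // was 90
const DELTA_SECTION_LEN: usize = 165; // was 164
```

Returning a per-file error rather than aborting the whole read lets other files in the same archive remain accessible.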

Implementation (reference/PFS-MS-v1.0):
- New compress.rs (DEFLATE via pure-Rust flate2/miniz_oxide), mirroring delta.rs.
- ContentSection::{Direct,Delta} carry compression_algo; node codec updated.
- Writer DEFLATEs content/patches and stores the compressed form only when
  smaller; FsWriter::set_compression and a `pfs put --store` flag disable it.
- Reader decompresses per compression_algo_id during materialize.
- Regenerated the byte-exact reference vector (now exercises a DEFLATE DIRECT)
  and re-pinned its length/SHA-256; added compression round-trip, store-disabled,
  and unknown-algo tests; updated layout-constant assertions and the README.

Add whole-directory tooling on top of the existing reader/writer, committing
each create/update as a single session (one "burn") rather than one per file.

Library:
- FsWriter::commit_changes(&[Change]) applies a batch of Mkdir/PutFile/Remove
  in one session: resolves parents (including dirs created earlier in the same
  batch), reuses the DIRECT/DELTA + compression selection, skips no-op updates
  and existing dirs, enforces one record per node per session, and commits
  nothing when nothing changed.
- New dirsync module: create_archive / update_archive / extract_archive plus a
  session_at_time helper. Recursively imports a host directory (skipping
  symlinks/special files), captures POSIX mode + mtime, mirrors deletions under
  --delete, and restores mode/mtime on extract (via the filetime crate).
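
The batch semantics described for commit_changes can be approximated with a much simpler model. The sketch below is an assumption-laden illustration: `Entry`, `Change`, and `apply_batch` are simplified stand-ins, paths are plain strings, and parent resolution and DIRECT/DELTA selection are omitted. It shows only the skip-no-ops, one-record-per-node, and commit-nothing-when-unchanged behavior.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-ins for the crate's types.
#[derive(Clone, PartialEq, Debug)]
enum Entry {
    Dir,
    File(Vec<u8>),
}

enum Change {
    Mkdir(String),
    PutFile(String, Vec<u8>),
    Remove(String),
}

/// Applies a batch to `tree` and returns the paths that received a record.
/// An empty result means nothing changed, so no session would be committed.
/// On error the tree is left untouched.
fn apply_batch(
    tree: &mut BTreeMap<String, Entry>,
    changes: &[Change],
) -> Result<Vec<String>, String> {
    let mut staged = tree.clone(); // work on a copy so failures leave no trace
    let mut touched = BTreeSet::new();
    for change in changes {
        let (path, new_entry) = match change {
            Change::Mkdir(p) => (p, Some(Entry::Dir)),
            Change::PutFile(p, data) => (p, Some(Entry::File(data.clone()))),
            Change::Remove(p) => (p, None),
        };
        // Skip no-ops: existing dirs, identical content, absent removals.
        if staged.get(path) == new_entry.as_ref() {
            continue;
        }
        // Enforce one record per node per session.
        if !touched.insert(path.clone()) {
            return Err(format!("second record for {} in one session", path));
        }
        match new_entry {
            Some(entry) => staged.insert(path.clone(), entry),
            None => staged.remove(path),
        };
    }
    *tree = staged;
    Ok(touched.into_iter().collect())
}
```

Applying the same batch twice yields an empty list the second time, which corresponds to the no-op update that commits no session.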

CLI (pfs):
- create  <archive> <dir> [--store] [--no-metadata]
- update  <archive> <dir> [--delete] [--store] [--no-metadata]
- extract <archive> <dir> [--at <seq>] [--at-time <unix_ms>] [--no-metadata]
  (--no-metadata disables metadata capture on import and restore on extract
  independently; point-in-time selects historical state via tree_as_of).
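
One plausible way to map an --at-time value to a session is to pick the newest session committed at or before that instant. The sketch below assumes exactly that, and also assumes sessions are listed oldest to newest with non-decreasing timestamps; `SessionInfo` and `select_session_at` are hypothetical names and do not reflect the actual signature of the crate's session_at_time helper.

```rust
/// Hypothetical session summary: sequence number and commit time (Unix ms).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SessionInfo {
    seq: u64,
    committed_unix_ms: u64,
}

/// Returns the sequence number of the newest session committed at or before
/// `at_unix_ms`, or None if no session existed yet at that time.
/// Assumes `sessions` is ordered oldest to newest.
fn select_session_at(sessions: &[SessionInfo], at_unix_ms: u64) -> Option<u64> {
    sessions
        .iter()
        .take_while(|s| s.committed_unix_ms <= at_unix_ms)
        .last()
        .map(|s| s.seq)
}
```

The resulting sequence number could then be used the same way as an explicit --at value.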

Tests/docs: new tests/dirsync.rs covering create->extract round-trip (incl. an
empty dir and nested files), update add/modify, --delete mirror, point-in-time
extract, unix mode preservation, and no-op-update-commits-no-session; README
documents the new commands. Library coverage stays above the 90/90 floor; PCF
is untouched.

Bring the PCF ports into parity with the Rust reference, which added an
additive, read-only per-block walker in this branch: Container.read_block_at
returning a BlockView { offset, header, entries }. It exposes one table block
at a time (including its table_hash and next_table_offset) so code layered on
PCF can group blocks and follow arbitrary chains, unlike entries() which
flattens the whole chain.
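
To make the difference from a flattening entries() call concrete, here is a toy model of walking a table-block chain one block at a time. It is only a sketch: the `Container` and `BlockView` types below are simplified in-memory stand-ins rather than the PCF crate's types, `next_table_offset` is lifted out of the header for brevity, and treating offset 0 as the end of the chain is an assumption of this example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a PCF table block; the real BlockView carries
/// { offset, header, entries }.
struct BlockView {
    offset: u64,
    entries: Vec<String>,
    next_table_offset: u64, // assumption: 0 marks the end of the chain
}

/// Hypothetical container: table blocks keyed by file offset.
struct Container {
    blocks: HashMap<u64, BlockView>,
}

impl Container {
    fn read_block_at(&self, offset: u64) -> Option<&BlockView> {
        self.blocks.get(&offset)
    }
}

/// Walks the chain block by block, returning (offset, entry count) pairs.
/// The loop is bounded by the number of blocks so a cyclic chain terminates.
fn walk_chain(c: &Container, first: u64) -> Vec<(u64, usize)> {
    let mut out = Vec::new();
    let mut next = first;
    for _ in 0..c.blocks.len() {
        if next == 0 {
            break;
        }
        match c.read_block_at(next) {
            Some(block) => {
                out.push((block.offset, block.entries.len()));
                next = block.next_table_offset;
            }
            None => break, // dangling link: stop walking
        }
    }
    out
}
```

Because each step exposes one block, a caller can group entries per block or per session, which a flattened entry list cannot express.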

- C#:  new BlockView.cs + Container.ReadBlockAt(ulong).
- PHP: new BlockView.php + Container::readBlockAt(int).
- TS:  new BlockView interface + Container.readBlockAt(number), exported from
       index.ts.

Each gets a test mirroring the reference's: force an overflow chain (first
block capacity 2, three partitions), walk it block-by-block via read_block_at,
and assert each block's offset, entry count, and that the stored table_hash
matches a recomputation. PHP (86) and TS (82) suites pass; the C# change
mirrors the existing Entries()/ReadBlock() patterns (no .NET toolchain in this
environment to run its suite).

PR #7 added pcf-debug with a PFS-MS decoder plugin, but its DIRECT/DELTA
content-section parsing predates the compression extension. Bring it in sync
with the revised spec (Section 7.3 / 9.5):

- Decode the new compression_algo_id byte (0=none, 1=DEFLATE, 2=zstd,
  3=brotli) in both DIRECT (now 91 bytes) and DELTA (now 165 bytes) content
  sections, and shift all subsequent field offsets accordingly.
- Update the test fixture (pfs_node_direct) to the new 91-byte DIRECT layout
  and assert the decoded compression_algo_id field and its byte range.

Also add reference/PFS-MS-v1.0 to the workspace members introduced by PR #7,
so the crate builds within the workspace (it otherwise errors as a non-member).

Verified end to end: a real DEFLATE-compressed archive built with `pfs put`
now shows `compression_algo_id = 1 (DEFLATE)` under `pcf-debug decode`. Whole
workspace passes fmt, clippy -D warnings, and all tests (pcf, pfs-ms, pcf-debug).

- reader.rs: use sort_by_key(Reverse(..)) for the descending history sort;
  clippy 1.96 on CI flags the sort_by/cmp form under unnecessary_sort_by
  (local clippy 1.94 did not).
- ci-pfs.yml: the reference vector grew to 2986 bytes with the compression
  field (the pinned test was updated but this workflow size check was not).
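
For reference, the sort_by_key(Reverse(..)) form mentioned for reader.rs looks like this in isolation. The function name and the use of plain u64 sequence numbers are illustrative; the actual history records in reader.rs are not shown in this PR text.

```rust
use std::cmp::Reverse;

/// Sorts session sequence numbers newest-first.
fn sort_descending(seqs: &mut [u64]) {
    // Same ordering as `seqs.sort_by(|a, b| b.cmp(a))`, written in the
    // key-based form that clippy's `unnecessary_sort_by` lint suggests.
    seqs.sort_by_key(|&s| Reverse(s));
}
```

Wrapping the key in `Reverse` inverts its ordering, so the stable sort yields descending order without a hand-written comparator.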
@kduma kduma merged commit e7df035 into master Jun 2, 2026
29 checks passed