Add PerfectHash/PerfectHashWithOverflow support in UTOC writer #74

Open: rm-NoobInCoding wants to merge 4 commits into `cleanup-and-fixes` from `perfect-hash-54`

rm-NoobInCoding (Owner) commented Apr 18, 2026

Fixes #54.

Stacked on top of #73 — reviewing with base `cleanup-and-fixes` keeps the diff scoped to the perfect-hash work. Once #73 merges, GitHub will auto-retarget this PR to `master`.

Problem

UnrealReZen was emitting UTOC v3 with `TocChunkPerfectHashSeedsCount = 0`. FModel/ZenTools (and every other reader that took the v4+ path from Unreal) expect to resolve chunk IDs via the two-level perfect hash, so they reject the container as unreadable.

Fix

  • Bump `PackUtocVersion` 3 → 5 (`PerfectHashWithOverflow`).
  • `FIoChunkID.HashWithSeed`: FNV-1a over the 12-byte on-disk layout, byte-for-byte identical to CUE4Parse's `FIoChunkId.HashWithSeed`. Used both for seed-bucket selection and final slot placement.
  • CHD-style perfect-hash construction in `ConstructUtocFile`:
    • Bucket all chunks by `HashWithSeed(0) % N`, largest first
    • Singleton bucket → direct-slot seed `-(slot+1)`
    • Multi-chunk bucket → seed search (1..1M) for collision-free slot mapping
    • Unplaceable bucket → overflow sentinel `-(N+1)`; its chunks are listed in `ChunkIndicesWithoutPerfectHash` (the reader falls back to imperfect lookup on any sentinel where `-seed-1 >= N`)
  • Chunk-indexed arrays (ChunkIds, OffLen, Metadata) + directory-index `UserData` are written in slot order. CompressionBlocks stay in UCAS/pack order since the reader derives `firstBlockIndex = Offset / CompressionBlockSize`.
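
The construction and the matching reader-side lookup can be sketched as follows. This is a minimal Python illustration of the steps listed above, not the C# code in this PR. Several things are assumptions made for the sketch: the names `hash_with_seed`, `try_seed`, `build`, and `lookup` are hypothetical; the seeded hash is a stand-in (FNV-1a with a nonzero seed replacing the offset basis) and a real writer must match the reader's hash byte for byte; the seed table and the chunk table share the same size `N`; and overflow chunks are parked in whatever slots remain free.

```python
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1

def hash_with_seed(data: bytes, seed: int) -> int:
    """Stand-in seeded 64-bit hash (assumption, see lead-in)."""
    h = seed if seed else FNV_OFFSET
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h

def try_seed(ids, members, seed, slots):
    """Return {slot: chunk} if `seed` sends every member to a distinct free slot."""
    n, placed = len(slots), {}
    for i in members:
        s = hash_with_seed(ids[i], seed) % n
        if slots[s] is not None or s in placed:
            return None
        placed[s] = i
    return placed

def build(ids, max_seed=1_000_000):
    """Return (seeds, slots, overflow): slots[s] is the chunk stored in slot s."""
    n = len(ids)
    seeds, slots, pending = [0] * n, [None] * n, []
    buckets = [[] for _ in range(n)]
    for i, cid in enumerate(ids):
        buckets[hash_with_seed(cid, 0) % n].append(i)
    order = sorted(range(n), key=lambda b: -len(buckets[b]))  # largest first
    for b in order:
        members = buckets[b]
        if len(members) < 2:
            continue
        for seed in range(1, max_seed + 1):
            placed = try_seed(ids, members, seed, slots)
            if placed is not None:
                for s, i in placed.items():
                    slots[s] = i
                seeds[b] = seed
                break
        else:  # no seed worked: mark the bucket with the overflow sentinel
            seeds[b] = -(n + 1)
            pending.extend(members)
    free = iter([s for s in range(n) if slots[s] is None])
    for b in order:  # singletons take any free slot, encoded as -(slot+1)
        if len(buckets[b]) == 1:
            s = next(free)
            slots[s] = buckets[b][0]
            seeds[b] = -(s + 1)
    overflow = []
    for i in pending:  # overflow chunks fill the remaining slots
        s = next(free)
        slots[s] = i
        overflow.append(s)
    return seeds, slots, overflow

def lookup(chunk_id, seeds, ids_by_slot, overflow):
    """Reader-side resolution; returns the slot or None."""
    n = len(ids_by_slot)
    seed = seeds[hash_with_seed(chunk_id, 0) % n]
    if seed == 0:
        return None
    if seed < 0:
        slot = -seed - 1
        if slot >= n:  # sentinel: fall back to the imperfect list
            return next((s for s in overflow if ids_by_slot[s] == chunk_id), None)
    else:
        slot = hash_with_seed(chunk_id, seed) % n
    return slot if ids_by_slot[slot] == chunk_id else None
```

Calling `build(ids, max_seed=0)` skips the seed search entirely, which is a cheap way to exercise the overflow path in isolation.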

Test plan

  • `dotnet build UnrealReZen/UnrealReZen.csproj --configuration Release` — 0 errors, no new warnings in UnrealReZen code
  • Pack a test content dir and verify FModel loads the resulting archive
  • Verify the game's loader still mounts the archive at runtime

FModel, ZenTools, and similar readers refuse archives whose UTOC
reports TocChunkPerfectHashSeedsCount == 0 on version >= 4, because
they expect to resolve chunks via the two-level perfect hash rather
than fall back to a linear ChunkId scan. UnrealReZen was still
emitting v3 headers with those fields zeroed, so the packed container
was unreadable in those tools.

Changes:
- Bump PackUtocVersion from 3 to 5 (PerfectHashWithOverflow).
- FIoChunkID.HashWithSeed: FNV-1a over the 12-byte on-disk layout
  (ID u64 LE, Index u16 LE, Padding, Type), byte-for-byte identical
  to CUE4Parse's FIoChunkId.HashWithSeed. This is what the reader
  uses for both seed-bucket selection and final slot placement.
- ConstructUtocFile: build a CHD-style perfect hash over all chunk
  IDs. Bucket by HashWithSeed(0) % N, process buckets largest-first:
  singletons get a direct-slot seed -(slot+1); larger buckets search
  seeds 1..1,000,000 for a collision-free slot mapping; unplaceable
  buckets use the overflow sentinel -(N+1) (reader treats any
  -seed-1 >= N as "fall back to imperfect lookup") and their chunks
  go on the ChunkIndicesWithoutPerfectHash list.
- Chunk-indexed arrays (ChunkIds, OffLen, Metadata) and the
  directory index's UserData are written in slot order so
  ChunkIds[slot] matches what HashWithSeed(seed) % N resolves to.
- CompressionBlocks stay in UCAS/pack order because the reader
  derives firstBlockIndex from Offset / CompressionBlockSize, so
  their index must align with their byte offset in the .ucas.
- Header fields TocChunkPerfectHashSeedsCount and
  TocChunksWithoutPerfectHashCount are populated accordingly.
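
As a rough illustration of the layout and ordering rules above, the sketch below packs the 12-byte chunk ID in the field order given in the commit message, reorders chunk-indexed arrays into slot order, and derives a first block index from a byte offset. It is written in Python for brevity; the helper names `pack_chunk_id`, `reorder_by_slot`, and `first_block_index` are hypothetical, and the padding byte is assumed to be zero.

```python
import struct

def pack_chunk_id(chunk_id: int, chunk_index: int, chunk_type: int) -> bytes:
    """12-byte layout: ID u64 LE, Index u16 LE, Padding (assumed 0), Type."""
    return struct.pack("<QHBB", chunk_id, chunk_index, 0, chunk_type)

def reorder_by_slot(slot_to_chunk, *arrays):
    """Rewrite chunk-indexed arrays so that position `slot` holds the
    entry of the chunk the perfect hash assigned to that slot."""
    return [[arr[i] for i in slot_to_chunk] for arr in arrays]

def first_block_index(offset: int, compression_block_size: int) -> int:
    """Compression blocks are not reordered: their index follows from the
    chunk's byte offset in the .ucas."""
    return offset // compression_block_size
```

For example, `reorder_by_slot([2, 0, 1], chunk_ids, off_lens, metas)` returns the three arrays with the chunk originally at index 2 moved to slot 0, while the compression block table is left untouched.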

rm-NoobInCoding and others added 3 commits April 19, 2026 23:02

When a content file exists in multiple source UTOCs (base + patch,
etc.), BuildUtocEntryLookup collects every match and BuildManifest
was adding all of them to the output, producing multiple packed
chunks with identical (ChunkId, ChunkType). The UTOC reader treats
those as equal (FIoChunkId.Equals ignores ChunkIndex), so the
imperfect-hash fallback dictionary overwrites earlier duplicates
and some slots resolve to the wrong OffLen.

With v3 UTOCs this was merely wasteful — Array.IndexOf always hit
the first duplicate and the directory UserData pointed there too,
so resolution was stable. v5 with the overflow fallback dict
exposes it as a correctness bug (FModel-produced offsets differ
from what the slot actually stores).

Dedupe by (ChunkId, ChunkType) at manifest-build time; seed the
set with the container-header chunk so a source entry can't
collide with it.
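
A minimal sketch of that dedupe step, in Python rather than the project's C#. The entry shape (dicts with `chunk_id` and `chunk_type` keys) and the function name `dedupe_entries` are hypothetical, and keeping the first occurrence is an assumption: the commit message does not say which duplicate wins.

```python
def dedupe_entries(entries, container_header_key):
    """Keep one entry per (chunk_id, chunk_type) pair.

    `container_header_key` seeds the seen-set so that no source entry
    can collide with the container-header chunk.
    """
    seen = {container_header_key}
    unique = []
    for entry in entries:
        key = (entry["chunk_id"], entry["chunk_type"])
        if key in seen:
            continue  # duplicate from another source UTOC (e.g. base + patch)
        seen.add(key)
        unique.append(entry)
    return unique
```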

The previous WriteDependenciesAsUE5 layout didn't match the format
FIoContainerHeader reads for EIoContainerHeaderVersion.OptionalSegmentPackages,
so FModel/CUE4Parse crashed during mount while parsing the dep chunk:

  - Wrote PackageCount under a backwards condition (reader only reads
    it for version < OptionalSegmentPackages)
  - PackageIds had no length prefix, but ReadArray<FPackageId>() reads one
  - StoreEntry CArrayView offsets were 8 bytes with nonsensical base
    values; reader expects two 4-byte fields (arrayNum, offsetFromThis)
  - Missing OptionalSegmentPackageIds/StoreEntries, ContainerNameMap,
    LocalizedPackages, and PackageRedirects sections

Rebuild the store-entries section with a proper 24-byte header per
package plus an appended blob of imported-package IDs and SHA hashes,
and emit empty optional-segment / name-map / localized / redirect
sections so the reader finds what it expects all the way through.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
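
Two of the layout fixes above can be illustrated in isolation: a length-prefixed `PackageIds` array and a CArrayView written as two 4-byte fields. This Python sketch assumes a signed 32-bit count prefix, 64-bit package IDs, and little-endian encoding throughout; the helper names are hypothetical, and the full 24-byte store-entry header is deliberately not reproduced because its field list is not given here.

```python
import struct

def write_package_ids(package_ids):
    """Array with a 32-bit count prefix followed by u64 package IDs."""
    body = b"".join(struct.pack("<Q", pid) for pid in package_ids)
    return struct.pack("<i", len(package_ids)) + body

def write_array_view(array_num, view_pos, data_pos):
    """CArrayView as (arrayNum, offsetFromThis): the offset is measured
    from the position of the view itself, not from the section start."""
    return struct.pack("<II", array_num, data_pos - view_pos)
```

With these, a view located at byte 16 that points at data starting at byte 40 stores an offset of 24.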

Unreal's FIoContainerId::FromName hashes the lowercased container name
(UTF-16LE) with CityHash64. UnrealReZen was defaulting to a random u64,
which meant the TOC's stored ContainerId didn't match the ID the engine
computes from the file name at mount time — games that key their
container registry on the name-hash silently ignored the archive.

Use FPackageId.FromName (the same algorithm, already in CUE4Parse) on
Path.GetFileNameWithoutExtension(OutputPath). Verified against
pakchunk0-Windows.utoc: CityHash64("pakchunk0-windows") matches the
shipped ContainerId exactly. --container-id still overrides.

Drop the now-unused RandomUlong helper.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
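
The name normalization described in this commit can be sketched as below. The Python standard library has no CityHash64, so the hash is passed in as a callable; `container_name_bytes` and `container_id` are hypothetical names, and the `override` parameter stands in for the `--container-id` option.

```python
from pathlib import PurePath

def container_name_bytes(output_path: str) -> bytes:
    """File name without extension, lowercased, encoded as UTF-16LE."""
    stem = PurePath(output_path).stem
    return stem.lower().encode("utf-16-le")

def container_id(output_path, hash64, override=None):
    """Derive the container ID from the output name unless overridden.

    `hash64` must be a CityHash64 implementation for the result to
    match the ID the engine computes at mount time.
    """
    if override is not None:
        return override
    return hash64(container_name_bytes(output_path))
```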