Add PerfectHash/PerfectHashWithOverflow support in UTOC writer #74
Open
rm-NoobInCoding wants to merge 4 commits into
FModel, ZenTools, and similar readers refuse archives whose UTOC reports TocChunkPerfectHashSeedsCount == 0 on version >= 4, because they expect to resolve chunks via the two-level perfect hash rather than fall back to a linear ChunkId scan. UnrealReZen was still emitting v3 headers with those fields zeroed, so the packed container was unreadable in those tools.

Changes:
- Bump PackUtocVersion from 3 to 5 (PerfectHashWithOverflow).
- FIoChunkID.HashWithSeed: FNV-1a over the 12-byte on-disk layout (ID u64 LE, Index u16 LE, Padding, Type), byte-for-byte identical to CUE4Parse's FIoChunkId.HashWithSeed. This is what the reader uses for both seed-bucket selection and final slot placement.
- ConstructUtocFile: build a CHD-style perfect hash over all chunk IDs. Bucket by HashWithSeed(0) % N and process buckets largest-first:
  - singletons get a direct-slot seed -(slot+1);
  - larger buckets search seeds 1..1,000,000 for a collision-free slot mapping;
  - unplaceable buckets use the overflow sentinel -(N+1) (the reader treats any -seed-1 >= N as "fall back to imperfect lookup") and their chunks go on the ChunkIndicesWithoutPerfectHash list.
- Chunk-indexed arrays (ChunkIds, OffLen, Metadata) and the directory index's UserData are written in slot order so ChunkIds[slot] matches what HashWithSeed(seed) % N resolves to.
- CompressionBlocks stay in UCAS/pack order because the reader derives firstBlockIndex from Offset / CompressionBlockSize, so their index must align with their byte offset in the .ucas.
- Header fields TocChunkPerfectHashSeedsCount and TocChunksWithoutPerfectHashCount are populated accordingly.
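The bucket-and-seed scheme described above can be sketched roughly as follows. This is an illustration in Python, not the UnrealReZen code: `chunk_key`, `hash_with_seed`, `build_perfect_hash`, and `lookup` are hypothetical names, the hash is a textbook 64-bit FNV-1a in which a nonzero seed replaces the offset basis (an assumption that has not been checked against CUE4Parse), and parking overflow chunks in the leftover slots is also an assumption.

```python
import struct

FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1

def chunk_key(chunk_id, index, chunk_type):
    # 12-byte layout from the description: ID u64 LE, Index u16 LE, padding, type.
    return struct.pack("<QHBB", chunk_id, index, 0, chunk_type)

def hash_with_seed(key, seed):
    # Stand-in hash (assumption): textbook FNV-1a, nonzero seed replaces the basis.
    h = seed if seed else FNV_OFFSET
    for b in key:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h

def build_perfect_hash(keys, max_seed=1_000_000):
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for i, key in enumerate(keys):
        buckets[hash_with_seed(key, 0) % n].append(i)
    seeds = [0] * n       # 0 marks an empty bucket
    slots = [None] * n    # slot -> index into keys
    overflow, singles = [], []
    # Largest buckets first; singletons are deferred until the rest are placed.
    for b in sorted(range(n), key=lambda b: -len(buckets[b])):
        members = buckets[b]
        if len(members) < 2:
            if members:
                singles.append(b)
            continue
        for seed in range(1, max_seed + 1):
            targets = [hash_with_seed(keys[i], seed) % n for i in members]
            if len(set(targets)) == len(targets) and all(slots[t] is None for t in targets):
                for t, i in zip(targets, members):
                    slots[t] = i
                seeds[b] = seed
                break
        else:
            seeds[b] = -(n + 1)  # overflow sentinel: -seed-1 >= n
            overflow.extend(members)
    free = [s for s in range(n) if slots[s] is None]
    for b in singles:
        slot = free.pop()
        slots[slot] = buckets[b][0]
        seeds[b] = -(slot + 1)   # direct-slot seed
    overflow_slots = []
    for i in overflow:           # assumption: overflow chunks take leftover slots
        slot = free.pop()
        slots[slot] = i
        overflow_slots.append(slot)
    return seeds, slots, overflow_slots

def lookup(key, keys, seeds, slots, overflow_slots):
    n = len(seeds)
    seed = seeds[hash_with_seed(key, 0) % n]
    if seed == 0:
        return None
    if seed < 0:
        slot = -seed - 1
        if slot >= n:  # overflow: fall back to scanning the imperfect list
            return next((s for s in overflow_slots if keys[slots[s]] == key), None)
    else:
        slot = hash_with_seed(key, seed) % n
    return slot if keys[slots[slot]] == key else None
```

In this sketch `slots` plays the role of the slot-ordered chunk arrays: writing `keys[slots[s]]` at position `s` is what makes the seed-derived slot line up with the stored ChunkId.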
When a content file exists in multiple source UTOCs (base + patch, etc.), BuildUtocEntryLookup collects every match and BuildManifest was adding all of them to the output, producing multiple packed chunks with identical (ChunkId, ChunkType). The UTOC reader treats those as equal (FIoChunkId.Equals ignores ChunkIndex), so the imperfect-hash fallback dictionary overwrites earlier duplicates and some slots resolve to the wrong OffLen.

With v3 UTOCs this was merely wasteful: Array.IndexOf always hit the first duplicate and the directory UserData pointed there too, so resolution was stable. v5 with the overflow fallback dict exposes it as a correctness bug (FModel-produced offsets differ from what the slot actually stores).

Dedupe by (ChunkId, ChunkType) at manifest-build time; seed the set with the container-header chunk so a source entry can't collide with it.
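A minimal sketch of that dedupe step, assuming entries are plain records with `chunk_id` and `chunk_type` fields and that the first occurrence wins (the commit message does not say which duplicate is kept). `dedupe_entries` and the field names are hypothetical, not the names used in BuildManifest.

```python
def dedupe_entries(entries, header_key):
    # header_key is the (chunk_id, chunk_type) pair of the container-header chunk;
    # seeding the set with it means no source entry can reuse that identity.
    seen = {header_key}
    unique = []
    for entry in entries:
        key = (entry["chunk_id"], entry["chunk_type"])
        if key in seen:
            continue  # assumption: keep the first match, drop later duplicates
        seen.add(key)
        unique.append(entry)
    return unique
```

For example, passing a base entry and a patch entry that share the same (chunk_id, chunk_type) yields a single output entry, so the overflow fallback dictionary never sees two chunks competing for one key.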
The previous WriteDependenciesAsUE5 layout didn't match the format
FIoContainerHeader reads for EIoContainerHeaderVersion.OptionalSegmentPackages,
so FModel/CUE4Parse crashed during mount while parsing the dep chunk:
- Wrote PackageCount under a backwards condition (reader only reads
it for version < OptionalSegmentPackages)
- PackageIds had no length prefix, but ReadArray<FPackageId>() reads one
- StoreEntry CArrayView offsets were 8 bytes with nonsensical base
values; reader expects two 4-byte fields (arrayNum, offsetFromThis)
- Missing OptionalSegmentPackageIds/StoreEntries, ContainerNameMap,
LocalizedPackages, and PackageRedirects sections
Rebuild the store-entries section with a proper 24-byte header per
package plus an appended blob of imported-package IDs and SHA hashes,
and emit empty optional-segment / name-map / localized / redirect
sections so the reader finds what it expects all the way through.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
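The (arrayNum, offsetFromThis) convention above is easy to get wrong, so here is a rough Python sketch of one way to lay out 24-byte store entries followed by a shared blob. The field order inside the 24 bytes (two 4-byte counts, then two array views) is an assumption made for illustration, and `pack_store_entries` and `read_imports` are hypothetical helpers rather than functions from UnrealReZen or CUE4Parse.

```python
import struct

ENTRY_SIZE = 24  # per the commit message: 24-byte header per package

def pack_store_entries(packages):
    # packages: list of (export_count, export_bundle_count, imported_ids, sha_hashes)
    headers = bytearray()
    blob = bytearray()
    total_headers = ENTRY_SIZE * len(packages)
    for i, (exports, bundles, imports, hashes) in enumerate(packages):
        base = i * ENTRY_SIZE
        imports_field = base + 8    # assumed position of the first array view
        hashes_field = base + 16    # assumed position of the second array view
        imports_pos = total_headers + len(blob)
        for package_id in imports:
            blob += struct.pack("<Q", package_id)
        hashes_pos = total_headers + len(blob)
        for sha in hashes:
            blob += sha             # 20 raw bytes each
        # Each view is two 4-byte fields: element count, then the distance from
        # the view's own position to its data (written even when the count is 0).
        headers += struct.pack(
            "<iiIIII", exports, bundles,
            len(imports), imports_pos - imports_field,
            len(hashes), hashes_pos - hashes_field,
        )
    return bytes(headers + blob)

def read_imports(data, entry_index):
    field = entry_index * ENTRY_SIZE + 8
    count, offset = struct.unpack_from("<II", data, field)
    if count == 0:
        return []
    return list(struct.unpack_from("<%dQ" % count, data, field + offset))
```

The point of the sketch is that the offset is relative to the array-view field itself, not to the start of the section, which is why each entry needs its own subtraction.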
Unreal's FIoContainerId::FromName hashes the lowercased container name
(UTF-16LE) with CityHash64. UnrealReZen was defaulting to a random u64,
which meant the TOC's stored ContainerId didn't match the ID the engine
computes from the file name at mount time — games that key their
container registry on the name-hash silently ignored the archive.
Use FPackageId.FromName (the same algorithm, already in CUE4Parse) on
Path.GetFileNameWithoutExtension(OutputPath). Verified against
pakchunk0-Windows.utoc: CityHash64("pakchunk0-windows") matches the
shipped ContainerId exactly. --container-id still overrides.
Drop the now-unused RandomUlong helper.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
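The container ID derivation described in this commit can be sketched as follows. Python has no CityHash64 in its standard library, so the hash is passed in as a callable; `container_name_bytes` and `resolve_container_id` are hypothetical names, and only the input preparation (file name without extension, lowercased, UTF-16LE) and the override precedence come from the commit message.

```python
from pathlib import PureWindowsPath

def container_name_bytes(output_path):
    # File name without extension, lowercased, encoded as UTF-16LE.
    name = PureWindowsPath(output_path).stem
    return name.lower().encode("utf-16-le")

def resolve_container_id(output_path, hash64, override=None):
    # An explicit --container-id value still wins over the derived ID.
    if override is not None:
        return override
    # hash64 stands in for CityHash64 and must be supplied by the caller.
    return hash64(container_name_bytes(output_path))
```

With a real CityHash64 implementation supplied as `hash64`, the result for `pakchunk0-Windows.utoc` would be the value to compare against the shipped ContainerId mentioned above.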
Fixes #54.
Stacked on top of #73 — reviewing with base `cleanup-and-fixes` keeps the diff scoped to the perfect-hash work. Once #73 merges, GitHub will auto-retarget this PR to `master`.
Problem
UnrealReZen was emitting UTOC v3 with `TocChunkPerfectHashSeedsCount = 0`. FModel/ZenTools (and every other reader that took the v4+ path from Unreal) expect to resolve chunk IDs via the two-level perfect hash, so they reject the container as unreadable.
Fix
Test plan