review what this is about and make recommendations:
Key Callgrind findings:
- Total: 6.038B instruction refs
- add_directory / add_file: 88.86%
- commit: only 1.27%
- dominant hotspot: oxiarc_zstd::lz77::MatchFinder::find_best_match
  - 25.84% self
  - about 50.17% inclusive through find_sequences
- Related zstd hash/match helpers dominate the rest of the annotated cost.
Then I validated with a control run using the same 100 MiB bytes, but hardlinked with .zip names so the existing likely_incompressible_path() skips compression:
- Current no-extension corpus: 3.28s, 78,792 KiB, 110,231,648 bytes
- Same bytes with .zip extension: 0.77s, 78,992 KiB, 110,231,648 bytes
So the problem is clear: our incompressible detector does not run for 25 KiB files because MIN_INCOMPRESSIBLE
The next fix should be a file-frame-specific entropy check for smaller files, probably sampling the whole payload for frames above something like 8 KiB, while leaving page-body behavior more conservative.
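
A minimal sketch of what such a file-frame entropy check could look like, assuming a plain byte-histogram (order-0 Shannon entropy) over the whole payload. The names MIN_FRAME_ENTROPY_CHECK, ENTROPY_BITS_THRESHOLD, shannon_entropy_bits, file_frame_looks_incompressible and pseudo_random_bytes are hypothetical and are not taken from the code base; the 7.9 bits/byte cutoff is an assumed value that would need tuning against real corpora.

```rust
/// Hypothetical lower bound for the file-frame check ("something like 8 KiB").
const MIN_FRAME_ENTROPY_CHECK: usize = 8 * 1024;
/// Assumed cutoff in bits per byte; needs tuning, not a measured value.
const ENTROPY_BITS_THRESHOLD: f64 = 7.9;

/// Order-0 Shannon entropy of the byte distribution, in bits per byte.
fn shannon_entropy_bits(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values over the whole payload.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Returns true when a file frame is large enough to check and its byte
/// distribution is close to uniform, i.e. compression is unlikely to pay off.
fn file_frame_looks_incompressible(payload: &[u8]) -> bool {
    payload.len() >= MIN_FRAME_ENTROPY_CHECK
        && shannon_entropy_bits(payload) >= ENTROPY_BITS_THRESHOLD
}

/// Deterministic pseudo-random bytes (SplitMix64), only for trying the check
/// out without pulling in an external crate.
fn pseudo_random_bytes(len: usize, seed: u64) -> Vec<u8> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            ((z ^ (z >> 31)) >> 56) as u8
        })
        .collect()
}
```

With these assumed thresholds, a 25 KiB buffer from pseudo_random_bytes should be flagged, while 25 KiB of repeated text or any frame under 8 KiB should not. One caveat of the design: order-0 entropy ignores repetition, so a payload made of a random block repeated many times would look incompressible to this check even though LZ77 matching would compress it well. That may be a reason to keep the threshold high or pair the check with a cheap trial on a prefix.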