profile the compression #14

@bsutton

Review what this is about and make recommendations:

Key Callgrind findings:

  • Total: 6.038B instruction refs
  • add_directory / add_file: 88.86%
  • commit: only 1.27%
  • dominant hotspot: oxiarc_zstd::lz77::MatchFinder::find_best_match
    • 25.84% self
    • about 50.17% inclusive through find_sequences
  • Related zstd hash/match helpers dominate the rest of the annotated cost.

Then I validated with a control run using the same 100 MiB of data, hardlinked under .zip names so that the existing likely_incompressible_path() check skips compression:

  • Current no-extension corpus: 3.28s, 78,792 KiB, 110,231,648 bytes
  • Same bytes with .zip extension: 0.77s, 78,992 KiB, 110,231,648 bytes

So the problem is clear: our incompressible detector does not run for 25 KiB files because MIN_INCOMPRESSIBLE

The next fix should be a file-frame-specific entropy check for smaller files, probably sampling the whole payload for frames above something like 8 KiB, while leaving page-body behavior more conservative.
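
As a rough illustration of what such a check could look like, here is a minimal sketch in Rust. It is not the project's code: the function names, the `MIN_ENTROPY_CHECK_LEN` constant (taken from the "something like 8 KiB" above), and the 7.5 bits-per-byte cutoff are all assumptions made for the example and would need tuning against the real corpus.

```rust
/// Hypothetical lower bound: file frames smaller than this are never skipped.
const MIN_ENTROPY_CHECK_LEN: usize = 8 * 1024;

/// Hypothetical cutoff in bits per byte (8.0 is the maximum for byte data).
const INCOMPRESSIBLE_BITS_PER_BYTE: f64 = 7.5;

/// Order-0 Shannon entropy of `data` in bits per byte.
/// Returns 0.0 for empty input.
fn shannon_entropy_bits(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values over the whole payload (no sampling).
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Returns true when a file-frame payload is large enough to judge and its
/// byte distribution is close to uniform, so compression is likely wasted.
fn looks_incompressible(payload: &[u8]) -> bool {
    payload.len() >= MIN_ENTROPY_CHECK_LEN
        && shannon_entropy_bits(payload) >= INCOMPRESSIBLE_BITS_PER_BYTE
}
```

A caller would consult `looks_incompressible(&payload)` before handing a file frame to the compressor and store the frame raw when it returns true. Note that order-0 entropy only looks at byte frequencies, so it can misclassify data that is uniform in byte values but highly repetitive in structure; that is one reason to keep the cutoff conservative and to leave page bodies on the existing path.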
