Question: would a shape-aware preprocessor be a useful contrib/ tool?

## Question: would a shape-aware preprocessor be useful as a `contrib/` tool?

Hi zstd team,

I wanted to ask whether this kind of contribution would be of interest before opening a PR.

I have built a small open-source preprocessor library called **STRATA**:

https://github.com/rjamesy/strata

It is MIT-licensed and exposes a few reversible, shape-aware transforms:

- 2D predictors
- YCoCg-R colour decorrelation
- full octahedral cube rotation
- radial reordering for 3D voxels
- an auto-selector that keeps raw input as a candidate, so the selected output is never worse than the underlying codec alone

The result that prompted the question is that, for some shape-aware data, `STRATA-preprocess + zstd -1` beats `raw input + zstd -22` on both speed and ratio.

Examples:

```text
27 MB RGB photo:
YCoCg-R + zstd -1
= 463x faster, 5.4% smaller than raw zstd -22

Smooth 2D heightmap:
2D predictor + zstd -1
= 4.2x faster, 44.5% smaller than raw zstd -22

64^3 byte volume:
cube rotation + radial reordering + zstd -22
= 14.4% smaller than raw zstd -22, at roughly the same speed
```

The mechanism is not zstd-specific. The transforms expose structure that is hard for a byte-stream compressor to infer from raw input, especially when the redundancy is smoothness rather than exact repeated byte sequences.
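
As a minimal sketch of that mechanism, the snippet below uses Python's standard-library `zlib` as a stand-in codec, precisely because the effect is not tied to zstd. It is not STRATA's code: the synthetic heightmap, the simple previous-byte delta predictor, and the names `make_smooth_row_major`, `delta_encode`, `delta_decode`, and `select_best` are all assumptions made for illustration. The selector keeps the raw input as a candidate, which is the idea behind the "never worse than the codec alone" property (a real container would also need a small tag recording which transform was chosen).

```python
import math
import zlib


def make_smooth_row_major(width: int, height: int) -> bytes:
    """Build a synthetic smooth 8-bit heightmap (hypothetical test data)."""
    out = bytearray()
    for y in range(height):
        for x in range(width):
            # Values stay within 18..238, so they fit in one byte.
            v = 128 + 60 * math.sin(x / 37.0) + 50 * math.cos(y / 53.0)
            out.append(int(v) & 0xFF)
    return bytes(out)


def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    prev = 0
    out = bytearray(len(data))
    for i, b in enumerate(data):
        out[i] = (b - prev) & 0xFF
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    prev = 0
    out = bytearray(len(data))
    for i, d in enumerate(data):
        prev = (prev + d) & 0xFF
        out[i] = prev
    return bytes(out)


def select_best(data: bytes, level: int = 1) -> tuple[str, bytes]:
    """Compress every candidate, raw included, and keep the smallest result."""
    candidates = {"raw": data, "delta": delta_encode(data)}
    compressed = {name: zlib.compress(buf, level) for name, buf in candidates.items()}
    best = min(compressed, key=lambda name: len(compressed[name]))
    return best, compressed[best]


if __name__ == "__main__":
    data = make_smooth_row_major(256, 256)
    name, blob = select_best(data)
    print(f"selected={name} size={len(blob)} raw_size={len(zlib.compress(data, 1))}")
```

On smooth inputs the delta stream is dominated by a handful of small values, which an entropy coder handles well even at a fast level. Because raw is always among the candidates, the selected payload cannot be larger than the raw-compressed one.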

The repo includes a reproduction script:

```bash
python3 bench/preprocess_demo.py
```

It writes a CSV covering the tested codec/dataset combinations.

My questions:

1. Would this kind of shape-aware preprocessor be a plausible fit for `contrib/`?

   I am not proposing changes to zstd core. The transforms are separate reversible passes, roughly 30–300 LOC each, and the tool would emit zstd-compressible bytes.

2. Is the design preference for `contrib/` to remain fully byte-stream-general, or are optional shape-aware tools acceptable there?

   I saw `contrib/seekable_format` and `contrib/pzstd`, so I wanted to ask rather than assume the boundary.

3. Are there specific data classes the team would want benchmarked before considering something like this?

   My current corpus is small: terrain, CT-style volumes, RGB photos, and sensor CSV.
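
To make "separate reversible passes" from question 1 concrete, here is a minimal sketch of the standard YCoCg-R lifting steps for a single pixel. It is not taken from STRATA; the function names `rgb_to_ycocg_r` and `ycocg_r_to_rgb` are illustrative, and a real pass would operate on whole image planes and handle the extra bit that `Co` and `Cg` need.

```python
def rgb_to_ycocg_r(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Forward YCoCg-R lifting transform on integer samples."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    # For 8-bit input, co and cg are signed and need 9 bits.
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int) -> tuple[int, int, int]:
    """Exact inverse: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    pixel = (200, 120, 40)
    encoded = rgb_to_ycocg_r(*pixel)
    assert ycocg_r_to_rgb(*encoded) == pixel
    print(pixel, "->", encoded)
```

Each lifting step only adds or subtracts a value that the decoder can recompute, so the round trip is exact in integer arithmetic with no rounding loss.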

I am also happy to keep this as an external library if that is the better fit. I mainly wanted to check before opening a PR.

Cheers,
Richard

Question: would a shape-aware preprocessor be a useful contrib/ tool? #4659