pre-size chunk buffer to avoid realloc fragmentation; fix nbit/scaleoffset no-op return values by brtnfld · Pull Request #6389 · HDFGroup/hdf5

brtnfld · 2026-05-01T17:24:34Z

Pre-size chunk buffer before the filter pipeline (H5Dchunk.c)

When reading a filtered chunk, the decompression buffer was initialized at the compressed on-disk size. Filters like deflate grow this buffer via repeated realloc() calls, doubling its size when the output exceeds capacity. On Windows, this fragments the heap and causes read times to increase steadily across successive iterations.

The fix pre-sizes the buffer to the full uncompressed chunk_size when a filter pipeline is active, so the output buffer is large enough from the start and no realloc is needed.

Fixes #4481 and Fixes #4513.

Return nbytes instead of *buf_size in nbit and scaleoffset no-op paths (H5Znbit.c, H5Zscaleoffset.c)

Per the HDF5 filter API, nbytes is the count of valid data bytes in the buffer; *buf_size is the allocated buffer capacity. These two values were historically always equal on the read path, so either worked in practice. The pre-sizing change above makes *buf_size > nbytes on compressed reads for the first time, exposing two latent bugs:

H5Znbit.c: the no-compress pass-through path (cd_values[1] == 1) returned *buf_size instead of nbytes. The downstream filter (e.g., Fletcher32) then treated the full pre-sized buffer as valid input and computed a checksum over uninitialized bytes, causing filter pipeline failures.
H5Zscaleoffset.c: the no-process path (when the data already spans the full bit range and no scaling is needed) had the same *buf_size vs nbytes mistake. No current test exercises this path through a multi-filter pipeline where the size mismatch would be observable, but the fix is correct by the same API reasoning.

brtnfld · 2026-05-01T17:38:17Z

I will need to do a PR to netcdf for a fix.

fortnern · 2026-05-01T18:12:59Z

+                 * This avoids repeated realloc() inside the filter pipeline (e.g., deflate's
+                 * doubling strategy when the output buffer fills), which on Windows fragments
+                 * the heap and causes read time to increase steadily across iterations. */
+                if ((old_pline && old_pline->nused) && buf_alloc < chunk_size)


I'm not sure you need the first condition - you can probably assert that chunk_nbytes == chunk_size if there are no filters (I think it was checked earlier):

if (buf_alloc < chunk_size) { assert(old_pline && old_pline->nused); buf_alloc = chunk_size; }

Revised it.

…Windows When reading a filtered (compressed) chunk, filters like deflate initialize their output buffer using *buf_size (the pipeline's buffer capacity hint). With *buf_size set to chunk_disk_size (the compressed size), deflate grows its output buffer via repeated realloc() doubling. On Windows this fragments the heap and causes read times to increase steadily across iterations. Set buf_alloc to MAX(chunk_disk_size, chunk_size) immediately before calling H5Z_pipeline so filters pre-allocate their output at the full uncompressed size and avoid realloc. The inbuf is still allocated at chunk_disk_size — only the hint passed to the pipeline is enlarged — so there is no double-large-allocation overhead and peak memory stays at chunk_disk_size + chunk_size rather than 2 * chunk_size. For incompressible chunks where chunk_disk_size > chunk_size the hint remains at chunk_disk_size, satisfying H5Z_pipeline's post-filter size validation. Fixes HDFGroup#4481 and HDFGroup#4513.

… paths Per the HDF5 filter API, nbytes is the count of valid data bytes in the buffer, while *buf_size is the allocated buffer capacity. These two values were historically always equal on the read path, so either worked in practice. The pre-sizing change (which sets buf_alloc = chunk_size before the filter pipeline) makes *buf_size > nbytes for compressed reads, exposing the latent bug. In H5Znbit.c: the no-compress pass-through (cd_values[1] == 1) was returning *buf_size, causing the next filter (e.g. fletcher32) to treat the full pre-sized buffer as valid data and compute a checksum over uninitialised bytes. In H5Zscaleoffset.c: the no-process path had the same pattern. No current test exercises this path through a multi-filter pipeline where the size mismatch would be observable, but the fix is correct by the same API reasoning.

Instead of allocating a fresh output buffer (H5MM_malloc), copy the compressed input to a small temp buffer and resize *buf in-place via H5resize_memory. On Windows, HeapReAlloc can often extend an existing heap block without moving it, avoiding the cost of finding and committing a fresh large allocation for every chunk read. This is combined with the pre-sizing hint in H5Dchunk.c that sets buf_alloc = chunk_size before the pipeline call, so the resize goes directly to chunk_size in one step with no realloc loop.

…pacity Replace the hint-only buf_alloc enlargement with an actual H5D__chunk_mem_realloc() before calling H5Z_pipeline. Every filter now sees *buf_size equal to the real buffer capacity, not just an advisory hint. This fixes a bounds-check regression in H5Zscaleoffset where the read path uses *buf_size as the end-of-buffer sentinel in H5_IS_BUFFER_OVERFLOW and as the input-size argument to H5Z__scaleoffset_decompress; an inflated hint caused false-negative overflow checks and potential over-reads when scaleoffset is the on-disk filter. H5Zdeflate is simplified accordingly: the up-front H5resize_memory(*buf, nalloc) is removed since the buffer is already at nalloc on entry. The inbuf copy is retained (still needed to read compressed input while writing uncompressed output into the same buffer), as is the realloc-doubling loop for the uncommon case where output exceeds chunk_size. Also add test/chunk_deflate_perf.c, a standalone benchmark that times per-chunk deflate reads over multiple passes to detect the steady read-time increase caused by heap fragmentation on Windows (issues HDFGroup#4481 / HDFGroup#4513).

Add optional 4th argument to specify the output HDF5 file path (default: chunk_deflate_perf.h5 in CWD). Allows running develop and fix builds against separate files so pass 1 is cold-cache for both and the two runs don't share page-cache state.

Not suitable for the test suite; intended for manual Windows validation only. Keep locally if needed.

…pacity Allocate the chunk read buffer at MAX(chunk_disk_size, chunk_size) from the start rather than allocating at chunk_disk_size and immediately reallocating. The read only fills chunk_disk_size bytes regardless of buffer size, so there is no cost to the larger initial allocation and the separate realloc step is eliminated. Every filter now receives *buf_size equal to the actual buffer capacity with no additional allocation needed. On Windows a single HeapAlloc at the correct size avoids the repeated realloc-doubling in the deflate filter that fragments the heap and causes read times to increase over successive iterations (issues HDFGroup#4481 / HDFGroup#4513). Also fixes the scaleoffset bounds-check regression where *buf_size was used as an end-of-buffer sentinel.

Reverts the deflate in-place decompression experiment and the H5Dchunk.c pre-sizing changes back to the state at 9ab5e19, keeping the nbit and scaleoffset no-op path fixes.

…ipeline The previous approach allocated the buffer at chunk_disk_size and then bumped buf_alloc to chunk_size as a hint to the pipeline, causing *buf_size to misrepresent the actual allocation. Any filter that writes up to *buf_size bytes into *buf would overflow. Allocate at MAX(chunk_disk_size, chunk_size) upfront so the buffer and the hint given to filters are always consistent. For incompressible chunks where chunk_disk_size >= chunk_size the allocation is unchanged.

brtnfld requested review from bmribler, derobins, fortnern, glennsong09, jhendersonHDF, lrknox, mattjala, qkoziol and vchoi-hdfgroup as code owners May 1, 2026 17:24

github-project-automation Bot added this to HDF5 - TRIAGE & TRACK May 1, 2026

github-project-automation Bot moved this to To be triaged in HDF5 - TRIAGE & TRACK May 1, 2026

brtnfld marked this pull request as draft May 1, 2026 17:24

brtnfld assigned fortnern May 1, 2026

brtnfld added the Component - C Library Core C library issues (usually in the src directory) label May 1, 2026

fortnern reviewed May 1, 2026

View reviewed changes

brtnfld force-pushed the 4481 branch 2 times, most recently from fb096a0 to 6ab8428 Compare May 1, 2026 19:26

brtnfld marked this pull request as ready for review May 1, 2026 21:37

fortnern previously approved these changes May 1, 2026

View reviewed changes

brtnfld assigned jhendersonHDF May 4, 2026

brtnfld dismissed fortnern’s stale review via f1f1b72 May 8, 2026 13:24

brtnfld force-pushed the 4481 branch 2 times, most recently from f1f1b72 to bc4511b Compare May 8, 2026 13:26

brtnfld added 2 commits May 8, 2026 08:56

brtnfld force-pushed the 4481 branch from bc4511b to 9ab5e19 Compare May 8, 2026 13:56

brtnfld and others added 3 commits May 8, 2026 10:33

Committing clang-format changes

cf55eeb

brtnfld added 5 commits May 8, 2026 11:08

test: remove chunk_deflate_perf benchmark

b73b2a0

Not suitable for the test suite; intended for manual Windows validation only. Keep locally if needed.

revert: restore H5Dchunk.c and H5Zdeflate.c to pre-experiment state

de84cf6

Reverts the deflate in-place decompression experiment and the H5Dchunk.c pre-sizing changes back to the state at 9ab5e19, keeping the nbit and scaleoffset no-op path fixes.

fortnern approved these changes May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

pre-size chunk buffer to avoid realloc fragmentation; fix nbit/scaleoffset no-op return values#6389

pre-size chunk buffer to avoid realloc fragmentation; fix nbit/scaleoffset no-op return values#6389
brtnfld wants to merge 10 commits into
HDFGroup:developfrom
brtnfld:4481

brtnfld commented May 1, 2026 •

edited

Loading

Uh oh!

brtnfld commented May 1, 2026

Uh oh!

fortnern May 1, 2026 •

edited

Loading

Uh oh!

brtnfld May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

brtnfld commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brtnfld commented May 1, 2026

Uh oh!

fortnern May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brtnfld May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brtnfld commented May 1, 2026 •

edited

Loading

fortnern May 1, 2026 •

edited

Loading