Skip to content

Cap merged-range size in coalesce_ranges to bound HTTP over-fetch (#2266)#2270

Merged
brendancol merged 2 commits into
mainfrom
issue-2266
May 21, 2026
Merged

Cap merged-range size in coalesce_ranges to bound HTTP over-fetch (#2266)#2270
brendancol merged 2 commits into
mainfrom
issue-2266

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Add a max_coalesced_range_bytes cap to coalesce_ranges so a tile table with many small valid byte counts and sub-MiB gaps can no longer chain into a multi-GiB merged GET. The cap follows the per-tile cap pattern: default MAX_COALESCED_RANGE_BYTES_DEFAULT (256 MiB), overridable via XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES.
  • Plumb the cap through _HTTPSource.read_ranges_coalesced and _FSSpecSource.read_ranges_coalesced so both COG remote-read paths pick up the bound automatically.
  • Add eight unit tests: the 4096-tile adversarial case from the issue, byte-level round-trip after a forced split, env-var override, the zero-cap escape hatch, no false splits on legitimate back-to-back tiles, and HTTP wrapper propagation.

Closes #2266.

Backend coverage

HTTP / cloud range-fetching code; the four-backend grid does not apply. The fix is in _sources.py and is exercised by both _HTTPSource (urllib3) and _FSSpecSource (cloud).

Test plan

  • pytest xrspatial/geotiff/tests/test_http_cog_coalesce.py -- 19 passed (11 existing + 8 new)
  • Full xrspatial/geotiff/tests/ minus one pre-existing lz4 opt-in test unrelated to this change -- 5045 passed

)

`coalesce_ranges` merged any two adjacent byte ranges whose gap was at
most `gap_threshold` (1 MiB default), with no upper bound on the
resulting merged-range length. A tile table with many small valid
`TileByteCount` values and offsets spaced just under 1 MiB apart would
chain into a single merged GET of roughly `num_tiles * gap_threshold`
bytes, even when every individual tile passed the per-tile 256 MiB cap
at `_cog_http.py:883`.

Add a `max_coalesced_range_bytes` parameter to `coalesce_ranges` and
plumb it through `_HTTPSource.read_ranges_coalesced` and
`_FSSpecSource.read_ranges_coalesced`. When extending the current
merged range would exceed the cap, seal it and start a new one. The
default cap reads from `XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES` and
falls back to `MAX_COALESCED_RANGE_BYTES_DEFAULT` (256 MiB), matching
the existing per-tile policy. A non-positive cap disables the size
check (gap-only behaviour, for callers with their own bookkeeping).
Single oversized input ranges still emit intact -- rejecting those is
the per-tile cap's job, not the coalescer's.

Add eight tests covering: the cap forcing splits on adversarial input,
byte-level round-trip after a forced split, the default cap bounding
the 4096-tile / 4 GiB scenario, the zero-cap escape hatch, no false
splits on legitimate back-to-back tiles, env-var override, oversized
single-input pass-through, and HTTP wrapper propagation through
`_MockHTTPSource`.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Cap merged-range size in coalesce_ranges to bound HTTP over-fetch (#2266)

Blockers (must fix before merge)

  • None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_cog_http.py:914 -- the call site reads XRSPATIAL_COG_COALESCE_GAP from env explicitly and passes it through gap_threshold, but it does not read XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES the same way. The env var is still honored because coalesce_ranges resolves None against the env, so behavior is correct. The asymmetry is a readability hazard: a future reader of the call site cannot tell from the call whether the cap is enforced. Either drop the explicit gap lookup at the call site (let coalesce_ranges read both env vars), or add the symmetric max_coalesced_range_bytes lookup at the call site. Either way the call becomes self-documenting.

Nits (optional improvements)

  • _sources.py:657 -- the comment "Gaps may be negative if a later-listed range overlaps an earlier one; clamp so the merged length covers both" is slightly stale. The clamp is still there (new_end = max(cur_end, off + length)), but it moved out of the if body to a few lines above. Either move the comment up next to the line where the clamp actually happens, or trim it.
  • test_coalesce_caps_merged_range_size_2266 (line 138) asserts the set of mapping triples has 8 distinct entries. The next test (test_coalesce_cap_round_trips_bytes_2266) already covers byte-level round-trip, so the set assertion is redundant. Not a bug, just dead weight.

What looks good

  • The cap is layered on top of the gap check, not replacing it. Legitimate back-to-back tiles still coalesce.
  • Both _HTTPSource and _FSSpecSource get the parameter, so the cloud path inherits the same defense.
  • Re-exports in _reader.py keep the public import surface (from xrspatial.geotiff._reader import ...) consistent with the existing pattern.
  • Tests cover the 4096-tile adversarial scenario, env-override, the zero-cap escape hatch, byte round-trip after a forced split, and HTTP wrapper propagation through a mock source.
  • "Single oversized input is emitted intact" is the correct semantics and is documented. Rejecting oversized individual tiles belongs at the per-tile cap, not in the coalescer.

Checklist

  • Algorithm correctness: bug fix; cap added correctly.
  • [n/a] Backend grid: HTTP / cloud code, not a four-backend op.
  • Edge cases (zero cap, oversized single, legitimate back-to-back, env override).
  • [n/a] Dask boundaries.
  • No premature materialization or unnecessary copies.
  • [n/a] Benchmark: internal helper, no perf-sensitive new API.
  • [n/a] README feature matrix: no new public function.
  • Docstrings present and accurate.

Review on PR #2270 flagged three items. All applied:

* Suggestion: `_cog_http.py:914` call site now resolves
  `_max_coalesced_range_bytes_from_env()` and passes it explicitly as
  `max_coalesced_range_bytes=...`, mirroring the existing
  `gap_threshold=gap` lookup. Behaviour is unchanged (the env var was
  already honoured implicitly via `None` resolution in
  `coalesce_ranges`) but a future reader of the call site can now see
  the cap without crossing files.
* Nit: stale comment in `_sources.py` describing the negative-gap
  clamp moved up next to the line where the clamp actually happens
  (the `new_end = max(...)` was hoisted out of the `if` body in the
  previous commit).
* Nit: dropped the redundant 8-distinct-mapping-triples assertion in
  `test_coalesce_caps_merged_range_size_2266`; the byte round-trip in
  the next test already covers correctness after the split.

Also update two test mocks
(`tests/test_security.py::_MockHTTPSource.read_ranges_coalesced` and
`tests/test_strip_zero_dims_2053.py::...read_ranges_coalesced`) to
accept the new `max_coalesced_range_bytes` kwarg, since `_read_cog_http`
now passes it through.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after f3ec887

All three findings from the previous review are applied. No new findings.

  • Suggestion (call-site asymmetry): fixed. _cog_http.py:914 now resolves _max_coalesced_range_bytes_from_env() explicitly and passes the cap as max_coalesced_range_bytes=.... The env var was already honoured implicitly; the explicit lookup just makes the call site self-documenting.
  • Nit (stale clamp comment): fixed. The "Gaps may be negative ..." comment moved up next to the new_end = max(cur_end, off + length) line where the clamp actually happens.
  • Nit (redundant set assertion): fixed. The 8-distinct-mapping-triples check was removed; test_coalesce_cap_round_trips_bytes_2266 already covers correctness after a forced split.

Test mocks in tests/test_security.py and tests/test_strip_zero_dims_2053.py picked up the new max_coalesced_range_bytes kwarg so the call from _read_cog_http keeps working under the mock.

Checklist (re-verified)

  • No new blockers, suggestions, or nits.
  • pytest xrspatial/geotiff/tests/ -- 5045 passed, 1 deselected (pre-existing lz4 failure unrelated to this PR).

@brendancol brendancol merged commit 6588e1f into main May 21, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COG range coalescing can turn safe tile reads into huge over-fetches

1 participant