Stream HTTP read_range body so misbehaving servers cannot OOM (#2264)#2273
Conversation
`_HTTPSource.read_range()` used `preload_content=True` (the urllib3 default), so the whole response body landed in `resp.data` before the `data[:length]` slice in `_validate_range_response` ever ran. A server that ignored `Range` and returned a 200 with a multi-GiB body therefore defeated the OOM guard the slice comment claimed: a 16 KiB metadata prefetch against a 2 GiB body still pulled 2 GiB into memory before the slice fired. `MAX_HTTP_HEADER_BYTES` was advisory, not enforcing. Change `read_range` to mirror the streaming pattern `read_all` adopted in #2051: request the body with `preload_content=False`, preflight the advertised `Content-Length`, then drain with `_read_capped` so the cap is enforced on the wire. The 206-with-Content-Range path keeps its existing validator. The 200-without-Content-Range fallback (server ignored `Range`) is allowed up to a tight absolute slack (`_RANGE_IGNORED_FULL_OBJECT_CAP = 16 MiB`); larger bodies are rejected on the preflight or aborted mid-stream. The validator still slices the fallback body down to `length` for callers. Updates the mock `_MockResp` / `_MockPoolResponse` test fixtures in test_http_no_stdlib_fallback_2050.py, test_ssrf_hardening_1664.py, and test_dns_rebinding_pin_issue_1846.py to expose `stream()`, `release_conn()`, and `Content-Length` so they remain compatible with the streaming code path. Updates the existing read_range_validation suite to cover both halves of the new contract -- honest oversize via Content-Length preflight, dishonest oversize via the streaming cap -- and pins the new wire-level requirement (`preload_content=False`). Closes #2264.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Stream HTTP read_range body so misbehaving servers cannot OOM (#2264)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/_sources.py:1005-read_rangehas no docstring on_HTTPSource. The baseSource.read_rangeis documented at the module level (line 9), but the HTTP override now carries non-trivial extra contract (Content-Length preflight, streaming cap, 16 MiB slack forstart=0fallback, OSError on overshoot). A short docstring summarising "raises OSError when the server's body is past the byte budget for this call" would save the next reader from chasing comments. -
xrspatial/geotiff/_sources.py:1137-1142- The slice comment in_validate_range_responsestill says "a 16 KiB prefetch against a 2 GiB object doesn't drag the whole thing into memory". The whole point of this PR is that the slice never actually achieved that. With the new cap the worst case is_RANGE_IGNORED_FULL_OBJECT_CAP(16 MiB) of transient buffer, not 2 GiB. Update the comment so it does not contradict the new behaviour. -
xrspatial/geotiff/_sources.py:1310-1312-_read_capped's OSError message refers only to "Issue #2051". When reached fromread_range(the new caller) the trail leads readers to the wrong issue. Either reference both issues or drop the specific number and rely on the call site's message for context. -
xrspatial/geotiff/_sources.py:1003-_RANGE_IGNORED_FULL_OBJECT_CAP = 16 MiBis the maximum transient buffer per call. Withread_rangesrunning up to 8 workers concurrently, the worst-case transient is ~128 MiB if every worker hits a Range-blind server. The current code is still vastly better than the pre-#2264 behaviour, but it's worth noting this in the constant's docstring or inread_rangesso future tuning is informed.
Nits (optional improvements)
-
xrspatial/geotiff/_sources.py:995-1003-_RANGE_IGNORED_FULL_OBJECT_CAPis a class attribute, which is fine, but it sits next to other module-level HTTP knobs (_HTTP_MAX_REDIRECTS,INITIAL_HTTP_HEADER_BYTESin_cog_http). Promoting it to module-level would help the next person tweaking HTTP behaviour find it. Class-level does make themonkeypatch.setattr(_HTTPSource, ...)pattern in the tests cleaner, so it's a judgement call. -
xrspatial/geotiff/_sources.py:1024-1045- Thetry/finallyaround_check_range_content_lengthand_read_cappedrunsresp.release_conn()in finally, then_validate_range_responsereadsresp.statusandresp.headersafter release_conn. This works (urllib3 keeps status/headers populated after release), but the flow is slightly non-obvious. Either copystatusandcontent_rangeinto locals before the finally, or move_validate_range_responseinside the try block. Theread_allmethod has the same shape, so consistency is also fine.
What looks good
- The streaming-with-preflight pattern matches the
read_alldesign from #2051. Symmetry across the two HTTP read paths is easier to audit than two different fixes. - The test plan covers four interesting cases: honest oversize via Content-Length, dishonest oversize via chunked, legitimate small-file fallback, and the
preload_content=Falsewire contract. The wire-contract test in particular is the right kind of regression bait - a future refactor that flipped the default would fail it immediately. - The mock-fixture updates in test_ssrf_hardening_1664.py and test_dns_rebinding_pin_issue_1846.py are scoped and minimal: just
stream(),release_conn(), andContent-Length. No drive-by changes to the SSRF or DNS-rebinding test logic. - The existing test
test_range_ignored_200_with_full_body_is_sliced_to_lengthwas correctly renamed and updated to reflect the new behaviour (nowtest_range_ignored_200_full_object_sliced_within_cap). The old test name was the giveaway: it asserted slicing happened, not that buffering was bounded.
Checklist
- Algorithm matches reference/paper - HTTP-only path, no algorithm involved.
- All implemented backends produce consistent results - HTTP-only, no backend dispatch.
- NaN handling is correct - N/A, byte-level code.
- Edge cases are covered by tests - empty body, short body, oversize body, chunked oversize, zero-length, nonzero start.
- Dask chunk boundaries handled correctly - N/A.
- No premature materialization or unnecessary copies -
_read_cappedjoins chunks once at end. - Benchmark exists or is not needed - not needed for a defensive HTTP fix.
- README feature matrix updated (if applicable) - N/A.
- Docstrings present and accurate - partially; see suggestion on
read_rangedocstring.
…2264) - Add a docstring to ``_HTTPSource.read_range`` covering the post-#2264 contract (Content-Length preflight, streaming cap, full-object slack for ``start=0`` fallback, ``OSError`` on overshoot). The base ``Source.read_range`` is documented at module level but the HTTP override now carries non-trivial extra contract that deserves its own signpost. - Expand the ``_RANGE_IGNORED_FULL_OBJECT_CAP`` docstring to note the worst-case concurrent transient under ``read_ranges`` (~128 MiB with the default 8-worker pool), so future tuning of ``max_workers`` is informed. - Update the stale slice comment in ``_validate_range_response`` so it no longer claims to protect against a 2 GiB body; the new cap puts the worst case at ``_RANGE_IGNORED_FULL_OBJECT_CAP`` (16 MiB). - Update ``_read_capped``'s OSError message to reference both #2051 and #2264, since ``read_range`` is now a caller too. - Move ``_validate_range_response`` inside the ``try`` block so the status/headers reads happen before ``release_conn`` runs. The previous ordering worked (urllib3 keeps those populated post-release) but was easy to misread.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review after edc3313
Reviewed the second commit. Verdict: ready to merge once CI is green.
Disposition of original findings
- Suggestion (no docstring on
_HTTPSource.read_range) -- fixed at_sources.py:1012-1034. Docstring lists status / Content-Length / Content-Range axes and names the OSError contract. - Suggestion (stale 2 GiB comment in
_validate_range_response) -- fixed. The comment now cites_RANGE_IGNORED_FULL_OBJECT_CAP(16 MiB) and references #2264. - Suggestion (
_read_cappedreferences only #2051) -- fixed. The OSError message now lists both #2051 and #2264. - Suggestion (document the 128 MiB worst-case under
read_ranges) -- fixed in the constant's docstring. - Nit (release_conn / validator ordering) -- fixed.
_validate_range_responsenow runs inside the try block;release_connruns in finally after validation completes or raises. - Nit (class-level vs module-level constant) -- left as-is. Class-level keeps the
monkeypatch.setattr(_HTTPSource, ...)test pattern clean, which the test suite relies on.
What looks good
- Docstring is explicit about which budget applies in which branch, which is the part that took the longest to reason about from the original code.
- The validator-inside-try refactor is a small readability win without changing behaviour. Easy to verify the response is alive when status/headers are read.
Outstanding
None blocking. CI is the last gate.
Summary
_HTTPSource.read_range()to streaming (preload_content=False) and cap the body on the wire with_read_capped, matching theread_allpattern from [Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body #2051. Before this change urllib3 buffered the whole response intoresp.databefore thedata[:length]slice in_validate_range_responseran, so a server that ignoredRangeand returned a 200 with a multi-GiB body still OOM'd the process.MAX_HTTP_HEADER_BYTESwas advisory, not enforcing.Rangeon astart=0prefetch), the legitimate "small file served whole" case still works, but the slack is capped at a tight absolute (_RANGE_IGNORED_FULL_OBJECT_CAP = 16 MiB). The validator still slices that fallback body down tolengthfor the caller._read_capped. The 206-with-Content-Range path keeps its existing behaviour but now also goes through the streaming path so the cap is uniform.Test plan
pytest xrspatial/geotiff/tests/test_http_range_validation_1735.py: 10 passing. Covers honest-oversize-rejected, slack-fallback-sliced, short-body-returned, dishonest-oversize-streamed-and-capped, thepreload_content=Falsewire contract, and the pre-existing 1735 regression cases.pytest xrspatial/geotiff/tests/test_sidecar_ovr_2112.py: 28 passing. Includes the SimpleHTTPRequestHandler-ignores-Range scenario that drove the slack-cap design.pytest xrspatial/geotiff/tests/test_ssrf_hardening_1664.py test_dns_rebinding_pin_issue_1846.py test_http_no_stdlib_fallback_2050.py test_http_meta_buffer_1718.py: 68 passing. Mock-fixture updates absorbed cleanly.pytest xrspatial/geotiff/tests/: 5041 passing. One pre-existing lz4 failure onmainunrelated to this change.Closes #2264.