Positive VRT mosaic tests over GeoTIFF sources (#2369)#2373
Open
brendancol wants to merge 6 commits into
Open
Conversation
Several call sites in xrspatial/geotiff/ classified HTTP(S) sources
with startswith(('http://', 'https://')). That comparison is
case-sensitive, so URLs like HTTP://169.254.169.254/... slipped past
_HTTPSource and the SSRF / DNS-pinning checks in _validate_http_url
and fell through to fsspec. RFC 3986 section 3.1 defines URI schemes
as case-insensitive.
Adds _is_http_source(s) in _sources.py, backed by
urlparse(s).scheme.lower() in ('http', 'https'), and routes every
dispatch site through it: _sources.py (_is_fsspec_uri + _open_source),
_reader.py, _writer.py (_is_fsspec_uri), _backends/dask.py,
_backends/gpu.py (eager-via-CPU branch + GDS opt-in gate), and the
_sidecar.py _is_http_url helper.
New regression tests in test_http_scheme_case_2321.py cover the
helper, _open_source dispatch for HTTP / HTTPS / Http / hTTpS, the
partner _is_fsspec_uri classifiers in both _sources and _writer, the
sidecar helper, and the end-to-end SSRF rejection of uppercase URLs
that resolve to 127.0.0.1, 169.254.169.254, 10.0.0.1, 192.168.1.1,
0.0.0.0, and localhost. socket.getaddrinfo is monkeypatched so the
tests stay offline.
Parent: #2321 sub-task 5.
#2332) Follow-ups on the /review-pr pass for PR #2337: * Hoist ``from urllib.parse import urlparse`` to ``_sources.py`` module top so ``_is_http_source`` no longer pays the import cost per call. * Hoist ``from ._sources import _is_http_source`` to module top in ``_writer.py`` and ``_sidecar.py`` (no cycle: neither is imported by ``_sources``). * Keep the function-scope imports in ``_backends/dask.py`` and ``_backends/gpu.py``: the backend modules deliberately defer the reader / sources import so the package can be imported without urllib3 in dask-only environments. Add a one-line comment so the next reader does not assume it is an oversight. * Reject HTTP(S) write targets in ``_write_bytes`` with a typed ``NotImplementedError`` before the local file write path interprets the URL as a filename. The error message lists the supported destinations. Covers both lowercase and uppercase scheme casing. * Lock the behavioural broadening: ``_is_http_source('http:foo')`` now returns True (urlparse-correct, broader than the old ``startswith``). Added a test asserting the classifier matches and that ``_open_source('http:foo')`` raises ``UnsafeURLError`` so the validator still rejects it as "no hostname". * Added 7 new test cases (helper + open_source + 5 writer-rejection parametrizations); full geotiff suite: 5324 passed, 68 skipped.
Reconciles the case-insensitive HTTP(S) dispatch helper introduced on both branches: * main #2326 added ``_is_http_url`` in ``_sources.py`` and routed the geotiff dispatch sites through it. * PR #2337 added the same fix as ``_is_http_source`` with broader test coverage, hoisted imports, and a typed ``NotImplementedError`` for HTTP(S) write targets in ``_write_bytes``. Resolution keeps ``_is_http_source`` as the canonical name (used by this PR's tests and the writer-side guard) and adds an ``_is_http_url`` alias in ``_sources.py``, plus a re-export from ``_reader.py``, so the test file shipped with #2326 (``test_uppercase_scheme_ssrf_2323.py``) keeps importing ``_is_http_url`` unchanged. The ``_sidecar.py`` wrapper ``_is_http_url`` now delegates to ``_is_http_source`` directly instead of through the canonical alias. Full geotiff suite: 5413 passed, 64 skipped, 1 xfailed, 1 xpassed.
Sub-task of #2342. Covers the supported simple-VRT subset on both the eager numpy and dask read paths: - 2x1 horizontal mosaic of two compatible tiles - 2x2 mosaic of four compatible tiles - Windowed read aligned with the source seam - Dask reads of both mosaics with multiple chunks - Multi-band 2x1 mosaic Each test asserts values, coords, and key attrs (crs, transform, nodata) so the contract is checked at the pixel level rather than on shape alone.
brendancol
commented
May 25, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Positive VRT mosaic tests over GeoTIFF sources (#2369)
Blockers
None.
Suggestions
-
xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:118,146,184: the three fixtures usetempfile.mkdtemp(prefix='tmp_2369_...'), which leaks directories between runs. The implementation notes called fortmp_path, and most newer tests in this directory use the pytest fixture. Switching gets auto-cleanup and keeps the prefix-based collision protection. -
xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:240-254:_assert_attrs_okchecks the pixel size on the transform but notorigin_x/origin_y. A transform with the wrong origin would still pass. Assertingtransform[2]andtransform[5]against the fixture origin would catch that case.
Nits
-
xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:333:h, w_total = expected.shape--w_totalis unused. Drop it or reference it in the comment. -
xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.py:410-431: no dask variant of the multi-band test. The eager case covers the contract; a chunked multi-band read would also pin down per-block band handling.
What looks good
- Six tests, narrow scope, exactly the supported subset from the issue.
- 2x2 fixture uses one distinct constant per tile so placement bugs show up in the value diff instead of as numeric drift.
- Each test asserts values, coords, and key attrs (crs, transform, nodata).
- Tile builders are short and shared cleanly across fixtures.
Checklist
- Algorithm matches reference: synthesises mosaics from
to_geotiff+ internalwrite_vrt, mirroringtest_vrt_lazy_chunks_1814.py. - All implemented backends produce consistent results: dask compute compared against eager.
- NaN handling: not exercised in this PR; mosaics use finite data plus declared nodata sentinel.
- Edge cases: windowed read on a 2x1 seam covers the documented boundary case.
- Dask chunk boundaries handled correctly: 2x1 and 2x2 mosaics with sub-tile chunks both match eager.
- No premature materialization: compute() only fires after the multi-block dask graph is built.
- [n/a] Benchmark exists or is not needed: test-only PR.
- [n/a] README feature matrix updated: no public API change.
- Docstrings present and accurate.
- Switch the three fixtures from tempfile.mkdtemp to tmp_path subdirectories so pytest cleans them up between runs while keeping per-issue prefix protection. - _assert_attrs_ok now optionally checks transform[2] (origin_x) and transform[5] (origin_y); every call site passes the fixture origin so a translation bug would not slip through. - Drop the unused w_total in the windowed test. - Add test_dask_multiband_2x1_mosaic_matches_eager so the multi-band contract is covered on the dask path too.
brendancol
commented
May 25, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (post-fixes)
Disposition for the four findings from the previous review:
- Fixture
tempfile.mkdtemp->tmp_pathsubdirs. Fixed. -
_assert_attrs_oknow checkstransform[2]/transform[5]. Every call site passes the fixture origin; the windowed call passes the shifted origin (ox + _PIXEL_W*c0,oy + _PIXEL_H*r0). Fixed. - Unused
w_totalremoved; the windowed test now binds onlyh. Fixed. - Dask multi-band test added (
test_dask_multiband_2x1_mosaic_matches_eager) so per-block band handling is covered. Fixed.
Total: 7 tests pass locally. No deferrals, no dismissals.
# Conflicts: # xrspatial/geotiff/_reader.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2369. Sub-task of #2342.
Positive coverage for the simple VRT mosaic subset that the release contract lists as supported. The other VRT tests touch this subset while exercising codecs, dtypes, or chunk policies; nothing pinned the contract by itself.
Six tests, all driving the public
read_vrt:Each test asserts values, coords, and key attrs (crs, transform, nodata).
Backend coverage
Test plan
pytest xrspatial/geotiff/tests/test_vrt_simple_mosaic_2369.pypasses locally