Skip to content

Honor max_pixels for VRT source reads#1803

Merged
brendancol merged 2 commits into
mainfrom
issue-1796
May 13, 2026
Merged

Honor max_pixels for VRT source reads#1803
brendancol merged 2 commits into
mainfrom
issue-1796

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1796.\n\nThreads max_pixels into each VRT source read and prevents safety-limit errors from being swallowed as missing-source holes.\n\nTested: pytest xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 14:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures read_vrt(..., max_pixels=...) enforces the caller’s pixel-budget on every VRT SimpleSource materialization, including the underlying source GeoTIFF window reads, and prevents “max_pixels safety limit” failures from being silently treated as missing-source holes (per #1796).

Changes:

  • Forward max_pixels into each VRT source read_to_array(...) call.
  • Re-raise max_pixels safety-limit ValueErrors instead of swallowing them in the lenient “skip missing/unreadable source” fallback path.
  • Add a regression test covering the bypass scenario (tiny VRT output with oversized SrcRect).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
xrspatial/geotiff/_vrt.py Threads max_pixels through VRT source reads and ensures safety-limit errors propagate rather than becoming holes.
xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py Adds regression coverage that a small VRT cannot force an oversized source-window decode under a tight max_pixels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_vrt.py Outdated
Comment on lines +879 to +882
if (isinstance(e, ValueError)
and 'exceed' in str(e)
and 'safety limit' in str(e)):
raise
@brendancol brendancol merged commit 6cc96d7 into main May 13, 2026
1 of 11 checks passed
brendancol added a commit that referenced this pull request May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the
`_read_vrt_dask` dispatch hook from the PR branch. All other
geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804,
#1805, #1806) were already integrated into the working tree by the
prior 7329dd9 commit; this merge just records the parent so git
recognises the reconciliation.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
* Make VRT chunks read lazily (#1798)

* Cap lazy VRT dask task graphs

* Merge origin/main into issue-1798

Agent-Logs-Url: https://github.com/xarray-contrib/xarray-spatial/sessions/27f4131a-2907-4ca0-bf00-f303ab61f2e9

Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: cap VRT XML read size (closes #1815)

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: reject ambiguous 3D writer inputs (#1812)

to_geotiff and write_geotiff_gpu used to silently mishandle 3D
DataArrays whose leading dim was not in _BAND_DIM_NAMES = ('band',
'bands', 'channel'). The moveaxis that puts (band, y, x) into the
on-disk (y, x, band) layout was skipped, the writer kept the leading
axis as the spatial y axis, and the round-trip produced a TIFF with
silently swapped axes -- on read-back, out[:, :, 0].sum() !=
arr[0].sum().

Reject ambiguous 3D layouts at all three writer entry points (eager
to_geotiff, dask streaming, write_geotiff_gpu) via the shared
_validate_3d_writer_dims helper. Accepted layouts: (band, y, x) or
(y, x, band) with band-name aliases bands/channel and spatial-name
aliases lat/lon/latitude/longitude/row/col. Anything else raises
ValueError with an actionable message (rename the non-spatial dim
or transpose).

Surfaced by the 2026-05-13 metadata propagation sweep.

* geotiff: remove unused _AMBIGUOUS_3D_INPUTS test list (#1820 review)

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
…1817)

* geotiff: apply nodata mask against post-MinIsWhite sentinel (#1809)

MinIsWhite inversion was running before the sentinel-to-NaN nodata
mask on all four read backends. Because the inversion rewrites the
sentinel value (uint8 nodata=0 -> 255, float32 nodata=-9999 -> 9999),
the post-inversion equality check matched the wrong pixels:

* stored values that equalled the sentinel survived as iinfo.max -
  sentinel instead of NaN
* stored values that happened to equal iinfo.max - sentinel were
  incorrectly turned into NaN

Introduces _miniswhite_inverted_nodata() in _reader.py and stashes the
inverted sentinel on geo_info._mask_nodata. Every backend (eager numpy,
eager GPU, GPU stripped fallback, dask chunk closure) routes its mask
through the new field while attrs['nodata'] keeps the original sentinel
for write-side round-trip. The dask graph builder picks up the IFD
photometric off geo_info via _read_geo_info / _parse_cog_http_meta so
the closure nodata is inverted at graph-build time.

9 regression tests in test_miniswhite_nodata_1809.py cover uint8 with
nodata=0, uint16 with nodata=65535, float32 with nodata=-9999 across
numpy, dask, and GPU backends, plus no-collision and no-nodata controls.

Closes #1809

* geotiff: address Copilot review comments on #1817

Remove unused os/tempfile imports from the MinIsWhite regression test
and route the GPU tiled read's CPU-fallback branches through
GeoInfo._mask_nodata so a sparse-tile, planar=2 auto-fallback, or
final-fallback decode of a MinIsWhite raster masks against the
post-inversion sentinel rather than the original.

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 14, 2026
#1811)

* geotiff: forward missing_sources from open_geotiff to read_vrt (#1810)

read_vrt gained a missing_sources='warn'|'raise' policy kwarg in #1806
but the documented dispatcher open_geotiff did not expose it. Callers
who wanted strict failure on broken VRT sources had to call read_vrt
directly, defeating the dispatcher contract.

Same class of bug as #1561 / #1605 / #1685 / #1795: a backend kwarg
reachable on the routed-to function is unreachable through the
documented entry point. Fix mirrors the on_gpu_failure pattern that
open_geotiff already uses for GPU-only kwargs:

- Add missing_sources= to open_geotiff with a sentinel default so the
  dispatcher can tell "caller never set it" (skip forwarding, let the
  read_vrt default of 'warn' apply) from "caller set it" (forward
  verbatim).
- Reject missing_sources on non-VRT sources with a clear ValueError
  so callers learn the policy is being ignored instead of getting a
  silent drop.
- Document the kwarg in the open_geotiff docstring with the same
  values and semantics as read_vrt.

Regression tests in test_open_geotiff_missing_sources_1810.py cover
the signature, warn/raise/invalid policies through the dispatcher,
non-VRT rejection, and the no-kwarg backward-compatible path.

* sweep: update api-consistency state for geotiff (v5)

Record the v5 api-consistency sweep against geotiff. One MEDIUM
finding filed and fixed (#1810). Cross-sibling write-return-type
drift remains deferred at LOW.

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 14, 2026
* geotiff: close HTTP source on tile-fetch exception (closes #1816)

* geotiff: address Copilot review comments on #1819

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
brendancol added a commit that referenced this pull request May 14, 2026
* geotiff: build lazy dask graph for read_vrt(chunks=) (closes #1814)

Before this change, read_vrt(chunks=...) materialised the full VRT
mosaic on host RAM and then wrapped the resulting numpy array via
.chunk(). chunks= gave no memory protection, and gpu=True + chunks=
still assembled the entire mosaic on the CPU before moving to the
device.

The chunked path now parses the VRT XML once up front, builds one
dask.delayed per destination chunk window, and assembles them via
from_delayed + da.concatenate. Each task calls the existing VRT
internal reader with its own window= so only the sources intersecting
that window are decoded. With gpu=True each block calls cupy.asarray
before returning, so the dask array is dask-on-cupy from the start.

A task-count cap (50,000, matching read_geotiff_dask) refuses chunk
grids that would build a scheduler-busting graph and suggests a
larger chunks= value.

attrs['vrt_holes'] is not populated for chunked reads; the
GeoTIFFFallbackWarning still fires from each worker.

* geotiff: propagate vrt_holes and gate float64 promotion on chunked path (#1822 review)

Address Copilot review comments on PR #1822:

(c) The eager read_vrt populates attrs['vrt_holes'] as the machine-readable
partial-mosaic detection contract from #1734. The chunked path silently
omitted it. Populate the attr from a parse-time os.path.exists sweep over
every source referenced by the parsed VRT so callers switching from eager
to chunked keep the contract. The check is a static approximation that
catches the dominant missing-file case; codec-error holes still surface
as per-task GeoTIFFFallbackWarning.

(d) Document the static "any band declares nodata?" promotion check as
an explicit approximation of the eager path's runtime mask.any(). The
gate was already correct (no promotion when no band declares nodata) but
the surrounding comment did not call out the parse-time-vs-runtime
trade-off. Add a regression test pinning the no-nodata uint16 case to
the source dtype.

Defer the duplicated eager logic and N+1 VRT-XML re-parse via tracking
issue #1825; TODO markers reference that issue at both call sites.

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.

* geotiff: clear _mmap_cache before unlink in #1814 hole test

The Windows 3.14 CI job failed with WinError 32 when the test tried
to unlink a tile referenced by the VRT: write_vrt() opens each tile
via _FileSource to read its header, and _FileSource.close() only
decrements the refcount in the shared _mmap_cache -- the mmap and
file handle remain idle in the cache. POSIX allows unlink of an
mmap'd file; Windows does not.

Call _mmap_cache.clear() (the cache's existing test helper, which
drops idle entries) immediately before os.unlink. Linux/macOS are
unaffected.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
@brendancol brendancol deleted the issue-1796 branch May 15, 2026 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VRT source reads do not honor max_pixels safety limit

2 participants