Skip to content

Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364

Merged
brendancol merged 4 commits into
mainfrom
issue-2358
May 24, 2026
Merged

Add release-gate windowed-read coords + shifted-transform parity test (#2358)#2364
brendancol merged 4 commits into
mainfrom
issue-2358

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

@brendancol brendancol commented May 24, 2026

Closes #2358. PR 2 of 5 for epic #2341.

Summary

  • New file xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py locks the windowed-read contract: shape equals the window's (height, width), coords['y'/'x'] are a bit-exact slice of the unwindowed coords, attrs['transform'] equals T_full * Affine.translation(col_off, row_off) by exact tuple equality, and the non-transform canonical release attrs match the unwindowed read.
  • Corpus is three files: int16 stripped with an integer nodata sentinel, float32 tiled with a NaN sentinel, and float32 stripped without a declared sentinel.
  • Both eager (open_geotiff(window=...)) and dask (read_geotiff_dask(window=..., chunks=32)) paths are exercised. masked_nodata is data-dependent (true iff sentinel-to-NaN actually ran), so it is asserted structurally (presence + bool type) instead of by value.
  • Adds the matching row under "Local GeoTIFF read and write" in docs/source/reference/release_gate_geotiff.rst.

Backend coverage

  • numpy (eager open_geotiff)
  • dask+numpy (read_geotiff_dask)

GPU paths are out of scope for this row and are covered by test_backend_pixel_parity_matrix_1813.py.

Test plan

  • pytest xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py -- 24 passed locally (3 files x 2 readers x 4 assertions).
  • CI green on the PR.

Notes

  • The transform check computes T_full * Affine.translation(col_off, row_off) by hand. It does not call rasterio.windows.transform, so the test pins the spec rather than whatever rasterio happens to compute.
  • Temp filenames include the issue tag and a per-test uuid so sibling worktrees and xdist workers do not collide on the same path.

…#2358)

Adds xrspatial/geotiff/tests/test_release_gate_windowed_reads_2341.py
to lock the contract that a windowed read returns the correct shape,
coords that slice the unwindowed coords bit-exact, an attrs['transform']
equal to T_full * Affine.translation(col_off, row_off) with no float
drift, and unchanged canonical non-transform release attrs. Covers
eager and dask read paths across a representative corpus (int16 with
nodata, float32 tiled with NaN nodata, float32 stripped without
nodata).

Adds the matching row in docs/source/reference/release_gate_geotiff.rst.

Part of epic #2341, PR 2 of 5.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 24, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add release-gate windowed-read coords + shifted-transform parity test (#2358)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • _assert_canonical_attrs_unchanged uses isinstance(full_val, float) and np.isnan(full_val) to detect a NaN nodata sentinel (test file lines 325-329). The reader currently normalises the read-back nodata to a Python float, so this works today, but if any backend ever returns the sentinel as np.float32 (which is what the input fixture passes to to_geotiff) the isinstance branch silently falls through to win_val == full_val and NaN != NaN flips the test from "checked NaN equality" to "always fails". A more durable form is something like

    try:
        full_is_nan = bool(np.isnan(full_val))
    except (TypeError, ValueError):
        full_is_nan = False

    which handles both Python and numpy scalar NaNs without binding to a specific dtype.

  • The dask backend uses chunks=32 over a 256x256 raster with a window starting at (row_off=32, col_off=64). Both window offsets are exact multiples of 32, so the window aligns to chunk boundaries and the test does not exercise the case where the window splits a chunk. A second parametrize case at e.g. chunks=20 or a window offset of (33, 65) would cover the misaligned path without much extra runtime.

Nits (optional improvements)

  • _PIXEL_HEIGHT = -25.0, _ORIGIN_Y = 4001987.25 -- the values are well chosen to defeat integer-only window math, but a one-line comment on _ORIGIN_X = 500123.5 explaining "fractional origin so a windowed reader that drops the fractional part fails the equality assertion" would make the choice self-documenting.

  • The corpus uses three files: int16-deflate-stripped, float32-deflate-tiled, float32-none-stripped. A MinIsWhite cell is on the epic's representative-files list (see test_backend_pixel_parity_matrix_1813.py::_TIFF_FIXTURES). Out of scope for this PR if you want to keep the diff small, but consider adding it in a follow-up so this gate covers every dtype/layout the eager/dask matrix covers.

What looks good

  • Algebra: _expected_window_transform keeps the b and d skew terms so the formula generalises if a rotated-grid fixture is added later. The implementation matches T_full * Affine.translation(col_off, row_off) exactly for axis-aligned grids.
  • No rasterio oracle: the transform check pins the spec directly, as the issue asked.
  • Tuple equality (not approx) on the transform forces the reader to thread the algebraic identity all the way through, instead of re-deriving the origin from the y/x coord arrays where float rounding could creep in.
  • masked_nodata is correctly identified as data-dependent and asserted structurally (presence + bool type), with the rationale in a comment pointing at _attrs.py.
  • Temp filenames include the issue tag + a uuid so sibling rockout worktrees and xdist workers cannot collide.
  • @pytest.mark.release_gate is registered in setup.cfg (line 115).
  • The matching row was added to docs/source/reference/release_gate_geotiff.rst under "Local GeoTIFF read and write" with a clear one-line acceptance.

Checklist

  • Algorithm matches reference/paper (algebraic identity, not rasterio)
  • All implemented backends produce consistent results (eager + dask both pass)
  • NaN handling is correct (works today; see Suggestion 1 for a durability nit)
  • Edge cases covered (int sentinel, float NaN sentinel, no sentinel)
  • Dask chunk boundaries handled (32-aligned today; see Suggestion 2 for the misaligned case)
  • No premature materialization (no .compute(); coords are already eager on xarray)
  • Benchmark not needed (test-only PR)
  • README feature matrix update not needed (no new public function)
  • Docstrings present and accurate

Applies the four findings from the self-review on PR #2364:

- Robust NaN compare on the canonical-attrs check: drop the
  ``isinstance(x, float)`` gate in favour of a try/except around
  ``np.isnan(x)`` so a future backend that returns a numpy-scalar
  nodata cannot silently fall into the ``==`` branch where NaN != NaN
  flips the test from "checked equal" to "always fails".
- Add a chunk-misaligned window case ``(33, 65, 95, 191)`` alongside
  the chunk-aligned ``(32, 64, 96, 192)`` window so the dask reader
  has to split chunks instead of starting on a chunk boundary.
- Extend the corpus with a uint8 MinIsWhite stripped fixture so this
  release-gate row covers the same representative layout as
  ``test_backend_pixel_parity_matrix_1813.py``. Gated on tifffile
  availability via the same skip marker pattern. Because the
  MinIsWhite fixture is written without GeoTIFF tags, the
  transform-shift assertion handles the no-georef case by requiring
  that neither read carries a ``transform``; an invented transform on
  the windowed side would be the silent-wrongness this gate exists to
  catch.
- Document the ``_ORIGIN_X = 500123.5`` choice with a comment on why
  the fractional value matters for the exact-tuple equality.

Test count goes from 24 (3 files x 2 readers x 4 assertions) to 64
(4 files x 2 readers x 2 windows x 4 assertions); local run is still
under one second.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up): Add release-gate windowed-read coords + shifted-transform parity test (#2358)

Second pass after the round-1 follow-up commit (2d53ea29).

Blockers

None.

Suggestions

None.

Nits (optional)

  • The module header docstring (lines 8-12) still names three corpus cells (integer + nodata, float + NaN nodata, float no nodata). The follow-up commit added a fourth (uint8 MinIsWhite stripped). Update the docstring so the prose matches _CORPUS. One-line fix.

What looks good

  • All four findings from the first review are addressed.
  • NaN compare is now backend-agnostic via try: np.isnan(x); a future numpy-scalar nodata cannot silently slip into the == branch.
  • The chunk-misaligned window (33, 65, 95, 191) exercises the dask reader's chunk-split path; the aligned window keeps the boundary-aligned case covered.
  • MinIsWhite cell added with the same tifffile-gated marker pattern used by test_backend_pixel_parity_matrix_1813.py, and _assert_transform_shifted now handles the legitimate no-georef case (no transform on either side) without papering over a real attr-drop.
  • The _ORIGIN_X = 500123.5 fractional value is now self-documenting.

Checklist

  • Algorithm matches the algebraic spec (no rasterio oracle)
  • Eager and dask paths covered (numpy + dask+numpy)
  • NaN handling is robust (backend-agnostic via np.isnan)
  • Edge cases covered (int sentinel, float NaN, no sentinel, MinIsWhite no-georef)
  • Dask chunk boundaries handled (aligned + misaligned cases)
  • No premature materialization
  • Benchmark not needed (test-only)
  • README feature matrix update not needed (no new public function)
  • Docstrings present (header prose has a minor stale-count nit; see above)

@brendancol
Copy link
Copy Markdown
Contributor Author

CI status note

The Python 3.14 pytest jobs are red on this PR, but the four failing tests are pre-existing failures on main:

  • xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected

All four are regex-match failures against RotatedTransformError / VRTUnsupportedError messages whose wording was tightened on main. The same four tests fail identically on main's 3.14 CI (run 26363918141, ubuntu-latest 3.14), and none of them touch any file this PR modifies (git diff origin/main HEAD on those paths is empty).

The Python 3.12 jobs and the new test file added by this PR pass on every backend matrix cell (eager + dask x 4 corpus files x 2 windows x 4 assertions = 64 cases; MinIsWhite cells skip cleanly when tifffile is not installed).

Flagging the red checks as out of scope for this PR -- the regex tightening lives in sibling tests and should land on its own.

@brendancol brendancol merged commit 896fba4 into main May 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Release-gate: windowed-read coords + shifted transform parity (PR 2 of 5 of epic #2341)

1 participant