Skip to content

Add stable-codec read/write/read round-trip release gate (#2360)#2365

Merged
brendancol merged 4 commits into
xarray-contrib:mainfrom
brendancol:issue-2360
May 24, 2026
Merged

Add stable-codec read/write/read round-trip release gate (#2360)#2365
brendancol merged 4 commits into
xarray-contrib:mainfrom
brendancol:issue-2360

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2360. PR 4 of 5 of epic #2341.

Summary

  • Adds xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py. It walks the cartesian product of stable codecs (none, deflate, lzw, zstd, packbits) and dtypes (int16, int32, float32, float64).
  • Each parametrized case runs a write / read / write / read cycle and asserts byte-exact pixels (NaN-aware for float) plus every canonical release attr (transform, crs, crs_wkt, nodata, masked_nodata, georef_status, raster_type). It also checks the on-disk TIFF Compression tag matches the codec requested.
  • Adds a row to docs/source/reference/release_gate_geotiff.rst under "Local GeoTIFF read and write" pointing at the new file.
  • A drift guard fails if the codec list in the test gets out of sync with stable codecs in SUPPORTED_FEATURES.

Test plan

  • pytest xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py (21 passed)

…rib#2360)

PR 4 of 5 of epic xarray-contrib#2341. Adds
xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py
covering the cartesian product of stable codecs (none, deflate, lzw,
zstd, packbits) and dtypes (int16, int32, float32, float64). Each
case runs a full write / read / write / read cycle and asserts
byte-exact pixels (NaN-aware for float) plus the canonical release
attrs (transform, crs, crs_wkt, nodata, masked_nodata,
georef_status, raster_type). Also checks the on-disk TIFF
Compression tag matches the codec.

Adds a row to docs/source/reference/release_gate_geotiff.rst under
"Local GeoTIFF read and write" citing the new file.

A drift guard fails the build if STABLE_CODECS gets out of sync with
the stable codec entries in SUPPORTED_FEATURES.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 24, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add stable-codec read/write/read round-trip release gate (#2360)

This is a test-only PR. No production code changed. The review focuses on test correctness, attr coverage, and the docs row.

Blockers (must fix before merge)

(none)

Suggestions (should fix, not blocking)

  • test_release_gate_codec_round_trip_2341.py:268-270 -- Pass 2 writes the baseline DataArray with nodata=baseline.attrs.get("nodata"), but the float-dtype reader stamps attrs["nodata"] as float NaN even when the dtype is float32/float64. That happens to round-trip fine with the writer, but if a future change makes the writer reject nodata=NaN (since NaN is already implicit for float), this test starts failing for reasons unrelated to the codec. Consider only passing the nodata= kwarg explicitly for the integer branch.
  • test_release_gate_codec_round_trip_2341.py:113-117 -- the int sentinel sits at arr[0, 0] and the regular ramp covers the rest of the array. With a 128*128 array, the ramp climbs to 16383, well inside int16 range, so no overflow. Fine for int16/int32, but if the dtype set grows to include int8 later, the ramp would wrap. Worth a comment that the ramp magnitude is dtype-aware (currently it relies on the smallest tested dtype having range >= 16384).

Nits (optional improvements)

  • test_release_gate_codec_round_trip_2341.py:194-196 -- pytest parametrize id order is [dtype-codec] (e.g. [int16-none]) because the outer decorator parametrizes codec first. The docs row and test docstring describe the order as codec * dtype. Either reorder the decorators or note the id order explicitly in the docstring.
  • test_release_gate_codec_round_trip_2341.py:283-285 -- _assert_pixels_equal reports a boolean mismatch. On failure it would be helpful to surface the first divergent pixel index so someone debugging a codec regression can locate the bad tile quickly. Optional.
  • docs/source/reference/release_gate_geotiff.rst:105 -- the new row says "On every stable codec ... crossed with every promised dtype ... a write / read / write / read cycle". Pretty long for a table cell; consider trimming the dtype list and pointing readers to the test file for the full matrix.

What looks good

  • The drift guard against SUPPORTED_FEATURES is the right shape: if a codec gets promoted into or out of stable and this file is not updated, the build breaks loudly with a clear message.
  • The unique tmp filenames (uuid suffix + dtype + codec) sidestep collisions across parallel pytest workers and parallel worktrees.
  • The on-disk TIFF Compression tag check (via parse_ifd) is a good answer to the "or whatever the canonical attr key is" wording in the issue body. No attrs['compression'] exists; pinning the TIFF tag pins the actual codec choice.
  • The two-pass write / read / write / read shape genuinely exercises the "round-trip" promise. Single-pass tests already exist in test_release_gate_codecs.py.
  • The mask_nodata=False branch for integer dtypes correctly keeps the sentinel as a real pixel so the byte-exact comparison is meaningful (otherwise the reader would promote int to float64 with NaN, defeating the dtype assertion).
  • Out-of-scope sections of the module docstring are explicit about what this gate does not cover.

Checklist

  • Algorithm matches reference -- N/A (test-only)
  • All implemented backends produce consistent results -- N/A (the test exercises the default eager numpy backend; backend parity is already covered by test_backend_parity_matrix.py)
  • NaN handling is correct (equal_nan=True for float, strict equality for int)
  • Edge cases covered (NaN sentinel at (0, 0); int sentinel at (0, 0); cartesian product across 5 codecs * 4 dtypes)
  • No premature materialization or copies
  • Benchmark not applicable (release-gate test)
  • README feature matrix not applicable (no new public function)
  • Docstrings present and accurate

- Only pass `nodata=` to `to_geotiff` on the integer branch; let the
  writer infer the float NaN sentinel from the array itself so the
  test does not lock the writer into accepting `nodata=NaN` if that
  ever becomes a no-op or a rejected redundancy.
- Add a comment to `_make_input` explaining the int-ramp magnitude
  assumption (128*128 ramp fits in int16; smaller-range dtypes would
  wrap and collide with the sentinel).
- Reorder parametrize decorators so test ids read `[codec-dtype]`,
  matching the docstring and the docs row.
- On pixel-mismatch failure, report the first divergent index plus
  the actual / expected scalar there so a codec regression points
  at a tile straight away.
- Trim the docs-row description in `release_gate_geotiff.rst`; the
  cited test file carries the full codec / dtype / attr matrix.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up PR Review: Add stable-codec read/write/read round-trip release gate (#2360)

After the c8667d4 commit addressing the first review.

Disposition of the first review

  • Suggestion 1 (drop nodata= kwarg on the float branch in both passes): fixed in c8667d4.
  • Suggestion 2 (comment the int-ramp magnitude assumption): fixed in c8667d4.
  • Nit 1 (parametrize id order): fixed in c8667d4. Ids now read [codec-dtype].
  • Nit 2 (first divergent pixel index in _assert_pixels_equal): fixed in c8667d4.
  • Nit 3 (trim the docs row): fixed in c8667d4.

Blockers

(none)

Suggestions

(none)

Nits

  • test_release_gate_codec_round_trip_2341.py:261-263 and :270 -- np.issubdtype(np.dtype(dtype_name), np.floating) is computed twice (once for mask_kwargs, once into is_float). Hoist is_float to the top of the test body and reuse it for mask_kwargs. Tiny win.

What looks good

  • Removing the nodata=NaN kwarg from the float path is the right call. The writer infers NaN from the array itself, so passing it again was redundant.
  • The first-divergent-index branch in _assert_pixels_equal is what you want on a CI failure -- "byte 14732 of tile 3" beats a generic boolean.
  • Docs row trimmed to a one-line description plus a single test path. Easier to scan in the table.

Checklist

  • Algorithm matches reference -- N/A (test-only)
  • All implemented backends produce consistent results -- N/A
  • NaN handling is correct
  • Edge cases covered
  • No premature materialization or copies
  • Benchmark not applicable
  • README feature matrix not applicable
  • Docstrings present and accurate

Address follow-up review nit: compute is_float once and reuse it for
both mask_kwargs and the per-pass nodata kwarg branches.
@brendancol
Copy link
Copy Markdown
Contributor Author

CI status: 4 pre-existing failures, unrelated to this PR

The pytest job (macOS / Ubuntu 3.14 full lane) reports 4 failures. All 4 are on test files this PR does not touch, and all 4 reproduce locally against the current main tip:

  • xrspatial/geotiff/tests/test_unsupported_features_2349.py::test_vrt_with_skewed_geotransform_rejected -- pytest.raises(match="rotated affine") no longer matches the writer's "non-zero rotation/shear terms" message.
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_unsupported_resample_alg_raises -- expects ValueError, the validator now raises VRTUnsupportedError.
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_srcrect_size_rejected -- regex SrcRect.*negative size no longer matches the SimpleSource validator message.
  • xrspatial/geotiff/tests/test_vrt_metadata_parity_2321.py::test_negative_dstrect_size_rejected -- regex DstRect.*negative size, same kind of message drift.

These look like a message-format / exception-type drift between the validators and the regex assertions in the tests. The PR's own test file (test_release_gate_codec_round_trip_2341.py) passes 21/21 locally, and the four red tests do not touch any code path this PR introduces.

Stopping CI iteration here per /rockout Step 12: real CI defect, out of scope for this PR, flagged for separate follow-up.

@brendancol brendancol merged commit 19fc4db into xarray-contrib:main May 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Release-gate: stable-codec read/write/read round-trip (PR 4 of 5 of epic #2341)

1 participant