Add stable-codec read/write/read round-trip release gate (#2360)#2365
Conversation
…rib#2360) PR 4 of 5 of epic xarray-contrib#2341. Adds xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py covering the cartesian product of stable codecs (none, deflate, lzw, zstd, packbits) and dtypes (int16, int32, float32, float64). Each case runs a full write / read / write / read cycle and asserts byte-exact pixels (NaN-aware for float) plus the canonical release attrs (transform, crs, crs_wkt, nodata, masked_nodata, georef_status, raster_type). Also checks the on-disk TIFF Compression tag matches the codec. Adds a row to docs/source/reference/release_gate_geotiff.rst under "Local GeoTIFF read and write" citing the new file. A drift guard fails the build if STABLE_CODECS gets out of sync with the stable codec entries in SUPPORTED_FEATURES.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Add stable-codec read/write/read round-trip release gate (#2360)
This is a test-only PR. No production code changed. The review focuses on test correctness, attr coverage, and the docs row.
Blockers (must fix before merge)
(none)
Suggestions (should fix, not blocking)
-
test_release_gate_codec_round_trip_2341.py:268-270-- Pass 2 writes the baseline DataArray withnodata=baseline.attrs.get("nodata"), but the float-dtype reader stampsattrs["nodata"]as float NaN even when the dtype is float32/float64. That happens to round-trip fine with the writer, but if a future change makes the writer rejectnodata=NaN(since NaN is already implicit for float), this test starts failing for reasons unrelated to the codec. Consider only passing thenodata=kwarg explicitly for the integer branch. -
test_release_gate_codec_round_trip_2341.py:113-117-- the int sentinel sits atarr[0, 0]and the regular ramp covers the rest of the array. With a 128*128 array, the ramp climbs to 16383, well insideint16range, so no overflow. Fine for int16/int32, but if the dtype set grows to includeint8later, the ramp would wrap. Worth a comment that the ramp magnitude is dtype-aware (currently it relies on the smallest tested dtype having range >= 16384).
Nits (optional improvements)
-
test_release_gate_codec_round_trip_2341.py:194-196-- pytest parametrize id order is[dtype-codec](e.g.[int16-none]) because the outer decorator parametrizescodecfirst. The docs row and test docstring describe the order ascodec * dtype. Either reorder the decorators or note the id order explicitly in the docstring. -
test_release_gate_codec_round_trip_2341.py:283-285--_assert_pixels_equalreports a boolean mismatch. On failure it would be helpful to surface the first divergent pixel index so someone debugging a codec regression can locate the bad tile quickly. Optional. -
docs/source/reference/release_gate_geotiff.rst:105-- the new row says "On every stable codec ... crossed with every promised dtype ... a write / read / write / read cycle". Pretty long for a table cell; consider trimming the dtype list and pointing readers to the test file for the full matrix.
What looks good
- The drift guard against
SUPPORTED_FEATURESis the right shape: if a codec gets promoted into or out ofstableand this file is not updated, the build breaks loudly with a clear message. - The unique tmp filenames (uuid suffix + dtype + codec) sidestep collisions across parallel pytest workers and parallel worktrees.
- The on-disk TIFF Compression tag check (via
parse_ifd) is a good answer to the "or whatever the canonical attr key is" wording in the issue body. Noattrs['compression']exists; pinning the TIFF tag pins the actual codec choice. - The two-pass write / read / write / read shape genuinely exercises the "round-trip" promise. Single-pass tests already exist in
test_release_gate_codecs.py. - The
mask_nodata=Falsebranch for integer dtypes correctly keeps the sentinel as a real pixel so the byte-exact comparison is meaningful (otherwise the reader would promote int to float64 with NaN, defeating the dtype assertion). - Out-of-scope sections of the module docstring are explicit about what this gate does not cover.
Checklist
- Algorithm matches reference -- N/A (test-only)
- All implemented backends produce consistent results -- N/A (the test exercises the default eager numpy backend; backend parity is already covered by
test_backend_parity_matrix.py) - NaN handling is correct (
equal_nan=Truefor float, strict equality for int) - Edge cases covered (NaN sentinel at (0, 0); int sentinel at (0, 0); cartesian product across 5 codecs * 4 dtypes)
- No premature materialization or copies
- Benchmark not applicable (release-gate test)
- README feature matrix not applicable (no new public function)
- Docstrings present and accurate
- Only pass `nodata=` to `to_geotiff` on the integer branch; let the writer infer the float NaN sentinel from the array itself so the test does not lock the writer into accepting `nodata=NaN` if that ever becomes a no-op or a rejected redundancy. - Add a comment to `_make_input` explaining the int-ramp magnitude assumption (128*128 ramp fits in int16; smaller-range dtypes would wrap and collide with the sentinel). - Reorder parametrize decorators so test ids read `[codec-dtype]`, matching the docstring and the docs row. - On pixel-mismatch failure, report the first divergent index plus the actual / expected scalar there so a codec regression points at a tile straight away. - Trim the docs-row description in `release_gate_geotiff.rst`; the cited test file carries the full codec / dtype / attr matrix.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up PR Review: Add stable-codec read/write/read round-trip release gate (#2360)
After the c8667d4 commit addressing the first review.
Disposition of the first review
- Suggestion 1 (drop
nodata=kwarg on the float branch in both passes): fixed in c8667d4. - Suggestion 2 (comment the int-ramp magnitude assumption): fixed in c8667d4.
- Nit 1 (parametrize id order): fixed in c8667d4. Ids now read
[codec-dtype]. - Nit 2 (first divergent pixel index in
_assert_pixels_equal): fixed in c8667d4. - Nit 3 (trim the docs row): fixed in c8667d4.
Blockers
(none)
Suggestions
(none)
Nits
-
test_release_gate_codec_round_trip_2341.py:261-263and:270--np.issubdtype(np.dtype(dtype_name), np.floating)is computed twice (once formask_kwargs, once intois_float). Hoistis_floatto the top of the test body and reuse it formask_kwargs. Tiny win.
What looks good
- Removing the
nodata=NaNkwarg from the float path is the right call. The writer infers NaN from the array itself, so passing it again was redundant. - The first-divergent-index branch in
_assert_pixels_equalis what you want on a CI failure -- "byte 14732 of tile 3" beats a generic boolean. - Docs row trimmed to a one-line description plus a single test path. Easier to scan in the table.
Checklist
- Algorithm matches reference -- N/A (test-only)
- All implemented backends produce consistent results -- N/A
- NaN handling is correct
- Edge cases covered
- No premature materialization or copies
- Benchmark not applicable
- README feature matrix not applicable
- Docstrings present and accurate
Address follow-up review nit: compute is_float once and reuse it for both mask_kwargs and the per-pass nodata kwarg branches.
CI status: 4 pre-existing failures, unrelated to this PRThe
These look like a message-format / exception-type drift between the validators and the regex assertions in the tests. The PR's own test file ( Stopping CI iteration here per |
Closes #2360. PR 4 of 5 of epic #2341.
Summary
xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py. It walks the cartesian product of stable codecs (none,deflate,lzw,zstd,packbits) and dtypes (int16,int32,float32,float64).transform,crs,crs_wkt,nodata,masked_nodata,georef_status,raster_type). It also checks the on-disk TIFF Compression tag matches the codec requested.docs/source/reference/release_gate_geotiff.rstunder "Local GeoTIFF read and write" pointing at the new file.stablecodecs inSUPPORTED_FEATURES.Test plan
pytest xrspatial/geotiff/tests/test_release_gate_codec_round_trip_2341.py(21 passed)