Release-gate: overview/sidecar metadata survival (PR 3 of 5 of epic #2341)#2363
Conversation
Epic #2341 flagged "overview reads lose CRS/transform/nodata metadata" as a release-blocker risk. The existing overview tests cover pixel correctness and specific nodata behaviour but do not pin the full metadata contract across overview levels, and they do not cover parity between internal COG overviews and external .ovr sidecars. The new test file constructs both fixture shapes in-test and asserts, through eager and dask reads, that crs, crs_wkt, georef_status, raster_type, nodata, and masked_nodata agree across base + every overview level, and that transform scales pixel size by the level factor while keeping the origin fixed. A row in docs/source/reference/release_gate_geotiff.rst under "Sidecar and overview interactions" cites the new file. Closes #2359
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Release-gate: overview/sidecar metadata survival (PR 3 of 5 of epic #2341)
Test-only PR. 12 parametrized tests, in-test fixture construction, eager + dask read paths, two source shapes (internal COG, external .ovr sidecar) plus a cross-source parity test. All 12 tests pass locally.
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:110-122--_write_internal_overview_cogdoes not assert the overview IFDs were actually written. A regression whereto_geotiff(overview_levels=[2, 4], cog=True)silently writes a no-overview COG would only surface through the downstream shape-mismatch test. A one-linerasterio.open(path).overviews(1) == [2, 4]post-write check would localise the failure to the writer instead of the reader. -
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:61--_OVERVIEW_FACTORS = [2, 4]is a module-level mutable list. None of the tests mutate it, but using a tuple(2, 4)removes the foot-gun for future edits and matches the immutable shape of the other constants (_BASE_TRANSFORMis already a tuple).
Nits (optional improvements)
-
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:45--dask_array = pytest.importorskip("dask.array")binds a name that is never used. Theimportorskipgate works whether the result is bound or not; callingpytest.importorskip("dask.array")on its own removes the unused-name lint signal. -
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py:148--from rasterio.enums import Resamplinglives inside_write_external_sidecarand runs on every sidecar build. Top-of-file would be marginally faster and consistent with therasterio = pytest.importorskip(...)line at module scope. Cost is one extra import at module-load.
What looks good
- In-test fixture construction for both COG and sidecar shapes, matching the issue spec's "build one in-test if not already present" guidance.
- Inlined assertion helpers in this file only; no shared cross-PR helper module per the epic constraint.
- Unique tmp paths use
uuid.uuid4().hexplus the issue number and a test-specific label. - Both eager and dask read paths are parametrized; the cross-source parity test catches drift between the two pyramid paths.
- The contract is articulated explicitly in
_EQUAL_KEYS; the docs row inrelease_gate_geotiff.rstmatches the assertion set.
Checklist
- Algorithm matches reference/paper (N/A -- test-only)
- All implemented backends produce consistent results (eager and dask read paths covered)
- NaN handling is correct (the constructed fixture writes one NaN cell, exercising the masked_nodata lifecycle)
- Edge cases are covered by tests (presence/absence symmetry on
raster_type; absent-on-base behaviour handled) - Dask chunk boundaries handled correctly (
chunks=8straddles overview levels) - No premature materialization or unnecessary copies
- Benchmark exists or is not needed (release-gate test; no perf-sensitive surface)
- README feature matrix updated (if applicable) -- N/A
- Docstrings present and accurate (module docstring + per-helper docstrings)
- Add post-write sanity that to_geotiff actually emitted the requested overview IFDs so a writer regression localises near the writer call rather than as a downstream shape mismatch. - Switch _OVERVIEW_FACTORS from a list to a tuple to match the other module-level constants and remove a foot-gun for future edits. - Drop the unused dask_array binding from pytest.importorskip; the gate still fires whether the result is bound or not. - Hoist `from rasterio.enums import Resampling` to module scope so it is not re-imported on every sidecar build.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (post fixes for #2363)
All four findings from the previous review are addressed in 0b1afab:
- Suggestion 1 (writer sanity check): fixed.
_write_internal_overview_cognow reopens the file withrasterio.openand assertsds.overviews(1) == [2, 4]post-write. - Suggestion 2 (
_OVERVIEW_FACTORSto tuple): fixed. The list-call sites that need a list (to_geotiffandds.build_overviews) wrap withlist(_OVERVIEW_FACTORS)at the call site. - Nit 1 (unused
dask_arraybinding): fixed. The line is nowpytest.importorskip("dask.array")with no LHS. - Nit 2 (in-function
Resamplingimport): fixed. Hoisted to module scope under the# noqa: E402next to the other post-importorskip imports.
12 tests still pass locally:
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py ........ 12 passed in 0.28s
No new findings on this pass.
CI statusThe fast-lane workflow on
All four pre-exist on The other workflows on this PR pass:
The 12 tests added by this PR pass locally (eager + dask, COG + sidecar, cross-source parity). Treating the VRT failures as out of scope for this PR -- they are a separate |
Closes #2359.
Summary
xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.pywith 12 tests covering the metadata-survival contract on overview reads.to_geotiff(cog=True, overview_levels=[2, 4], nodata=...)and a tiled base + external.ovrsidecar viarasteriowithTIFF_USE_OVR=YES.open_geotiff) and dask (read_geotiff_dask) read paths. Per-levelattrsmust agree oncrs,crs_wkt,georef_status,raster_type,nodata,masked_nodata.transformmust scale pixel size by the level factor with the origin held.docs/source/reference/release_gate_geotiff.rstunder "Sidecar and overview interactions" points at the new file.Backend coverage
Read paths: numpy eager, dask+numpy. CPU-only because the metadata contract lives on the
attrsdict assembled by the reader, which is dtype-and-device agnostic; GPU pixel reads inherit the same attrs through the existing GPU sidecar tests (test_gpu_sidecar_georef_parity_2324.py).Test plan
pytest xrspatial/geotiff/tests/test_release_gate_overview_sidecar_metadata_2341.py -v-- 12 passed locally