Reject signed-integer MinIsWhite TIFFs on read and write (#2278)#2281
Merged
Conversation
The reader (``_decode._apply_photometric_miniswhite``) and writer (``_encode._apply_photometric_miniswhite_invert``) used to silently pass signed-int pixels through unchanged when Photometric=0. The result round-tripped inside xrspatial but produced files whose pixel values disagreed with the on-disk Photometric tag against every other TIFF consumer (GDAL, libtiff, ImageMagick). Both paths now raise ``NotImplementedError`` with a message that lists SampleFormat / BitsPerSample / Photometric so callers can diagnose the rejection without digging into the spec. The pre-write rejection fires before any bytes hit disk, so a failed write leaves no partial file behind. Flipped the existing ``test_miniswhite_int_passthrough`` to assert the new error and added a dedicated test module that covers all four signed widths, the read path (via a forged on-disk file), and non-regression for the unsigned/float MinIsWhite round-trip paths.
brendancol
commented
May 22, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Reject signed-integer MinIsWhite TIFFs on read and write (#2278)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- Pure-GPU decode path still silently passes signed-int MinIsWhite through (
xrspatial/geotiff/_backends/gpu.py:830-847). That block runs whenifd.photometric == 0 and samples == 1 and not arr_was_cpu_decoded, and only handlesgpu_dtype.kind in ('u', 'f'). Signed-int dtypes hit neither branch, so the buffer comes back un-inverted with Photometric=0 still in the IFD attrs -- the same silent-passthrough this PR fixes on the CPU side. Hard to hit in practice (no signed-int MinIsWhite fixtures in the golden corpus, and the GDS direct-decode path bails onphotometric == 0at line 1058-1059), but it's a backend-parity gap: a CPU write followed byopen_geotiff(..., gpu=True)would diverge from the CPU read after this PR lands. Either raise the sameNotImplementedErrorhere, or route signed-int MinIsWhite reads through the CPU fallback.
Nits (optional improvements)
-
xrspatial/geotiff/tests/test_signed_miniswhite_rejected_2278.py:96:f'i{expected_bps}_msw_2278_{tmp_path.name}.tif'interpolatestmp_path.name, buttmp_pathis already per-test and per-invocation, so the extra suffix is redundant.f'i{expected_bps}_msw_2278.tif'works. -
_patch_tag_valuein the new test file duplicates the IFD walk inxrspatial/geotiff/tests/_tiff_surgery.py:patch_byte_counts. One caller right now so the duplication is cheap, but if a third surgery test ever shows up the helper is worth promoting.
What looks good
- Error message carries SampleFormat, BitsPerSample, Photometric, and dtype on both paths.
- Pre-write rejection runs before any bytes hit disk; there's a dedicated test pinning that no partial file is left behind.
- Read coverage spans all three CPU read paths (local file via
_reader._read_to_array, HTTP COG eager via_cog_http, HTTP COG dask via_backends/dask.py). They all funnel through the same_apply_photometric_miniswhitehelper, so changing one place covers all three. - Non-regression coverage for the unsigned (uint8) and float (float32) round-trips is in the new test module, not just relying on the #1836 file.
- Public
to_geotiffdocstring updated with the limitation and the workaround. - The forged-file read test asserts
SampleFormat=2post-write before flipping Photometric, so if the writer's SampleFormat default ever shifts the test fails loud instead of forging a file that no longer reproduces the bug.
Checklist
- Algorithm matches the issue's chosen direction (Option A: reject)
- All CPU read/write backends produce consistent behaviour (raise)
- Backend parity gap on the pure-GPU decode path (see Suggestions)
- NaN handling unchanged; only a new dtype-branch rejection
- Edge cases covered (int8/16/32/64 widths, type bounds via
info.min/info.max) - No premature materialization or unnecessary copies
- Benchmark not needed (pure error path)
- README feature matrix update not needed (no public function added)
- Docstrings present and accurate on both helpers and the
to_geotiffkwarg
Follow-up to the initial PR review: - Pure-GPU decode path at ``_backends/gpu.py:830`` was only handling ``'u'`` and ``'f'`` dtypes in the MinIsWhite block, so signed integers would silently pass through un-inverted on the GPU side even though the CPU side now raises. Added the matching ``NotImplementedError`` branch so a CPU write and a ``open_geotiff(..., gpu=True)`` round-trip stay consistent. - Added a GPU-gated test that forges a signed-int MinIsWhite TIFF and asserts ``open_geotiff(path, gpu=True)`` raises with the same message shape as the CPU path. Skips cleanly when cupy / CUDA are not available. - Dropped the redundant ``tmp_path.name`` suffix from the parameterised write-test filename; pytest's ``tmp_path`` is already unique per test invocation.
brendancol
commented
May 22, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review (follow-up): Reject signed-integer MinIsWhite TIFFs on read and write (#2278)
Re-reviewed after the follow-up commit 0df24a90.
Blockers
None.
Suggestions
The previous round's GPU parity gap is closed at xrspatial/geotiff/_backends/gpu.py:848-866 with a matching NotImplementedError carrying the same SampleFormat / BitsPerSample / Photometric values. The new test_read_signed_miniswhite_rejected_on_gpu_path_2278 exercises that path against a forged signed-int MinIsWhite TIFF and skips cleanly without cupy / CUDA. Nothing else to add.
Nits
The previous round's helper-duplication nit (_patch_tag_value vs _tiff_surgery.py:patch_byte_counts) was deferred on purpose because there's still only one caller. Worth promoting if a third surgery test ever shows up.
What looks good
- GPU rejection message has the same shape as the CPU one, so callers see a uniform diagnostic regardless of which backend
open_geotiff(...)dispatched to. - The GPU test reuses
_forge_signed_miniswhite_tif, so the on-disk file layout is identical between the CPU and GPU read tests. - Skip handling on the GPU test follows the established pattern in
test_golden_corpus_gpu_1930.py:pytest.importorskip("cupy")plus agetDeviceCount()guard.
Checklist
- All read paths (eager local, eager HTTP COG, dask HTTP COG, GPU pure-decode) raise consistently
- All write paths (eager) raise consistently; dask streaming continues to reject
miniswhitevia the existing #1836 guard - Tests cover all dtype widths (int8/16/32/64) and all rejection sites (write, CPU read, GPU read)
- No regressions in unsigned or float MinIsWhite round-trips
- Full
xrspatial/geotiff/tests/run: 5080 passed, 68 skipped
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #2278.
The reader and writer used to silently let signed-integer pixels pass through unchanged when
PhotometricInterpretation = MinIsWhite. That round-tripped fine inside xrspatial but produced files whose pixel values disagreed with the on-disk Photometric tag against every other TIFF consumer (GDAL, libtiff, ImageMagick) -- basically a guaranteed inverted display anywhere else.Both paths now raise
NotImplementedErrorwith the SampleFormat, BitsPerSample, and Photometric values in the message so callers can diagnose without digging into the spec.I went with Option A (reject) rather than Option B (implement signed inversion). The rationale is in the issue: failing closed is what the rest of this codebase already does for ambiguous TIFF metadata, signed MinIsWhite is rare in real GIS data anyway, and silent passthrough was the worst-of-both-worlds outcome -- looked fine locally, broken everywhere else.
Test plan
xrspatial/geotiff/tests/test_signed_miniswhite_rejected_2278.py: parameterised write rejection acrossint8/int16/int32/int64, a no-partial-file check, a read-path test that forges a signed MinIsWhite file by writing a normal int16 MinIsBlack TIFF and flipping the Photometric tag to 0 in place, and non-regression coverage that the unsigned and float MinIsWhite round-trips still work.test_miniswhite_int_passthroughintest_miniswhite_writer_roundtrip_1836.pyto assert the new error (renamedtest_miniswhite_int_rejected_at_write_2278).xrspatial/geotiff/tests/run locally: 5079 passed, 68 skipped.Backend coverage
Bug fix in the shared CPU read/write photometric helpers. The reader change covers both the local-file and HTTP COG paths (both go through
_reader._apply_photometric_miniswhite). The writer change covers the eager write path. The dask streaming writer already rejectsminiswhiteoutright for an unrelated reason (#1836), so signed-int + dask is covered transitively.