Skip to content

Reject signed-integer MinIsWhite TIFFs on read and write (#2278)#2281

Merged
brendancol merged 2 commits into
mainfrom
issue-2278
May 22, 2026
Merged

Reject signed-integer MinIsWhite TIFFs on read and write (#2278)#2281
brendancol merged 2 commits into
mainfrom
issue-2278

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #2278.

The reader and writer used to silently let signed-integer pixels pass through unchanged when PhotometricInterpretation = MinIsWhite. That round-tripped fine inside xrspatial but produced files whose pixel values disagreed with the on-disk Photometric tag against every other TIFF consumer (GDAL, libtiff, ImageMagick) -- basically a guaranteed inverted display anywhere else.

Both paths now raise NotImplementedError with the SampleFormat, BitsPerSample, and Photometric values in the message so callers can diagnose without digging into the spec.

I went with Option A (reject) rather than Option B (implement signed inversion). The rationale is in the issue: failing closed is what the rest of this codebase already does for ambiguous TIFF metadata, signed MinIsWhite is rare in real GIS data anyway, and silent passthrough was the worst-of-both-worlds outcome -- looked fine locally, broken everywhere else.

Test plan

  • New xrspatial/geotiff/tests/test_signed_miniswhite_rejected_2278.py: parameterised write rejection across int8/int16/int32/int64, a no-partial-file check, a read-path test that forges a signed MinIsWhite file by writing a normal int16 MinIsBlack TIFF and flipping the Photometric tag to 0 in place, and non-regression coverage that the unsigned and float MinIsWhite round-trips still work.
  • Flipped the existing test_miniswhite_int_passthrough in test_miniswhite_writer_roundtrip_1836.py to assert the new error (renamed test_miniswhite_int_rejected_at_write_2278).
  • Full xrspatial/geotiff/tests/ run locally: 5079 passed, 68 skipped.

Backend coverage

Bug fix in the shared CPU read/write photometric helpers. The reader change covers both the local-file and HTTP COG paths (both go through _reader._apply_photometric_miniswhite). The writer change covers the eager write path. The dask streaming writer already rejects miniswhite outright for an unrelated reason (#1836), so signed-int + dask is covered transitively.

The reader (``_decode._apply_photometric_miniswhite``) and writer
(``_encode._apply_photometric_miniswhite_invert``) used to silently
pass signed-int pixels through unchanged when Photometric=0. The
result round-tripped inside xrspatial but produced files whose pixel
values disagreed with the on-disk Photometric tag against every other
TIFF consumer (GDAL, libtiff, ImageMagick).

Both paths now raise ``NotImplementedError`` with a message that lists
SampleFormat / BitsPerSample / Photometric so callers can diagnose
the rejection without digging into the spec. The pre-write rejection
fires before any bytes hit disk, so a failed write leaves no partial
file behind.

Flipped the existing ``test_miniswhite_int_passthrough`` to assert
the new error and added a dedicated test module that covers all four
signed widths, the read path (via a forged on-disk file), and
non-regression for the unsigned/float MinIsWhite round-trip paths.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 22, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject signed-integer MinIsWhite TIFFs on read and write (#2278)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • Pure-GPU decode path still silently passes signed-int MinIsWhite through (xrspatial/geotiff/_backends/gpu.py:830-847). That block runs when ifd.photometric == 0 and samples == 1 and not arr_was_cpu_decoded, and only handles gpu_dtype.kind in ('u', 'f'). Signed-int dtypes hit neither branch, so the buffer comes back un-inverted with Photometric=0 still in the IFD attrs -- the same silent-passthrough this PR fixes on the CPU side. Hard to hit in practice (no signed-int MinIsWhite fixtures in the golden corpus, and the GDS direct-decode path bails on photometric == 0 at line 1058-1059), but it's a backend-parity gap: a CPU write followed by open_geotiff(..., gpu=True) would diverge from the CPU read after this PR lands. Either raise the same NotImplementedError here, or route signed-int MinIsWhite reads through the CPU fallback.

Nits (optional improvements)

  • xrspatial/geotiff/tests/test_signed_miniswhite_rejected_2278.py:96: f'i{expected_bps}_msw_2278_{tmp_path.name}.tif' interpolates tmp_path.name, but tmp_path is already per-test and per-invocation, so the extra suffix is redundant. f'i{expected_bps}_msw_2278.tif' works.
  • _patch_tag_value in the new test file duplicates the IFD walk in xrspatial/geotiff/tests/_tiff_surgery.py:patch_byte_counts. One caller right now so the duplication is cheap, but if a third surgery test ever shows up the helper is worth promoting.

What looks good

  • Error message carries SampleFormat, BitsPerSample, Photometric, and dtype on both paths.
  • Pre-write rejection runs before any bytes hit disk; there's a dedicated test pinning that no partial file is left behind.
  • Read coverage spans all three CPU read paths (local file via _reader._read_to_array, HTTP COG eager via _cog_http, HTTP COG dask via _backends/dask.py). They all funnel through the same _apply_photometric_miniswhite helper, so changing one place covers all three.
  • Non-regression coverage for the unsigned (uint8) and float (float32) round-trips is in the new test module, not just relying on the #1836 file.
  • Public to_geotiff docstring updated with the limitation and the workaround.
  • The forged-file read test asserts SampleFormat=2 post-write before flipping Photometric, so if the writer's SampleFormat default ever shifts the test fails loud instead of forging a file that no longer reproduces the bug.

Checklist

  • Algorithm matches the issue's chosen direction (Option A: reject)
  • All CPU read/write backends produce consistent behaviour (raise)
  • Backend parity gap on the pure-GPU decode path (see Suggestions)
  • NaN handling unchanged; only a new dtype-branch rejection
  • Edge cases covered (int8/16/32/64 widths, type bounds via info.min/info.max)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (pure error path)
  • README feature matrix update not needed (no public function added)
  • Docstrings present and accurate on both helpers and the to_geotiff kwarg

Follow-up to the initial PR review:

- Pure-GPU decode path at ``_backends/gpu.py:830`` was only handling
  ``'u'`` and ``'f'`` dtypes in the MinIsWhite block, so signed
  integers would silently pass through un-inverted on the GPU side
  even though the CPU side now raises. Added the matching
  ``NotImplementedError`` branch so a CPU write and a
  ``open_geotiff(..., gpu=True)`` round-trip stay consistent.
- Added a GPU-gated test that forges a signed-int MinIsWhite TIFF
  and asserts ``open_geotiff(path, gpu=True)`` raises with the same
  message shape as the CPU path. Skips cleanly when cupy / CUDA
  are not available.
- Dropped the redundant ``tmp_path.name`` suffix from the
  parameterised write-test filename; pytest's ``tmp_path`` is
  already unique per test invocation.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up): Reject signed-integer MinIsWhite TIFFs on read and write (#2278)

Re-reviewed after the follow-up commit 0df24a90.

Blockers

None.

Suggestions

The previous round's GPU parity gap is closed at xrspatial/geotiff/_backends/gpu.py:848-866 with a matching NotImplementedError carrying the same SampleFormat / BitsPerSample / Photometric values. The new test_read_signed_miniswhite_rejected_on_gpu_path_2278 exercises that path against a forged signed-int MinIsWhite TIFF and skips cleanly without cupy / CUDA. Nothing else to add.

Nits

The previous round's helper-duplication nit (_patch_tag_value vs _tiff_surgery.py:patch_byte_counts) was deferred on purpose because there's still only one caller. Worth promoting if a third surgery test ever shows up.

What looks good

  • GPU rejection message has the same shape as the CPU one, so callers see a uniform diagnostic regardless of which backend open_geotiff(...) dispatched to.
  • The GPU test reuses _forge_signed_miniswhite_tif, so the on-disk file layout is identical between the CPU and GPU read tests.
  • Skip handling on the GPU test follows the established pattern in test_golden_corpus_gpu_1930.py: pytest.importorskip("cupy") plus a getDeviceCount() guard.

Checklist

  • All read paths (eager local, eager HTTP COG, dask HTTP COG, GPU pure-decode) raise consistently
  • All write paths (eager) raise consistently; dask streaming continues to reject miniswhite via the existing #1836 guard
  • Tests cover all dtype widths (int8/16/32/64) and all rejection sites (write, CPU read, GPU read)
  • No regressions in unsigned or float MinIsWhite round-trips
  • Full xrspatial/geotiff/tests/ run: 5080 passed, 68 skipped

@brendancol brendancol merged commit c1cd5b0 into main May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Signed-integer MinIsWhite TIFFs round-trip with semantically wrong photometric meaning

1 participant