Skip to content

Commit bbec25c

Browse files
committed
geotiff: GPU writer overview loop uses cupy.putmask for in-place NaN rewrite (#1948)
The COG overview loop inside `write_geotiff_gpu` used to allocate a fresh `current.copy()` before rewriting NaN cells back to the nodata sentinel: ``` current = make_overview_gpu(current, ...) ... nan_mask = cupy.isnan(current) if bool(nan_mask.any().item()): current = current.copy() current[nan_mask] = sentinel ``` `make_overview_gpu` returns a freshly allocated cupy buffer at every call site (the 2-D path ends in `cupy.nan*` / `cupy.around(...).astype(...)` / `cropped[::2, ::2].copy()`; the 3-D path ends in `cupy.stack`), so nothing aliased the buffer between the return and the in-place rewrite. The `current.copy()` allocated a second chunk-sized GPU buffer per overview level for no semantic gain. Replace the two-line rewrite with `cupy.putmask(current, nan_mask, sentinel)` so the existing buffer is mutated in place. Mirrors the in-place sentinel rewrite `_apply_nodata_mask_gpu` adopted in #1934. Tests cover: * structural -- the overview branch uses `cupy.putmask` and no longer contains `current = current.copy()`, * correctness -- a COG write of a float32 raster with an all-sentinel quadrant round-trips through the GPU writer and back to the CPU reader with NaN preserved on every overview level, * contract -- every overview method in `_block_reduce_2d_gpu` returns a cupy buffer whose `data.ptr` differs from the input, so the in-place mutation is safe.
1 parent 9ce0e60 commit bbec25c

2 files changed

Lines changed: 195 additions & 3 deletions

File tree

xrspatial/geotiff/_writers/gpu.py

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -479,6 +479,18 @@ def _gpu_compress_to_part(gpu_arr, w, h, spp):
479479
# all-sentinel cell NaN back to the sentinel after each level
480480
# so the on-disk overview tiles still carry the sentinel value
481481
# external readers expect.
482+
#
483+
# The sentinel rewrite uses ``cupy.putmask`` rather than
484+
# ``current.copy(); current[nan_mask] = sentinel`` because
485+
# ``make_overview_gpu`` always returns a freshly allocated
486+
# buffer (``_block_reduce_2d_gpu`` ends in ``cupy.nan*`` /
487+
# ``cupy.around(...).astype(...)`` / ``cropped[::2, ::2].copy()``,
488+
# and the 3-D path ends in ``cupy.stack``) so nothing else
489+
# aliases the buffer between the return and the rewrite.
490+
# Dropping the redundant ``copy()`` skips one chunk-sized GPU
491+
# allocation per overview level. Mirrors the in-place sentinel
492+
# rewrite ``_apply_nodata_mask_gpu`` adopted in #1934. See
493+
# issue #1948.
482494
current = arr
483495
cumulative_factor = 1
484496
for target_factor in overview_levels:
@@ -495,9 +507,9 @@ def _gpu_compress_to_part(gpu_arr, w, h, spp):
495507
and not np.isnan(float(nodata))):
496508
nan_mask = cupy.isnan(current)
497509
if bool(nan_mask.any().item()):
498-
current = current.copy()
499-
current[nan_mask] = np.dtype(
500-
str(current.dtype)).type(nodata)
510+
cupy.putmask(
511+
current, nan_mask,
512+
np.dtype(str(current.dtype)).type(nodata))
501513
oh, ow = current.shape[:2]
502514
parts.append(_gpu_compress_to_part(current, ow, oh, samples))
503515

Lines changed: 180 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,180 @@
1+
"""Regression tests for issue #1948.
2+
3+
The COG overview generation loop inside ``write_geotiff_gpu`` used to
4+
replace the in-place NaN-to-sentinel rewrite with
5+
``current = current.copy(); current[nan_mask] = sentinel``. Every call
6+
to ``make_overview_gpu`` returns a freshly allocated cupy buffer
7+
(``_block_reduce_2d_gpu`` ends in ``cupy.nan*`` / ``cupy.around(...).astype(...)`` /
8+
``cropped[::2, ::2].copy()``, the 3-D path ends in ``cupy.stack``), so
9+
nothing else aliases the buffer between the return and the rewrite. The
10+
``.copy()`` allocated a second chunk-sized GPU buffer per overview
11+
level for no semantic gain.
12+
13+
The fix swaps the rewrite to ``cupy.putmask(current, nan_mask, sentinel)``
14+
which mutates the existing buffer in place. This drops one chunk-sized
15+
device allocation per overview level and mirrors the in-place sentinel
16+
rewrite ``_apply_nodata_mask_gpu`` adopted in #1934.
17+
18+
Two guards here:
19+
20+
1. Correctness -- a COG write with a sentinel-bearing float raster
21+
round-trips through the GPU writer and back to the CPU reader with
22+
the sentinel preserved in every overview level.
23+
2. Structural -- the writer source uses ``cupy.putmask`` and no longer
24+
carries the redundant ``current = current.copy()`` line in the
25+
overview loop.
26+
"""
27+
from __future__ import annotations
28+
29+
import os
30+
import tempfile
31+
32+
import numpy as np
33+
import xarray as xr
34+
35+
from xrspatial.geotiff.tests.conftest import requires_gpu as _gpu_only
36+
37+
38+
def _make_float_raster_with_nodata(height: int, width: int, sentinel: float):
39+
"""Build a float32 raster whose sentinel pixels would poison overviews.
40+
41+
Reserve a contiguous all-sentinel sub-rectangle so that every 2x
42+
decimation step still contains at least one fully-sentinel 2x2
43+
block. ``_block_reduce_2d_gpu`` masks the sentinel back to NaN
44+
before reducing, so a partially-sentinel block returns a valid mean
45+
and the overview pixel never becomes NaN; only a fully-sentinel
46+
block triggers the NaN->sentinel rewrite branch the fix touches.
47+
"""
48+
import cupy
49+
50+
arr = np.random.RandomState(1948).rand(height, width).astype(np.float32)
51+
# Bottom-right quadrant becomes all-sentinel so every 2x reduction
52+
# of that region stays all-sentinel and the NaN rewrite branch
53+
# fires at every overview level.
54+
arr[height // 2:, width // 2:] = sentinel
55+
return cupy.asarray(arr)
56+
57+
58+
def test_gpu_writer_overview_loop_uses_putmask_1948():
59+
"""Source-level guard against the redundant ``current.copy()``.
60+
61+
The fix replaces the ``current = current.copy(); current[nan_mask] = ...``
62+
pair with ``cupy.putmask(current, nan_mask, ...)`` so the buffer
63+
``make_overview_gpu`` returned is mutated in place. This guard
64+
fires if anyone reintroduces the redundant copy.
65+
"""
66+
import pathlib
67+
68+
src_path = pathlib.Path(__file__).parent.parent / "_writers" / "gpu.py"
69+
src = src_path.read_text()
70+
# The overview loop should call cupy.putmask, not current.copy() + indexed write.
71+
assert "cupy.putmask(" in src, (
72+
"write_geotiff_gpu overview loop should use cupy.putmask "
73+
"for the in-place NaN->sentinel rewrite (issue #1948)."
74+
)
75+
# Confirm the in-place pattern is the one inside the overview branch,
76+
# not somewhere unrelated. The pattern's location follows the
77+
# ``make_overview_gpu`` call and the ``nan_mask = cupy.isnan(current)``
78+
# gate; assert the order is preserved.
79+
idx_overview = src.find("make_overview_gpu(current")
80+
idx_putmask = src.find("cupy.putmask(", idx_overview)
81+
assert idx_overview != -1 and idx_putmask != -1, (
82+
"could not locate the overview-loop in-place rewrite"
83+
)
84+
# The legacy two-line pattern would have ``current = current.copy()``
85+
# right before the indexed write. Ensure the overview branch no
86+
# longer contains that exact line.
87+
overview_branch = src[idx_overview:idx_overview + 1500]
88+
assert "current = current.copy()" not in overview_branch, (
89+
"overview loop should no longer copy the cupy buffer before "
90+
"the in-place sentinel rewrite (issue #1948)."
91+
)
92+
93+
94+
@_gpu_only
95+
def test_gpu_writer_cog_overview_sentinel_roundtrip_1948():
96+
"""COG write -> CPU read preserves the sentinel through every overview.
97+
98+
Regression check on the correctness of the in-place rewrite. If
99+
``cupy.putmask`` ever drifts away from the ``current.copy()`` +
100+
indexed-write semantics (e.g. broadcasts the sentinel as the wrong
101+
dtype) the round-trip would surface NaN where the sentinel should
102+
be, breaking external-reader masking. The test is gated on a GPU
103+
because the overview loop only runs on the GPU writer code path.
104+
"""
105+
from xrspatial.geotiff import open_geotiff, write_geotiff_gpu
106+
107+
sentinel = np.float32(-9999.0)
108+
height, width = 1024, 1024
109+
arr_gpu = _make_float_raster_with_nodata(height, width, float(sentinel))
110+
111+
da = xr.DataArray(
112+
arr_gpu, dims=["y", "x"],
113+
coords={"y": np.arange(height), "x": np.arange(width)},
114+
attrs={"crs": 4326, "nodata": float(sentinel)},
115+
)
116+
117+
with tempfile.TemporaryDirectory() as td:
118+
path = os.path.join(td, "tmp_1948_cog.tif")
119+
write_geotiff_gpu(
120+
da, path, compression="deflate", tile_size=256,
121+
cog=True, overview_levels=[2, 4],
122+
)
123+
124+
# Read full-resolution and the two overview levels.
125+
full = open_geotiff(path)
126+
ov1 = open_geotiff(path, overview_level=1)
127+
ov2 = open_geotiff(path, overview_level=2)
128+
129+
# Full-resolution: sentinel pixels survive as NaN (the read path
130+
# masks the sentinel back to NaN since attrs['nodata'] is set).
131+
assert np.isnan(full.values).any(), (
132+
"full-resolution overview lost its sentinel pixels"
133+
)
134+
assert ov1.shape == (height // 2, width // 2), (
135+
f"overview level 1 shape {ov1.shape}, expected "
136+
f"({height // 2}, {width // 2})"
137+
)
138+
assert ov2.shape == (height // 4, width // 4), (
139+
f"overview level 2 shape {ov2.shape}, expected "
140+
f"({height // 4}, {width // 4})"
141+
)
142+
# Each overview must still mask the sentinel. With the prior
143+
# ``current.copy()`` pattern the test passed because the buffer was
144+
# copied; with ``cupy.putmask`` the test still passes because the
145+
# buffer is mutated in place. This guard fires only if the rewrite
146+
# is dropped entirely or sentinel dtype/promotion semantics change.
147+
assert np.isnan(ov1.values).any(), (
148+
"overview level 1 lost its sentinel pixels after the in-place rewrite"
149+
)
150+
assert np.isnan(ov2.values).any(), (
151+
"overview level 2 lost its sentinel pixels after the in-place rewrite"
152+
)
153+
154+
155+
@_gpu_only
156+
def test_gpu_writer_overview_uses_make_overview_gpu_fresh_buffer_1948():
157+
"""``make_overview_gpu`` returns a freshly allocated cupy buffer.
158+
159+
The in-place rewrite contract relies on ``make_overview_gpu``
160+
handing back a buffer that no caller-visible state aliases. This
161+
guard exercises every overview-method path inside
162+
``_block_reduce_2d_gpu`` and confirms the returned buffer is a
163+
different cupy.ndarray than the input.
164+
"""
165+
import cupy
166+
167+
from xrspatial.geotiff._gpu_decode import make_overview_gpu
168+
169+
rng = np.random.RandomState(1948)
170+
src_np = rng.rand(64, 64).astype(np.float32)
171+
src_gpu = cupy.asarray(src_np)
172+
173+
for method in ("mean", "nearest", "min", "max", "median"):
174+
out = make_overview_gpu(src_gpu, method=method, nodata=None)
175+
assert int(out.data.ptr) != int(src_gpu.data.ptr), (
176+
f"make_overview_gpu(method={method!r}) returned an aliased buffer"
177+
)
178+
assert out.shape == (32, 32), (
179+
f"make_overview_gpu(method={method!r}) returned shape {out.shape}"
180+
)

0 commit comments

Comments
 (0)