Skip to content

rasterize: sweep-test-coverage pass 2 -- NaN/Inf burns, nested GC, GPU multi-column#2256

Merged
brendancol merged 3 commits into
mainfrom
deep-sweep-test-coverage-rasterize-2026-05-21
May 21, 2026
Merged

rasterize: sweep-test-coverage pass 2 -- NaN/Inf burns, nested GC, GPU multi-column#2256
brendancol merged 3 commits into
mainfrom
deep-sweep-test-coverage-rasterize-2026-05-21

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

deep-sweep test-coverage pass 2 on rasterize. Adds 58 tests in
xrspatial/tests/test_rasterize_coverage_2026_05_21.py closing the
public-API coverage gaps left after pass-1 (2026-05-17). Source untouched.

Gaps closed

  • Cat 2 HIGH (NaN / Inf edges) -- pin +Inf / -Inf / NaN burn values
    across numpy / cupy / dask+numpy / dask+cupy on polygon, point, and
    line geometries. Inf + finite under sum stays Inf; Inf + (-Inf)
    under sum collapses to NaN; min(Inf, 1.0) and max(-Inf, 1.0)
    pick the finite value. Inf-as-bound is rejected with the same
    ValueError as NaN-as-bound (pass-1 only covered NaN bound).
  • Cat 1 MEDIUM (backend coverage) -- nested GeometryCollection
    (GC inside a GC) on all four backends. rasterize.py:1995 documents
    recursive unpacking but only flat GCs were tested. Deeply-nested
    3-levels eager test pins that the recursion depth limit is not 1 or 2.
  • Cat 1 MEDIUM (backend coverage) -- columns= (multi-column
    properties) on cupy and dask+cupy. TestMultiColumn covered only
    numpy and dask+numpy.
  • Cat 3 LOW (geometric edges) -- rectangular-pixel parity
    (resolution=(rx, ry), rx != ry) across all four backends.

Bug surfaced -- issue #2255

The NaN-burn-under-max tests revealed a cross-backend divergence:
GPU max / min merge silently suppresses NaN burn values while
CPU propagates them per IEEE arithmetic.

  • CPU: max(NaN, 1.0) returns NaN (because 1.0 > NaN is False, the
    kernel keeps the prior NaN pixel).
  • GPU: max(NaN, 1.0) returns 1.0 (the kernel inits the output buffer
    to -inf for max, +inf for min, and atomicMax/Min is NaN-suppressing
    under IEEE device semantics).

Filed as #2255. This PR pins both observables in paired
CPU vs GPU tests (test_nan_burn_overlaps_max_cpu_propagates,
test_nan_burn_overlaps_max_gpu_suppresses_nan,
test_nan_burn_single_geom_max_gpu_returns_neg_inf) so the divergence
is visible in CI until the GPU kernels are aligned with the CPU
is_first semantics.

State updates

  • .claude/sweep-test-coverage-state.csv: rasterize row updated with
    pass-2 notes, severity_max=HIGH, categories_found=1;2;3,
    issue=2255.

Test plan

  • All 58 new tests pass on a CUDA host
    (pytest xrspatial/tests/test_rasterize_coverage_2026_05_21.py)
  • Pre-existing 417 rasterize tests still pass
    (pytest xrspatial/tests/test_rasterize*.py)
  • Source files untouched (only adding tests + state CSV)
  • CI runs the new tests on CPU and GPU hosts

Adds test_rasterize_coverage_2026_05_21.py with 58 tests closing the
remaining public-API coverage gaps after pass-1 (2026-05-17):

- Cat 2 HIGH +/-Inf and NaN burn values across all four backends:
  +Inf / -Inf burn under default merge; Inf + finite under sum stays
  Inf; Inf + (-Inf) under sum collapses to NaN; min(Inf, 1.0) and
  max(-Inf, 1.0) pick the finite value; Inf-as-bound rejected with
  the same ValueError as NaN-as-bound (pass-1 only tested NaN bound).
  NaN-burn polygon, point, and line each pin the rasterized cell.
- Cat 1 MEDIUM nested GeometryCollection across all four backends.
  rasterize.py:1995 documents recursive unpacking but only flat GCs
  were tested. Deeply-nested-3-levels eager test pins that the
  recursion depth limit is not 1 or 2.
- Cat 1 MEDIUM columns= (multi-column properties) parity on cupy and
  dask+cupy (TestMultiColumn covered numpy / dask+numpy only). Pin
  three-column props on GPU so the (N, P) loop survives the kernel
  boundary.
- Cat 3 LOW rectangular-pixel parity (resolution=(rx, ry), rx != ry)
  across all four backends.

Tests surfaced a cross-backend bug: GPU max/min merge silently
suppresses NaN burn values (CPU keeps NaN per IEEE; GPU returns the
finite value or the -inf buffer init). Filed as issue #2255 and
pinned both observables in paired CPU / GPU tests so the divergence
is visible in CI until the GPU kernels are aligned.

Source untouched. All 58 new tests pass on a CUDA host; pre-existing
417 rasterize tests still pass.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: rasterize sweep-test-coverage pass 2

Summary

Tests-only PR (xrspatial/tests/test_rasterize_coverage_2026_05_21.py, +547/-0) adding 58 tests across four buckets: Inf/NaN burn values, nested GeometryCollection, GPU multi-column, rectangular pixels. State CSV updated (+1/-1). No source changes. All 58 new tests pass locally, and CI is green on macOS/Ubuntu/Windows + py3.14. The 206 pre-existing rasterize tests still pass (verified locally).

Blockers

None.

Suggestions

None.

Nits

  • test_inf_bounds_rejected (lines 208-219) matches on "must be finite", which also appears in the resolution-validation error at xrspatial/rasterize.py:3101 ("resolution must be finite and > 0"). The path the test targets is at rasterize.py:3078-3081: "Invalid bounds: all of (xmin, ymin, xmax, ymax) must be finite, ...". The regex still hits the right path today because the resolution path is not reached, but r"Invalid bounds:.*must be finite" would be safer against a future refactor that shuffles validation order. Verified manually that the test triggers the bounds path today.
  • TestNaNBurnValues references issue #2255 in a comment at line 283. Putting that reference (or a module-level ISSUE_2255 = ... constant) on the class docstring too would make it easier to find when the GPU max kernel is later aligned and the ..._gpu_suppresses_nan / ..._gpu_returns_neg_inf tests need to be inverted.
  • test_deeply_nested_gc_eager (lines 398-411) only runs against numpy. The fast-path-vs-slow-path branch in _classify_geometries is backend-agnostic, so this is not load-bearing, but one dask+numpy run would also check that nested-GC pre-classification survives the per-tile graph builder.

What looks good

  • Every test class docstring says what it pins and why a regression would be silent today. Makes the file readable a year from now.
  • The GPU/CPU NaN-max divergence is pinned as paired tests rather than hidden behind xfail, so the asymmetry shows up in CI instead of getting buried in skip output. Issue #2255 (open) tracks the source fix.
  • _materialise handles all four backends in one place (dask compute + cupy host copy), so individual tests do not leak backend assumptions.
  • Inf-as-bound rejection (line 208) and the nested-GC recursion check (line 398) close gaps that the source documented but had no test for (rasterize.py:1995 GC recursion, rasterize.py:3078 bounds validation).
  • Skip markers use pytest.param(..., marks=...) with stable ids (numpy, cupy, dask_numpy, dask_cupy), so test IDs are searchable and skip output stays readable on CPU-only hosts.

Checklist

  • Source unchanged (tests-only)
  • All four backends covered where the source supports them
  • NaN handling pinned per-backend (CPU IEEE propagation, GPU asymmetry)
  • Edge cases (Inf, -Inf, Inf+(-Inf), NaN burn, nested GC, rect pixels)
  • Dask paths use chunked fixtures (chunks=(5,5) and chunks=(3,3))
  • No premature materialization; _materialise is the single sink
  • No benchmark needed (no source changes)
  • No README change needed (no new public API)
  • Docstrings on every test explain the pin
  • Bug surfaced (#2255) is pinned in CI rather than fixed in scope

Three nit-level fixes from the post-PR review of #2256, none affecting
test intent or coverage:

- test_inf_bounds_rejected: tighten regex to "Invalid bounds:.*must be
  finite" so a future refactor that adds a different "must be finite"
  check (e.g. on resolution at rasterize.py:3101) earlier in the call
  cannot accidentally satisfy the assertion via the wrong code path.
- TestNaNBurnValues docstring: pull the issue #2255 reference up from
  the per-test comment so the GPU/CPU NaN-merge divergence is flagged
  on the class as well as on the asymmetric paired tests.
- test_deeply_nested_gc: parametrize over numpy and dask+numpy so the
  recursive GeometryCollection pre-classification check covers the
  per-tile graph builder too, not just the eager path.

All 59 tests in the file pass (was 58, +1 from the new dask_numpy
parametrization).
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (post-fix)

Pushed c9a71bd4 addressing all three nits from the prior review, plus a merge of main (c5a66009) to pull in the unrelated GeoTIFF Phase 5b/5c refactors that landed in between.

Nit disposition

  • Nit 1 (regex collision): fixed -- test_inf_bounds_rejected now matches r"Invalid bounds:.*must be finite" so it cannot be satisfied by the resolution-validation path.
  • Nit 2 (issue #2255 visibility): fixed -- TestNaNBurnValues class docstring now flags the GPU/CPU NaN-merge divergence and points to the asymmetric paired tests.
  • Nit 3 (deeply-nested GC dask coverage): fixed -- test_deeply_nested_gc_eager was renamed test_deeply_nested_gc and parametrized over numpy and dask_numpy, so the recursive _classify_geometries check now covers the per-tile graph builder too.

Verification

  • Local: pytest xrspatial/tests/test_rasterize_coverage_2026_05_21.py -- 59 passed (was 58, +1 from the new dask_numpy parametrization).
  • Pre-existing 206 rasterize tests still pass.
  • Merge of main was clean (GeoTIFF-only conflict surface, no overlap with rasterize).

Out of scope (intentional)

  • Issue #2255 (GPU max/min NaN suppression) is not fixed in this PR. The paired CPU-propagates / GPU-suppresses tests pin the current divergence so it stays visible in CI until the GPU atomic kernels can be aligned with the CPU is_first semantics. That's a source-level fix in _compile_gpu_kernels (rasterize.py:~1320) and belongs in a separate PR.

No remaining findings from my side. Ready for an outside reviewer.

@brendancol brendancol merged commit 7501b67 into main May 21, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant