geotiff: extract read_geotiff_gpu to _backends/gpu.py (#1885)#1895
Conversation
Step 7 of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Moved into a new xrspatial/geotiff/_backends/gpu.py: - read_geotiff_gpu: GPU read entry point (~640 lines, the largest single function in __init__.py). - _gds_chunk_path_available: predicate gating the GDS direct-to-GPU chunked path. - _decode_window_gpu_direct: per-chunk disk->GPU decode used by the GDS path. - _read_geotiff_gpu_chunked: lazy Dask+CuPy graph builder for read_geotiff_gpu(chunks=...). - _read_geotiff_gpu_chunked_gds: direct-to-GPU specialisation built by _read_geotiff_gpu_chunked when KvikIO + a chunky tiled local file qualifies. Function bodies are unchanged except for adjusted relative imports (``._X`` -> ``.._X`` outside _backends, ``._gpu_helpers`` for the already-extracted GPU helpers). __init__.py re-exports read_geotiff_gpu so ``from xrspatial.geotiff import read_geotiff_gpu`` keeps working. The four private helpers are not re-exported because they were only called by read_geotiff_gpu itself; nothing outside the GPU module references them. One known consumer of the private names had to update its monkeypatch target: test_gpu_chunks_out_of_core_1876.py patches ``_decode_window_gpu_direct`` to spy on chunk-decode calls. The patch target moves from ``xrspatial.geotiff`` to ``xrspatial.geotiff._backends.gpu`` so the spy still intercepts the real call site (module-global lookup happens in the defining module). The Dask+CuPy chunked-fallback path needs ``read_geotiff_dask``, which still lives in __init__.py. To avoid a circular import, _read_geotiff_gpu_chunked lazy-imports it inside the function body (``from .. import read_geotiff_dask``). PR #8 will move ``read_geotiff_dask`` into ``_backends/dask.py`` and the lazy import can be replaced with a static one then. Net __init__.py change: 3969 -> 2962 lines (-1007). _backends/gpu.py is 1047 lines. Verification: pixel-parity matrix from #1889 (192 cells, all GPU cells), runtime identity (9), GPU byteswap (#1508), GPU kwarg rename (#1560), GPU chunks out-of-core (#1876, including the GDS spy tests), full geotiff suite minus 8 pre-existing main failures (read_to_array monkeypatch + tile_size_positive_works) -- 2862/2862 pass. Refs #1813.
There was a problem hiding this comment.
Pull request overview
This PR continues the GeoTIFF backend refactor by moving the GPU read entry point and its GPU-chunking helpers out of xrspatial/geotiff/__init__.py into a dedicated backend module while preserving the public read_geotiff_gpu import.
Changes:
- Adds
xrspatial/geotiff/_backends/gpu.pycontainingread_geotiff_gpuand related private GPU chunk helpers. - Re-exports
read_geotiff_gpufromxrspatial.geotiff. - Updates GPU chunk tests to monkeypatch the helper in its new module.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_backends/gpu.py |
New home for GPU GeoTIFF read implementation and private chunk/GDS helpers. |
xrspatial/geotiff/__init__.py |
Removes the moved implementation and imports the public GPU read entry point from the backend module. |
xrspatial/geotiff/tests/test_gpu_chunks_out_of_core_1876.py |
Updates monkeypatch targets for moved GPU chunk helper. |
Comments suppressed due to low confidence (1)
xrspatial/geotiff/_backends/gpu.py:785
- The mmap fallback in the direct chunk decoder also omits the LERC
masked_fillvalue when callinggpu_decode_tiles. If KvikIO is present but unusable and this fallback runs, LERC valid-mask pixels keep the decoder's zero fill instead of being restored to nodata, so the later nodata mask cannot match them.
arr_gpu = gpu_decode_tiles(
compressed_tiles, tw, th, sub_w, sub_h,
compression, predictor, file_dtype, samples,
byte_order=byte_order,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if band is not None: | ||
| if n_bands_out == 0: | ||
| if band != 0: | ||
| raise IndexError( | ||
| f"band={band} requested but file is single-band.") | ||
| elif band < 0 or band >= n_bands_out: | ||
| raise IndexError( | ||
| f"band={band} out of range for {n_bands_out}-band file.") |
| arr_gpu = gpu_decode_tiles_from_file( | ||
| file_path, sub_offsets, sub_byte_counts, | ||
| tw, th, sub_w, sub_h, | ||
| compression, predictor, file_dtype, samples, | ||
| byte_order=byte_order, | ||
| ) |
| @@ -76,6 +76,7 @@ | |||
| _gpu_decode_single_band_tiles, | |||
Drop five GPU-helper re-imports from __init__.py that no longer have internal callers after ``read_geotiff_gpu`` moved out: - ``_apply_nodata_mask_gpu`` - ``_apply_orientation_geo_info`` - ``_apply_orientation_gpu`` - ``_gpu_apply_window_band`` - ``_gpu_decode_single_band_tiles`` ``_is_gpu_data`` is still imported because ``to_geotiff`` and ``write_geotiff_gpu`` (which stay in __init__.py for now) reference it to detect cupy-backed inputs. The two parity-bug comments Copilot flagged on _backends/gpu.py (boolean band validation missing on the GDS chunked path, LERC masked_fill dropped on the GDS chunked path) are pre-existing in the function bodies as they lived in __init__.py before this move. They will be filed as their own follow-up issue against #1813 so this PR stays pure code motion.
|
Addressed Copilot's three line comments:
Re-running CI on the new commit. |
* geotiff: extract read_geotiff_dask to _backends/dask.py (#1886) Step 8 of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Moved into a new xrspatial/geotiff/_backends/dask.py: - read_geotiff_dask: dask read entry point (~350 lines), with the HTTP/fsspec COG metadata prefetch, MinIsWhite nodata-inversion detection, window clipping, max_pixels guard, and 50k-task graph cap intact. - _delayed_read_window: the per-chunk reader closure (~100 lines) that fans out via dask.delayed. Function bodies unchanged except for adjusted relative imports (``._X`` -> ``.._X``). Two helpers still living in __init__.py (``read_vrt`` and ``_read_geo_info``) are lazy-imported inside the function bodies to avoid a circular import; later PRs in #1813 will move them out and the lazy imports can become static. Promotes the matching lazy import on the other side: _backends/gpu.py's ``from .. import read_geotiff_dask`` (added for the same circular- import reason in #1895) becomes a static top-of-module ``from .dask import read_geotiff_dask`` now that the target is at a sibling path. Test changes ============ - test_dask_no_op_astype_1624.py monkeypatches ``_read_to_array``; the target moves from ``xrspatial.geotiff`` to ``xrspatial.geotiff._backends.dask`` so the spy still intercepts the per-chunk worker. Same pattern as the GPU spy update in #1895. - Restored the ``_apply_nodata_mask_gpu`` re-export from __init__.py (with a ``# noqa: F401`` marker). PR #1895's review-fix commit dropped this re-import after concluding it had no internal callers; it overlooked five test imports across ``test_nodata_nan_int_1774`` and ``test_nodata_out_of_range_1581`` that exercise the helper directly. Restoring the re-export fixes 6 tests that were failing on main since #1895 merged. Diff stats ========== - __init__.py: 2962 -> 2504 (-458 lines), plus 1 re-import line. - New _backends/dask.py: 481 lines. Verification: pixel-parity matrix from #1889 (192 cells), runtime identity (9), dask int nodata chunks (#1597), attrs parity (#1548), dask no-op astype (#1624), gpu chunks out-of-core (#1876), full geotiff suite minus the 8 pre-existing main failures -- 2862/2862 pass. Refs #1813. * geotiff: address Copilot review on _backends/dask.py extraction (#1897) Two cleanups Copilot flagged: 1. ``_backends/gpu.py`` module docstring still described the ``read_geotiff_dask`` import as lazy. The import was promoted to a static top-of-module ``from .dask import read_geotiff_dask`` in this PR; updated the docstring to reflect the current strategy and note why the original lazy form existed. 2. ``__init__.py`` had the ``# noqa: F401`` applied to a multi-name import, suppressing lint signal for ``_is_gpu_data`` as well as the intentional ``_apply_nodata_mask_gpu`` re-export. Split into two imports so the ``noqa`` marks only the re-export line. ``_is_gpu_data`` keeps full lint coverage; a future genuinely-unused import on it would surface normally.
…#1888) (#1899) * geotiff: extract writers to _writers/, reduce __init__.py to dispatch (#1888) Final step (10) of #1813's multi-PR refactor of __init__.py. Pure code motion; no public API change. Created the _writers/ subpackage with three modules: - _writers/eager.py: ``to_geotiff`` (public eager writer), ``_write_single_tile`` (per-tile worker), and ``_write_vrt_tiled`` (deprecated vrt_tiled=True path). ~870 lines. - _writers/gpu.py: ``write_geotiff_gpu`` (nvCOMP-backed GPU writer with CPU fallback). ~500 lines. - _writers/vrt.py: ``write_vrt`` (mosaic XML generator with crs_wkt deprecation handling). ~90 lines. __init__.py re-exports the three public writer names. The two private helpers tests reach for (``_write_single_tile``, ``_apply_nodata_mask_gpu``) keep their package-level re-exports with ``# noqa: F401`` markers since the test files import them directly. Final shape of __init__.py: 529 lines (down from 1905 pre-PR and 4902 at the start of #1813 series). It now holds only: - Module docstring + ``__all__`` (lines 1-112). - ``_read_geo_info``: metadata-only header parse used by ``open_geotiff`` and lazy-imported by ``_backends/dask.py``. - ``open_geotiff``: the public auto-dispatching read entry point that routes between the four read backends and ``read_vrt``. - ``plot_geotiff``: deprecated wrapper that emits a ``DeprecationWarning`` pointing to ``da.xrs.plot()``. Test updates ============ - test_to_geotiff_gpu_fallback_1674.py monkeypatches ``write_geotiff_gpu`` and ``_is_gpu_data`` as resolved by ``to_geotiff``. The patch targets move from ``xrspatial.geotiff`` to ``xrspatial.geotiff._writers.eager`` so the spy still intercepts the real call sites. Same module-global-rebind pattern as #1895 and #1897. Net change since the start of #1813 ==================================== - __init__.py: 4902 -> 529 lines (-4373; -89%). - New per-responsibility modules under xrspatial/geotiff/: _runtime.py (#1880), _crs.py (#1881), _validation.py (#1882), _attrs.py (#1883), _backends/_gpu_helpers.py (#1884), _backends/gpu.py (#1885), _backends/dask.py (#1886), _backends/vrt.py (#1887), _writers/{eager,gpu,vrt}.py (this PR). - One new regression-backstop test: test_backend_pixel_parity_matrix_1813.py (#1879), 192 parametrised cells covering pixel/coord/attrs parity across the four backends and four read entry points. Verification: 2862/2862 of the geotiff test suite pass (the 8 remaining pre-existing main failures from #1517 and #1776 are unrelated to #1813 and were excluded as deselected throughout). Closes #1888. Closes #1813. * geotiff: address Copilot review on _writers extraction (#1899) Drop the unused module-level and in-function imports Copilot flagged: eager.py - Drop ``import math`` (no math.* calls in the file). - Drop ``RASTER_PIXEL_IS_POINT`` re-import (only RASTER_PIXEL_IS_AREA is referenced; the POINT path lives in _writers/_vrt_tiled below and uses its own internal constants). - Drop ``_validate_dtype_cast`` re-import (never called; the writer side does not validate dtype casts post-decode). - Drop ``from .vrt import write_vrt`` re-import; ``_write_vrt_tiled`` does a lazy ``from .._vrt import write_vrt as _write_vrt_fn`` at its single call site so the module-top import was dead. - Drop the redundant ``import os`` inside ``_write_vrt_tiled``; the module-level ``import os`` already covers the os.path / os.makedirs calls in that body. gpu.py - Drop ``import math`` and ``import os`` (no math.* / os.* references in the file body). - Drop ``GeoTransform`` re-import at module level (only used in eager.py's _write_single_tile, which still imports it). - Drop ``_validate_dtype_cast`` re-import (never called). - Drop ``from .._writer import write`` re-import; only docstring references mention it. - Drop the in-function ``GeoTransform as _GT`` alias inside the ``_writer`` import block (the alias was carried over verbatim from __init__.py but the function body never references _GT). - Drop the in-function ``from .._dtypes import numpy_to_tiff_dtype`` (also unreferenced in the body). All 2862 geotiff tests still pass; pyflakes is clean on these files except for the ``cupy.ndarray`` type annotation in ``write_geotiff_gpu``'s signature, which works fine under ``from __future__ import annotations`` and matched the same surface the original __init__.py had through earlier #1813 PRs.
Summary
Step 7 of #1813's multi-PR refactor of
xrspatial/geotiff/__init__.py. Pure code motion; no public API change. Biggest single-function move in the sequence.Moved into a new
xrspatial/geotiff/_backends/gpu.py:read_geotiff_gpu— GPU read entry point (~640 lines, the largest single function in__init__.py)._gds_chunk_path_available— predicate gating the GDS direct-to-GPU chunked path._decode_window_gpu_direct— per-chunk disk→GPU decode used by the GDS path._read_geotiff_gpu_chunked— lazy Dask+CuPy graph builder forread_geotiff_gpu(chunks=...)._read_geotiff_gpu_chunked_gds— direct-to-GPU specialisation when KvikIO + a chunky tiled local file qualifies.Function bodies are unchanged except for adjusted relative imports (
._X→.._Xoutside_backends,._gpu_helpersfor the already-extracted GPU helpers from #1894).__init__.pyre-exportsread_geotiff_gpusofrom xrspatial.geotiff import read_geotiff_gpukeeps working. The four private helpers are not re-exported because they were only called byread_geotiff_gpuitself; nothing outside the GPU read flow references them.Test update
One existing test had to update its monkeypatch target:
test_gpu_chunks_out_of_core_1876.pypatches_decode_window_gpu_directto spy on chunk-decode calls. The target moves fromxrspatial.geotifftoxrspatial.geotiff._backends.gpuso the spy still intercepts the real call site (module-global lookup happens in the defining module). Both spy tests (uses_gds_path_when_available,fallback_when_kvikio_absent) pass after the update.Circular-import note
The Dask+CuPy chunked-fallback path needs
read_geotiff_dask, which still lives in__init__.py. To avoid a circular import,_read_geotiff_gpu_chunkedlazy-imports it inside the function body (from .. import read_geotiff_dask). PR #8 (#1886) will moveread_geotiff_daskinto_backends/dask.py; the lazy import can be promoted to a static one then.Diff stats
__init__.py: 3969 → 2962 (-1007 lines, the single biggest reduction in the series)._backends/gpu.py: 1047 lines.Test plan
gpukeyword has three incompatible value spaces across the public API #1560), GPU chunks out-of-core (geotiff: read_geotiff_gpu(chunks=...) is not actually out-of-core #1876) — all pass, including the GDS spy tests after the monkeypatch-target update.read_to_arraymonkeypatch +tile_size_positive_works) — 2862/2862 green.Closes #1885. Refs #1813.