Skip to content

Refactor GeoTIFF Phase 5e: extract _overview.py from _writer.py#2261

Merged
brendancol merged 2 commits into
mainfrom
issue-2259-overview-extraction
May 21, 2026
Merged

Refactor GeoTIFF Phase 5e: extract _overview.py from _writer.py#2261
brendancol merged 2 commits into
mainfrom
issue-2259-overview-extraction

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Mechanical extraction. Moves the overview pyramid helpers out of _writer.py into a new _overview.py module so _writer.py no longer carries pyramid logic alongside the strip / tile / IFD / layout code.

Moved:

  • _make_overview
  • _block_reduce_2d
  • _replicate_pad_2d
  • _resolve_int_nodata (overview-scoped; all call sites are inside the block reducer)
  • _validate_overview_levels
  • OVERVIEW_METHODS and _MAX_OVERVIEW_LEVELS

_writer.py re-exports each name so the public import path (xrspatial.geotiff._writer) keeps working for tests, the _writers subpackage, and _gpu_decode. Internal importers in _writers/gpu.py and _gpu_decode.py are switched to the new _overview module directly.

_writer.py drops from 2238 to 1804 lines (434 lines moved out, within the issue's 400-460 target).

Test plan

  • pytest xrspatial/geotiff/tests/ passes (5037 passed, 68 skipped; one pre-existing failure in test_lowlevel_write_pushdown_2138.py::test_write_vs_to_geotiff_byte_parity_uint8[lz4] reproduces on origin/main and is unrelated).
  • from xrspatial.geotiff._writer import _make_overview, _block_reduce_2d, _replicate_pad_2d, _resolve_int_nodata, _validate_overview_levels, OVERVIEW_METHODS, _MAX_OVERVIEW_LEVELS still works.
  • from xrspatial.geotiff._overview import ... works for the same names.

Closes #2259
Part of #2211

Move the overview pyramid helpers out of ``_writer.py`` into a new
``_overview.py`` module:

- ``_make_overview``
- ``_block_reduce_2d``
- ``_replicate_pad_2d``
- ``_resolve_int_nodata``
- ``_validate_overview_levels``
- ``OVERVIEW_METHODS`` and ``_MAX_OVERVIEW_LEVELS`` constants

``_writer.py`` re-exports the names so the public import path
(``xrspatial.geotiff._writer``) keeps working for tests, the
``_writers`` subpackage, and ``_gpu_decode``. Internal importers in
``_writers/gpu.py`` and ``_gpu_decode.py`` are switched to the new
``_overview`` module directly.

Behavior-neutral. ``_writer.py`` drops from 2238 to 1804 lines.

Part of #2211.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Refactor GeoTIFF Phase 5e: extract _overview.py from _writer.py

Mechanical extraction. I verified the moved code matches origin/main byte-for-byte (modulo the new module docstring on OVERVIEW_METHODS) and that all import paths still resolve.

Blockers

None.

Suggestions

None.

Nits

  • xrspatial/geotiff/_writer.py:370 -- the new from ._overview import (...) re-export block sits mid-file (after the compression helpers, before the strip writer). The sibling _write_layout re-export from PR-G sits near the top imports at line 89. Co-locating the two re-export blocks would make the "this name lives elsewhere now" pattern easier to scan. Not worth a churn-only follow-up if PR-L is about to add a third re-export block in roughly the same neighbourhood.
  • xrspatial/geotiff/_overview.py:22-23 -- the OVERVIEW_METHODS constant gained a one-line #: Sphinx docstring that the original in _writer.py:352 did not have. Minor, behavior-neutral, but technically a non-mechanical edit in a mechanical-extraction PR. Fine to keep.

What looks good

  • The five helpers and two constants land in _overview.py with their docstrings, comments, and issue references intact (#1613, #1623, #1766, #1975, #2105).
  • _PARALLEL_MIN_BYTES correctly stayed in _writer.py -- it gates the strip / tile parallel-compress path, not overview generation. Easy thing to scoop accidentally; this version got it right.
  • _resolve_int_nodata placement: the issue's scope note said "move only if overview-scoped." All three call sites are inside _block_reduce_2d, and grep finds no external callers. The move is correct.
  • Direct importers (_writers/gpu.py for _MAX_OVERVIEW_LEVELS and _validate_overview_levels; _gpu_decode.py for _block_reduce_2d) were rewired to ._overview rather than relying on the re-export, which keeps the dependency edge explicit. The _gpu_decode.py docstring cross-reference was updated to match.
  • The re-export in _writer.py preserves the public-ish xrspatial.geotiff._writer import path that 17 test files rely on (test_cog.py, test_writer.py, test_polish_1488.py, the various *_overview_* and *_overview_nodata_* regression tests, etc.). No test-side churn needed.
  • _writer.py drops from 2238 to 1804 lines (-434), inside the 400-460 target.
  • The pre-existing test_lowlevel_write_pushdown_2138.py::test_write_vs_to_geotiff_byte_parity_uint8[lz4] failure reproduces on a clean origin/main checkout, so it is not regression noise from this PR.

Checklist

  • Behavior-neutral: byte-equivalent moved code
  • Public import paths preserved via re-export
  • All four backends (numpy / cupy / dask+numpy / dask+cupy) unaffected -- helpers are CPU-side, GPU dispatch unchanged
  • NaN / nodata handling logic moved intact (issues #1613, #1623, #1975)
  • Odd-shape ceil semantics moved intact (issue #2105)
  • _writer.py line count drop within target (-434, target 400-460)
  • Sibling PR-J / PR-L scope respected -- only _writer.py, _overview.py, _writers/gpu.py, _gpu_decode.py touched

- Move the ``_overview`` re-export block up next to the existing
  ``_write_layout`` re-export, so both "lives elsewhere now" blocks
  sit together near the top of the module instead of one being
  mid-file.
- Drop the ``OVERVIEW_METHODS`` ``#:`` docstring that was added
  during the move so the extraction is strictly content-identical
  to the original in ``_writer.py``.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review (follow-up pass)

Both nits from the first pass were addressed in b47f8579:

  • The from ._overview import (...) re-export now sits directly under the existing _write_layout re-export block at _writer.py:102-118, so the two "lives elsewhere now" blocks read together. Matches the surrounding pattern.
  • The OVERVIEW_METHODS #: docstring was dropped, so the moved content is now byte-equivalent to the origin/main version of _writer.py:352-812. I re-confirmed this with diff against the source: the only delta is _PARALLEL_MIN_BYTES, which correctly stays in _writer.py because it gates the strip/tile parallel-compress path, not overviews.

Blockers

None.

Suggestions

None.

Nits

None.

What looks good

  • Extraction is now strictly mechanical -- content-identical to origin/main, only the file boundary changed.
  • Re-export shape preserves the 17 test-file imports from xrspatial.geotiff._writer plus the _writers/gpu.py and _gpu_decode.py direct importers.
  • _writer.py final size 1800 lines (-438 from origin/main's 2238), inside the 400-460 target.

@brendancol brendancol merged commit d13754a into main May 21, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor GeoTIFF Phase 5e: extract _overview.py from _writer.py (PR-K of #2211)

1 participant