
Refactor GeoTIFF Phase 5f: extract _encode.py from _writer.py #2263

Merged: brendancol merged 2 commits into main from issue-2260-encode-extraction on May 21, 2026

Conversation

@brendancol

Closes #2260
Part of #2211

Summary

Move the write-side encode helpers out of _writer.py into a new
_encode.py. Mirrors the _decode.py extraction on the read side
(PR-G).

Helpers relocated:

  • Strip and tile encode: _prepare_strip, _write_stripped,
    _prepare_tile, _write_tiled
  • Streaming block compression: _compress_block
  • Photometric and MinIsWhite: _resolve_photometric,
    _reject_disagreeing_photometric_override,
    _apply_photometric_miniswhite_invert,
    _invert_nodata_for_miniswhite,
    PHOTOMETRIC_MINISBLACK / PHOTOMETRIC_RGB /
    _PHOTOMETRIC_NAME_MAP
  • Predictor: normalize_predictor, _apply_predictor_encode
  • Compression name mapping: _compression_tag
  • Threshold constant: _PARALLEL_MIN_BYTES

_writer.py keeps _write, _write_streaming,
_validate_lowlevel_write_kwargs, and the fsspec write helpers, and
re-exports every moved name so existing import paths and the
_writers subpackage continue to work without churn.
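
The re-export arrangement described above can be illustrated with a minimal sketch. The module names demo_encode and demo_writer are hypothetical stand-ins built in memory, and the body of _compression_tag is an illustrative mapping, not the project's actual table; only the import pattern is the point.

```python
import sys
import types

# Hypothetical stand-in for the new _encode.py: it now owns the helper.
ENCODE_SRC = '''
def _compression_tag(name):
    # Illustrative mapping only; the real table lives in the project.
    return {"none": 1, "lzw": 5, "deflate": 8}[name]
'''

# Hypothetical stand-in for _writer.py after the move: it no longer defines
# the helper, it only re-exports it so old import paths keep resolving.
WRITER_SRC = '''
from demo_encode import _compression_tag  # re-export for existing callers
'''

encode = types.ModuleType("demo_encode")
exec(ENCODE_SRC, encode.__dict__)
sys.modules["demo_encode"] = encode

writer = types.ModuleType("demo_writer")
exec(WRITER_SRC, writer.__dict__)
sys.modules["demo_writer"] = writer

# Callers that still import from the old location get the very same object.
from demo_writer import _compression_tag as via_writer
from demo_encode import _compression_tag as via_encode

same_object = via_writer is via_encode
lzw_tag = via_writer("lzw")
```

Because the re-exported name is the same function object, code importing from the old path behaves identically without any edits on the caller's side.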

_write_tiled and _write_stripped look up _prepare_tile and
_prepare_strip through _writer so the monkeypatch contract
pinned by test_gil_friendly_kwarg_1830 survives the move.
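
Why the lookup has to go through _writer can be shown with a small sketch. It assumes the tests patch the attribute on the writer module, as the description implies; the module names demo_encode2 and demo_writer2, the helper bodies, and the _write_tiled_direct variant are all hypothetical and exist only for contrast.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for _encode.py after the move.
ENCODE_SRC = '''
def _prepare_tile(data):
    return ("real", data)

def _write_tiled(tiles):
    # Lazy import at call time: resolve _prepare_tile through the writer
    # module, so a patch applied to demo_writer2._prepare_tile is honoured.
    import demo_writer2
    return [demo_writer2._prepare_tile(t) for t in tiles]

def _write_tiled_direct(tiles):
    # Contrast case: binds to this module's own global and never sees
    # a patch applied to the writer module.
    return [_prepare_tile(t) for t in tiles]
'''

# Hypothetical stand-in for _writer.py: re-exports the moved names.
WRITER_SRC = '''
from demo_encode2 import _prepare_tile, _write_tiled, _write_tiled_direct
'''

encode = types.ModuleType("demo_encode2")
exec(ENCODE_SRC, encode.__dict__)
sys.modules["demo_encode2"] = encode

writer = types.ModuleType("demo_writer2")
exec(WRITER_SRC, writer.__dict__)
sys.modules["demo_writer2"] = writer

unpatched = writer._write_tiled([1, 2])

# Patch the name where the tests are assumed to patch it: on the writer module.
with mock.patch.object(writer, "_prepare_tile", lambda d: ("patched", d)):
    patched_lazy = writer._write_tiled([1, 2])
    patched_direct = writer._write_tiled_direct([1, 2])
```

In this sketch the lazy variant picks up the patched helper while the direct variant keeps calling the original, which is the regression the indirection is meant to prevent.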

_writer.py drops from 1800 to 1269 lines (a 531-line move).

Test plan

  • pytest xrspatial/geotiff/tests/ -- 5037 passed, 68 skipped,
    one pre-existing lz4 byte-parity failure unrelated to this PR.
  • Photometric / predictor / strip / tile / miniswhite / compression
    tag tests pass unchanged.
  • Public import surface (from xrspatial.geotiff._writer import _resolve_photometric etc.) still resolves.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 21, 2026

@brendancol brendancol left a comment


PR Review: Refactor GeoTIFF Phase 5f: extract _encode.py from _writer.py

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • _encode.py lines 27-29 (module docstring): the docstring says heavier helpers are looked up lazily inside _compress_block / _prepare_strip / _prepare_tile, but the lazy _writer import is actually in _write_stripped (line 369) and _write_tiled (line 516). _compress_block, _prepare_strip, and _prepare_tile themselves do no lazy lookup. Tighten the wording to point at the actual call sites.

Nits (optional improvements)

  • The lazy-lookup comments at _encode.py:365-368 and _encode.py:511-515 are clear, but consider adding a one-line cross-reference to the test that pins the contract (test_gil_friendly_kwarg_1830::test_write_tiled_parallel_passes_gil_friendly_positionally), so the next refactor knows which test will catch a regression.

What looks good

  • Move is mechanical and byte-neutral. Every helper and constant lands in the same shape it had in _writer.py.
  • Re-export block at _writer.py:97-114 preserves the historical import surface for the _writers subpackage, _gpu_decode, and the test suite.
  • Lazy _writer._prepare_tile / _writer._prepare_strip lookup inside _write_tiled and _write_stripped keeps the monkeypatch contract pinned by test_gil_friendly_kwarg_1830 working without test changes. Same pattern _write_layout.py uses for _resolve_photometric.
  • No module-level cycle: _encode.py only pulls _compression, _header, and _write_layout at load time; _writer is imported lazily inside the two orchestrators that need the monkeypatch indirection.
  • _writer.py line count: 1800 to 1269 (-531). Inside the issue's ~1300 target.
  • Module docstring follows _decode.py's style and calls out the symmetry with PR-G.
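
The "no module-level cycle" point above can be reproduced in isolation. This is a generic sketch, not the project's code: the module names (eager_encode, eager_writer, lazy_encode, lazy_writer) and the helper function are hypothetical, and the modules are written to a temporary directory purely for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Both modules import each other at load time: a module-level cycle.
EAGER = {
    "eager_encode": "from eager_writer import helper\n\n"
                    "def write_tiled():\n    return helper()\n",
    "eager_writer": "from eager_encode import write_tiled\n\n"
                    "def helper():\n    return 'ok'\n",
}

# The encode side defers its import of the writer until call time.
LAZY = {
    "lazy_encode": "def write_tiled():\n"
                   "    import lazy_writer  # deferred to call time\n"
                   "    return lazy_writer.helper()\n",
    "lazy_writer": "from lazy_encode import write_tiled\n\n"
                   "def helper():\n    return 'ok'\n",
}

def run(sources, entry, func_name):
    """Write the sources to a temp dir, import `entry`, call `func_name`."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in sources.items():
            Path(tmp, f"{name}.py").write_text(src)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(entry)
            return getattr(module, func_name)()
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in sources:
                sys.modules.pop(name, None)

lazy_result = run(LAZY, "lazy_writer", "write_tiled")

try:
    run(EAGER, "eager_writer", "write_tiled")
    eager_failed = False
except ImportError:
    # eager_encode asks for `helper` before eager_writer has defined it.
    eager_failed = True
```

The eager pair fails at import because one module is still partially initialised when the other asks for its name, while the lazy pair loads cleanly and resolves the writer only when the orchestrator runs.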

Checklist

  • Byte parity preserved (no algorithmic changes)
  • Public import paths preserved (xrspatial.geotiff._writer.<name> resolves for every moved name)
  • Monkeypatch contract preserved (test_gil_friendly_kwarg_1830, test_streaming_write_parallel, test_write_layout_monkeypatch_contract_2248 all still pass)
  • Full geotiff test suite passes (5037 passed; one lz4 failure is pre-existing on main, not caused by this PR)
  • No new public API
  • No README / docs touch needed (private refactor)


@brendancol brendancol left a comment


PR Review (second pass): Refactor GeoTIFF Phase 5f: extract _encode.py from _writer.py

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None remaining. Prior pass's docstring suggestion was applied at _encode.py:23-32.

Nits (optional improvements)

None remaining. Prior pass's cross-reference nit was applied at _encode.py:365-370 (strip path) and _encode.py:511-518 (tile path, with the full test node-id).

What looks good

  • Follow-up commit is documentation-only; behaviour unchanged.
  • Module docstring now names _write_stripped and _write_tiled as the two lazy-lookup sites instead of pointing at the three prepare/compress leaves.
  • Both lazy-lookup comments now name the pinning test by id, so a future refactor that inlines the indirection will know which test will catch it.
  • Full geotiff test suite still passes the way it did on the first commit (5011 passed excluding the pre-existing lz4 byte-parity failure on main).

Checklist

  • Byte parity preserved
  • Public import paths preserved
  • Monkeypatch contracts preserved (test_gil_friendly_kwarg_1830, test_streaming_write_parallel, test_write_layout_monkeypatch_contract_2248)
  • Full geotiff test suite passes (modulo the pre-existing lz4 failure)
  • No new public API
  • Documentation matches actual call sites

Clean. No further changes requested.

Move the strip / tile encode helpers, the photometric resolution and
MinIsWhite inversion helpers, the predictor normalisation, the
``_compression_tag`` mapping, and the streaming ``_compress_block``
helper out of ``_writer.py`` into a new ``_encode.py``. Mirrors the
``_decode.py`` extraction on the read side (PR-G).

``_writer.py`` re-exports every moved name so the existing public
import path (``xrspatial.geotiff._writer.<name>``) and downstream
backends (``_writers/eager.py``, ``_writers/gpu.py``, ``_writers/vrt.py``,
``_write_layout.py``, the test suite) keep working without churn.

``_write_tiled`` and ``_write_stripped`` look up ``_prepare_tile`` and
``_prepare_strip`` through ``_writer`` so the monkeypatch contract pinned
by ``test_gil_friendly_kwarg_1830`` survives the move.

``_writer.py`` drops from 1800 to 1269 lines.

Part of #2211.
…2260)

Tighten the module docstring to point at the actual call sites where
the lazy _writer import lives (_write_stripped /
_write_tiled), and cross-reference the test that pins the
monkeypatch contract from the inline comments next to those lookups.

Pure documentation; no behaviour change.
@brendancol brendancol force-pushed the issue-2260-encode-extraction branch from d913b1d to 95fe6a8 Compare May 21, 2026 20:04
@brendancol brendancol merged commit f3cc785 into main May 21, 2026
5 checks passed


Development

Successfully merging this pull request may close these issues.

Refactor GeoTIFF Phase 5f: extract _encode.py from _writer.py (PR-L of #2211)

1 participant