Skip to content

Commit 488f58a

Browse files
authored
Refactor GeoTIFF Phase 5d: extract _cog_http.py from _reader.py (#2262)
* Refactor GeoTIFF Phase 5d: extract _cog_http.py from _reader.py (#2258) Move the COG-over-HTTP transport layer (`_parse_cog_http_meta`, `_read_cog_http`, `_fetch_decode_cog_http_strips`, `_fetch_decode_cog_http_tiles`) plus the `INITIAL_HTTP_HEADER_BYTES` / `MAX_HTTP_HEADER_BYTES` constants into a new `_cog_http.py`. `_reader.py` drops from 1258 to 429 lines. The public import surface stays intact: `_reader.py` re-exports the moved names so `from xrspatial.geotiff._reader import _read_cog_http` and friends keep working unchanged. `_read_cog_http` resolves `_HTTPSource`, `_parse_cog_http_meta`, `_fetch_decode_cog_http_tiles`, `_decode_strip_or_tile`, and `_apply_photometric_miniswhite` through the `_reader` module namespace at call time so existing tests that `monkeypatch.setattr(_reader, '_HTTPSource', ...)` (and friends) continue to intercept the collaborators. `_parse_cog_http_meta` reads `MAX_HTTP_HEADER_BYTES` and `INITIAL_HTTP_HEADER_BYTES` the same way so `test_http_meta_buffer_1718`'s cap-shrinking patch keeps working. Part of #2211. * Address PR review nits: TYPE_CHECKING import, comments (#2258) * Move `_HTTPSource` to `TYPE_CHECKING` in `_cog_http.py` so the only module-level binding of the name is a type annotation; the runtime collaborator is always `_reader._HTTPSource`. * Drop direct `_apply_photometric_miniswhite` / `_decode_strip_or_tile` imports from `_cog_http.py` since both are looked up through `_reader` at call time. Comment in the import block explains which names from `_decode` are still used directly. * Add a module-level note about why `from . import _reader` is lazy (circular load order) and an index of every name looked up through the `_reader` namespace for monkeypatch reasons. * Add a comment in `_parse_cog_http_meta` explaining the one-shot capture of `INITIAL_HTTP_HEADER_BYTES` / `MAX_HTTP_HEADER_BYTES` is behaviourally equivalent to the original per-iteration global read (the constants are never mutated mid-call). * Add a comment in `__init__.py` explaining why this site imports `_parse_cog_http_meta` from `_cog_http` directly while `_backends/dask.py` keeps `_HTTPSource` going through `_reader`. * Restore module-level _HTTPSource import for type-hint introspection (#2258) The previous nit-fix commit moved `_HTTPSource` under TYPE_CHECKING, which broke `typing.get_type_hints()` against `_parse_cog_http_meta`, `_fetch_decode_cog_http_strips`, and `_fetch_decode_cog_http_tiles`. The function objects are also re-exported through `_reader`, so the breakage propagated to `_reader._parse_cog_http_meta` etc. as well. Nothing in the codebase calls `get_type_hints` on these helpers today, but the original `_reader.py` carried a real `_HTTPSource` import, so removing it was a subtle behavioural regression rather than a cleanup. Restore the runtime import and replace the TYPE_CHECKING block with a comment that explains why the name lives in the imports but the live runtime collaborator is always `_reader._HTTPSource`.
1 parent 1bbb95f commit 488f58a

4 files changed

Lines changed: 1039 additions & 859 deletions

File tree

xrspatial/geotiff/__init__.py

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -213,9 +213,18 @@ def _read_geo_info(source, *, overview_level: int | None = None,
213213
from ._dtypes import resolve_bits_per_sample, tiff_dtype_to_numpy
214214
from ._geotags import extract_geo_info_with_overview_inheritance
215215
from ._header import parse_all_ifds, parse_header, select_overview_ifd
216-
from ._reader import (
216+
# ``_parse_cog_http_meta`` is imported from ``_cog_http`` directly
217+
# rather than re-routed through ``_reader`` because the
218+
# ``open_geotiff(..., chunks=...)`` fsspec metadata path is not part
219+
# of the ``_reader.*`` monkeypatch surface (no test patches
220+
# ``_reader._parse_cog_http_meta`` and then exercises this branch).
221+
# The eager / dask HTTP paths that ARE patched route through
222+
# ``_cog_http._read_cog_http`` and ``_backends/dask.py``'s
223+
# ``_HTTPSource`` construction, both of which still go through
224+
# ``_reader`` for the patchable names. See PR-J / #2258.
225+
from ._cog_http import _parse_cog_http_meta
226+
from ._sources import (
217227
_CloudSource, _coerce_path, _is_file_like, _is_fsspec_uri,
218-
_parse_cog_http_meta,
219228
)
220229
from ._validation import _validate_predictor_sample_format
221230

xrspatial/geotiff/_backends/dask.py

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -187,8 +187,12 @@ def read_geotiff_dask(source: str, *,
187187
effective_source = source
188188
if is_http or is_fsspec:
189189
import dask
190-
from .._reader import _parse_cog_http_meta
190+
from .._cog_http import _parse_cog_http_meta
191191
if is_http:
192+
# ``_HTTPSource`` is resolved through ``_reader`` so existing
193+
# tests that ``monkeypatch.setattr(_reader, '_HTTPSource', ...)``
194+
# keep intercepting the construction after the COG-HTTP
195+
# helpers moved to ``_cog_http`` (PR-J / #2258).
192196
from .._reader import _HTTPSource
193197
_src = _HTTPSource(source)
194198
else:
@@ -548,13 +552,15 @@ def _read(http_meta):
548552
from .._reader import _is_fsspec_uri as _ifs
549553
_is_fsspec_src = _ifs(source)
550554
if http_meta is not None and (_is_http_src or _is_fsspec_src):
551-
from .._reader import (
552-
_fetch_decode_cog_http_tiles,
553-
MAX_PIXELS_DEFAULT,
554-
_apply_photometric_miniswhite,
555-
)
555+
from .._cog_http import _fetch_decode_cog_http_tiles
556+
from .._decode import _apply_photometric_miniswhite
557+
from .._layout import MAX_PIXELS_DEFAULT
556558
header, ifd = http_meta
557559
if _is_http_src:
560+
# See PR-J / #2258: keep the per-chunk ``_HTTPSource`` /
561+
# ``_CloudSource`` construction routed through ``_reader``
562+
# so monkeypatches against ``_reader._HTTPSource`` (used
563+
# by the dask-HTTP coalesce tests) still take effect.
558564
from .._reader import _HTTPSource
559565
src = _HTTPSource(source)
560566
else:

0 commit comments

Comments
 (0)