Skip to content

Commit 5556e37

Browse files
authored
Resolve CRS model type via pyproj instead of EPSG number range (#2277) (#2280)
* Fix CRS model type guess from EPSG number range (#2277) The GeoTIFF writer decided GeographicTypeGeoKey vs ProjectedCSTypeGeoKey from the EPSG number range (4326 plus 4000-4999). Anything else got written as projected. That silently mis-tagged geographic CRSes registered outside the legacy block -- 6318 (NAD83(2011)), 7844 (GDA2020), 9057 (WGS 84 (G2139)), 8252 (NAD83(FBN)), etc. -- which corrupts the CRS at write time. The writer now consults pyproj.CRS.from_epsg(...).is_geographic when pyproj is available. When pyproj is not installed and the code falls outside the hard-coded fallback set (4326 + 4000-4999), the writer raises UnknownCRSModelTypeError instead of guessing. Silent CRS corruption is worse than an explicit error at write time. New typed error UnknownCRSModelTypeError lives next to the existing GeoTIFFAmbiguousMetadataError family and is exported from xrspatial.geotiff. Tests cover the codes that previously regressed (6318, 7844, 9057, 8252), the legacy-range codes (4326, 4269, 4267), real projected codes (32610, 3857, 2193), the pyproj-missing fail-closed path, and the pyproj-installed-but-unknown-code path. * Address review: tighten EPSG fallback, CHANGELOG, pyproj extra (#2277) - Narrow `_KNOWN_GEOGRAPHIC_EPSG_FALLBACK` from the full 4000-4999 block to a vetted allowlist (4326, 4269, 4267, 4258, 4283, 4322, 4230, 4019, 4047). The 4000-4999 range contained projected codes (4087/4088 World Equidistant Cylindrical, 4499 CGCS2000 GK 21) that the old fallback would have mis-tagged as geographic, the exact dual of the bug this PR is fixing. Without pyproj, classify only what we can manually verify. - Add EPSG 4087/4088/4499 regression test so the projected-inside- legacy-range case stays covered. - Drop the noise lines from the new tests: the fragile `flat.index` assertion and the dummy `assert _geotags is not None`. - Refresh the helper's docstring / inline comment to mention both fallback paths (pyproj-missing AND pyproj-installed-but-DB-failed). - Append the new `UnknownCRSModelTypeError` to the public-API contract test in test_features.py so it stays in `__all__`. - Add `pyproj` to the `geotiff` extra in setup.cfg so installing `xarray-spatial[geotiff]` picks up the dependency the writer now prefers. - CHANGELOG entry under Unreleased calls out the behaviour change.
1 parent 6261ca4 commit 5556e37

7 files changed

Lines changed: 261 additions & 6 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
### Unreleased
66

77
#### Bug fixes and improvements
8+
- Resolve the GeoTIFF writer's `GeographicTypeGeoKey` / `ProjectedCSTypeGeoKey` decision via pyproj instead of an EPSG number range. The legacy heuristic (4326 + 4000-4999 -> geographic, else projected) silently mis-tagged geographic CRSes registered outside the 4000-4999 block (NAD83(2011) = 6318, GDA2020 = 7844, WGS 84 (G2139) = 9057, etc.) as projected and projected codes inside the block (4087 / 4088 / 4499) as geographic, corrupting the CRS at write time. The writer now calls `pyproj.CRS.from_epsg(...).is_geographic`. When pyproj can't classify a code (uninstalled, or installed but the local PROJ database lacks the entry), the writer raises the new `UnknownCRSModelTypeError` rather than guessing -- a small vetted allowlist (4326, 4269, 4267, 4258, 4283, 4322, 4230, 4019, 4047) is still honoured for the pyproj-missing case. `pyproj` is now listed under the `geotiff` extra. (#2277)
89
- Shut down the per-tile compression `ThreadPoolExecutor` on every exit path of the streaming tiled-write code in `to_geotiff`. The old code only called `shutdown(wait=True)` after the tile-row loop completed, so any mid-stream raise (compression failure, dask compute failure, file write failure) bypassed shutdown and leaked worker threads. The loop now runs inside `try/finally` and the finally calls `shutdown(wait=True, cancel_futures=True)` so queued tiles get dropped on the error path instead of blocking the unwind. The pool's workers carry an `xrspatial-geotiff-tile-compress` `thread_name_prefix` so leak-detection tests can tell them apart from dask's own offload/scheduler pools. (#2276)
910
- Remove read-side emission of the 13 deprecated GeoTIFF attrs (`crs_name`, `geog_citation`, `datum_code`, `angular_units`, `semi_major_axis`, `inv_flattening`, `linear_units`, `projection_code`, `vertical_crs`, `vertical_citation`, `vertical_units`, `colormap_rgba`, `cmap`) and bump `attrs['_xrspatial_geotiff_contract']` from 1 to 2. Downstream code that read these via `attrs[key]` now sees `KeyError`; migrate to `attrs.get(key)` or derive the value from `attrs['crs']` / `attrs['crs_wkt']` with pyproj. The `.xrs.plot()` accessor still surfaces palette colormaps by building a `ListedColormap` from the canonical `attrs['colormap']`. (#2016)
1011
- Accept numpy integer scalars as the `crs=` argument to `to_geotiff` / `write_geotiff_gpu`. The validator already allowed `numbers.Integral`, but the writers gated EPSG assignment on `isinstance(crs, int)`, so `np.int32` / `np.int64` / `np.uint16` values passed validation then silently fell through with no EPSG written. (#2082)

setup.cfg

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,11 @@ reproject =
6464
pyproj
6565
geotiff =
6666
deflate
67+
# Required by the GeoTIFF writer to classify EPSG codes as
68+
# geographic vs projected. Without pyproj the writer can only
69+
# handle a tiny hard-coded allowlist of EPSGs and raises
70+
# UnknownCRSModelTypeError on anything else. See issue #2277.
71+
pyproj
6772
dask =
6873
dask[array]
6974
dask-geopandas

xrspatial/geotiff/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -61,6 +61,7 @@
6161
MixedBandMetadataError,
6262
NonUniformCoordsError,
6363
RotatedTransformError,
64+
UnknownCRSModelTypeError,
6465
UnparseableCRSError,
6566
)
6667
from ._geotags import GeoTransform, RASTER_PIXEL_IS_AREA, RASTER_PIXEL_IS_POINT
@@ -140,6 +141,7 @@
140141
'NonUniformCoordsError',
141142
'RotatedTransformError',
142143
'SUPPORTED_FEATURES',
144+
'UnknownCRSModelTypeError',
143145
'UnparseableCRSError',
144146
'UnsafeURLError',
145147
'open_geotiff',

xrspatial/geotiff/_errors.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,21 @@ class ConflictingNodataError(GeoTIFFAmbiguousMetadataError):
103103
"""
104104

105105

106+
class UnknownCRSModelTypeError(GeoTIFFAmbiguousMetadataError):
107+
"""Can't classify an EPSG as geographic or projected on write (#2277).
108+
109+
Raised by the GeoTIFF writer when the caller supplies an EPSG code
110+
that pyproj cannot resolve, or when pyproj isn't installed and the
111+
code falls outside the hard-coded geographic fallback set
112+
(EPSG 4326 plus the 4000-4999 block). The legacy heuristic at the
113+
same site guessed any code outside that window was projected, which
114+
silently mis-tagged geographic codes like 6318 (NAD83(2011)),
115+
7844 (GDA2020), and 9057 (WGS 84 (G2139)) as
116+
``ProjectedCSTypeGeoKey``. Silent CRS corruption is worse than an
117+
explicit error.
118+
"""
119+
120+
106121
__all__ = [
107122
"GeoTIFFAmbiguousMetadataError",
108123
"InvalidCRSCodeError",
@@ -112,4 +127,5 @@ class ConflictingNodataError(GeoTIFFAmbiguousMetadataError):
112127
"MixedBandMetadataError",
113128
"ConflictingCRSError",
114129
"ConflictingNodataError",
130+
"UnknownCRSModelTypeError",
115131
]

xrspatial/geotiff/_geotags.py

Lines changed: 95 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@
3434
TAG_GEO_KEY_DIRECTORY, TAG_GEO_DOUBLE_PARAMS, TAG_GEO_ASCII_PARAMS,
3535
)
3636
from ._dtypes import resolve_bits_per_sample
37-
from ._errors import RotatedTransformError
37+
from ._errors import RotatedTransformError, UnknownCRSModelTypeError
3838

3939
# ImageDescription tag (270). Captured for round-trip but not managed
4040
# by the writer -- it flows through extra_tags pass-through.
@@ -1317,6 +1317,92 @@ def _model_type_from_wkt(wkt: str) -> int:
13171317
return MODEL_TYPE_PROJECTED
13181318

13191319

1320+
# Vetted allowlist of EPSG codes known to be geographic. Consulted in
1321+
# two situations: when pyproj is unavailable at all, and when pyproj is
1322+
# installed but its database can't resolve the code (e.g. out-of-date
1323+
# PROJ db). Anything outside this set falls through to
1324+
# UnknownCRSModelTypeError -- silent CRS corruption is worse than an
1325+
# explicit error.
1326+
#
1327+
# The set is kept intentionally small. The original range heuristic
1328+
# (4000-4999) was wrong in both directions: it missed geographic codes
1329+
# outside the window (NAD83(2011) = 6318, GDA2020 = 7844, etc., the bug
1330+
# this PR fixes), AND it also covered projected codes inside the window
1331+
# (4087 / 4088 World Equidistant Cylindrical, 4499 CGCS2000 / GK
1332+
# zone 21, etc.). Expanding the set without pyproj's authoritative
1333+
# answer just trades one mis-tag for another.
1334+
#
1335+
# Currently: the few datums callers regularly write without pyproj.
1336+
# Extend only after verifying ``CRS.from_epsg(code).is_geographic`` is
1337+
# True. See issue #2277.
1338+
_KNOWN_GEOGRAPHIC_EPSG_FALLBACK = frozenset({
1339+
4326, # WGS 84
1340+
4269, # NAD83
1341+
4267, # NAD27
1342+
4258, # ETRS89
1343+
4283, # GDA94
1344+
4322, # WGS 72
1345+
4230, # ED50
1346+
4019, # Unknown datum based on GRS 1980 ellipsoid
1347+
4047, # Unspecified datum based on GRS 1980 Authalic Sphere
1348+
})
1349+
1350+
1351+
def _model_type_from_epsg(crs_epsg: int) -> int:
1352+
"""Return the GeoTIFF ModelType (geographic vs projected) for an EPSG.
1353+
1354+
Prefers ``pyproj.CRS.from_epsg(crs_epsg).is_geographic`` so the
1355+
decision matches the actual EPSG registry. Falls back to a small
1356+
vetted allowlist of common geographic datums
1357+
(:data:`_KNOWN_GEOGRAPHIC_EPSG_FALLBACK`) when pyproj isn't
1358+
installed *or* when pyproj is installed but fails to resolve the
1359+
code. Anything outside that allowlist raises
1360+
:class:`UnknownCRSModelTypeError` rather than guessing -- the legacy
1361+
range heuristic at this site silently mis-tagged geographic codes
1362+
like 6318 (NAD83(2011)) and 7844 (GDA2020) as projected.
1363+
1364+
See issue #2277.
1365+
"""
1366+
try:
1367+
from pyproj import CRS
1368+
except ImportError:
1369+
CRS = None # type: ignore[assignment]
1370+
1371+
if CRS is not None:
1372+
try:
1373+
crs = CRS.from_epsg(crs_epsg)
1374+
except Exception as e:
1375+
# pyproj is installed but the code is unknown or malformed.
1376+
# Fall back to the hard-coded allowlist before giving up so
1377+
# an offline pyproj-database mismatch doesn't break a code
1378+
# that the legacy heuristic handled correctly.
1379+
if crs_epsg in _KNOWN_GEOGRAPHIC_EPSG_FALLBACK:
1380+
return MODEL_TYPE_GEOGRAPHIC
1381+
raise UnknownCRSModelTypeError(
1382+
f"Cannot determine GeoTIFF model type for EPSG:{crs_epsg}: "
1383+
f"pyproj.CRS.from_epsg failed "
1384+
f"({type(e).__name__}: {e}). Refusing to guess; passing a "
1385+
"known EPSG or installing a current pyproj database will "
1386+
"resolve this."
1387+
) from e
1388+
if crs.is_geographic:
1389+
return MODEL_TYPE_GEOGRAPHIC
1390+
return MODEL_TYPE_PROJECTED
1391+
1392+
# pyproj unavailable. Use the tight fallback set; raise otherwise.
1393+
if crs_epsg in _KNOWN_GEOGRAPHIC_EPSG_FALLBACK:
1394+
return MODEL_TYPE_GEOGRAPHIC
1395+
raise UnknownCRSModelTypeError(
1396+
f"Cannot determine GeoTIFF model type for EPSG:{crs_epsg} without "
1397+
"pyproj. Install pyproj so the writer can consult "
1398+
"CRS.from_epsg(...).is_geographic, or pass an EPSG in the "
1399+
"hard-coded geographic fallback set "
1400+
f"({sorted(_KNOWN_GEOGRAPHIC_EPSG_FALLBACK)}). Refusing to "
1401+
"guess: the legacy range heuristic silently mis-tagged codes "
1402+
"like 6318 and 7844 as projected. See issue #2277."
1403+
)
1404+
1405+
13201406
def build_geo_tags(transform: GeoTransform, crs_epsg: int | None = None,
13211407
nodata=None,
13221408
raster_type: int = RASTER_PIXEL_IS_AREA,
@@ -1402,11 +1488,14 @@ def build_geo_tags(transform: GeoTransform, crs_epsg: int | None = None,
14021488

14031489
# ModelType
14041490
if crs_epsg is not None:
1405-
# Guess model type from EPSG (simple heuristic)
1406-
if crs_epsg == 4326 or (crs_epsg >= 4000 and crs_epsg < 5000):
1407-
model_type = MODEL_TYPE_GEOGRAPHIC
1408-
else:
1409-
model_type = MODEL_TYPE_PROJECTED
1491+
# Resolve the GeoTIFF ModelType via pyproj when available; raise
1492+
# on unknown codes when pyproj can't classify them rather than
1493+
# guessing. See issue #2277 -- the historic EPSG-range heuristic
1494+
# silently mis-tagged geographic codes outside 4000-4999 (e.g.
1495+
# 6318, 7844, 9057) as projected AND projected codes inside
1496+
# 4000-4999 (e.g. 4087, 4088, 4499) as geographic, corrupting
1497+
# the CRS at write time.
1498+
model_type = _model_type_from_epsg(crs_epsg)
14101499
key_entries.append((GEOKEY_MODEL_TYPE, 0, 1, model_type))
14111500
num_keys += 1
14121501
elif crs_wkt is not None:

xrspatial/geotiff/tests/test_features.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2780,6 +2780,7 @@ def test_all_lists_supported_functions(self):
27802780
'MixedBandMetadataError',
27812781
'NonUniformCoordsError',
27822782
'RotatedTransformError',
2783+
'UnknownCRSModelTypeError',
27832784
'UnparseableCRSError',
27842785
'GeoTIFFFallbackWarning',
27852786
'UnsafeURLError',

xrspatial/geotiff/tests/test_geotags.py

Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,147 @@ def test_projected_crs_geokey(self):
110110
assert 3072 in geokeys # GEOKEY_PROJECTED_CS_TYPE
111111

112112

113+
class TestModelTypeFromEPSG:
114+
"""Regression coverage for issue #2277.
115+
116+
The legacy heuristic at build_geo_tags decided geographic-vs-projected
117+
from the EPSG number range (4326 plus 4000-4999). That silently
118+
mis-tagged geographic CRSes registered outside the 4000-4999 block
119+
(NAD83(2011) = 6318, GDA2020 = 7844, WGS 84 (G2139) = 9057, etc.) as
120+
projected, AND mis-tagged projected codes inside the block (4087 /
121+
4088 / 4499 etc.) as geographic. Both directions corrupt the CRS at
122+
write time.
123+
124+
The fix consults pyproj when available; when pyproj can't classify
125+
a code (uninstalled, or installed but DB lookup fails), anything
126+
outside a small vetted geographic allowlist raises rather than
127+
guessing.
128+
"""
129+
130+
@pytest.mark.parametrize("epsg", [4326, 4269, 4267, 4258, 4283])
131+
def test_geographic_allowlist(self, epsg):
132+
# Codes the hard-coded fallback set covers explicitly.
133+
gt = GeoTransform(-120.0, 45.0, 0.001, -0.001)
134+
tags = build_geo_tags(gt, crs_epsg=epsg)
135+
geokeys = tags[TAG_GEO_KEY_DIRECTORY]
136+
assert GEOKEY_GEOGRAPHIC_TYPE in geokeys
137+
assert GEOKEY_PROJECTED_CS_TYPE not in geokeys
138+
139+
@pytest.mark.parametrize("epsg", [6318, 7844, 9057, 8252])
140+
def test_geographic_outside_legacy_range(self, epsg):
141+
# The bug: pre-fix, these wrote ProjectedCSTypeGeoKey.
142+
# pyproj.CRS.from_epsg(...).is_geographic is True for all four.
143+
pytest.importorskip("pyproj")
144+
gt = GeoTransform(-120.0, 45.0, 0.001, -0.001)
145+
tags = build_geo_tags(gt, crs_epsg=epsg)
146+
geokeys = tags[TAG_GEO_KEY_DIRECTORY]
147+
assert GEOKEY_GEOGRAPHIC_TYPE in geokeys, (
148+
f"EPSG:{epsg} is geographic per pyproj but was written as "
149+
f"projected -- regression of issue #2277"
150+
)
151+
assert GEOKEY_PROJECTED_CS_TYPE not in geokeys
152+
153+
@pytest.mark.parametrize("epsg", [4087, 4088, 4499])
154+
def test_projected_inside_legacy_range(self, epsg):
155+
# The other direction of the legacy bug: codes in 4000-4999 that
156+
# are actually projected (World Equidistant Cylindrical at 4087 /
157+
# 4088, CGCS2000 Gauss-Kruger zone 21 at 4499). The fix routes
158+
# these through pyproj, which classifies them correctly.
159+
pytest.importorskip("pyproj")
160+
gt = GeoTransform(500000.0, 4500000.0, 30.0, -30.0)
161+
tags = build_geo_tags(gt, crs_epsg=epsg)
162+
geokeys = tags[TAG_GEO_KEY_DIRECTORY]
163+
assert GEOKEY_PROJECTED_CS_TYPE in geokeys, (
164+
f"EPSG:{epsg} is projected per pyproj but was written as "
165+
f"geographic -- regression of issue #2277"
166+
)
167+
assert GEOKEY_GEOGRAPHIC_TYPE not in geokeys
168+
169+
@pytest.mark.parametrize("epsg", [32610, 3857, 2193])
170+
def test_projected_dispatch(self, epsg):
171+
pytest.importorskip("pyproj")
172+
gt = GeoTransform(500000.0, 4500000.0, 30.0, -30.0)
173+
tags = build_geo_tags(gt, crs_epsg=epsg)
174+
geokeys = tags[TAG_GEO_KEY_DIRECTORY]
175+
assert GEOKEY_PROJECTED_CS_TYPE in geokeys
176+
assert GEOKEY_GEOGRAPHIC_TYPE not in geokeys
177+
178+
def test_unknown_epsg_without_pyproj_raises(self, monkeypatch):
179+
"""Fail closed when pyproj is unavailable and the code is unknown.
180+
181+
Mocks `pyproj` import failure to exercise the fallback path.
182+
EPSG 7844 (GDA2020 geographic) is outside the hard-coded
183+
allowlist, so without pyproj the writer cannot tell geographic
184+
from projected and must raise -- silent corruption is worse
185+
than an explicit error.
186+
"""
187+
import builtins
188+
from xrspatial.geotiff._errors import UnknownCRSModelTypeError
189+
190+
real_import = builtins.__import__
191+
192+
def fake_import(name, *args, **kwargs):
193+
if name == "pyproj" or name.startswith("pyproj."):
194+
raise ImportError("simulated: pyproj not installed")
195+
return real_import(name, *args, **kwargs)
196+
197+
monkeypatch.setattr(builtins, "__import__", fake_import)
198+
199+
gt = GeoTransform(-120.0, 45.0, 0.001, -0.001)
200+
with pytest.raises(UnknownCRSModelTypeError, match="pyproj"):
201+
build_geo_tags(gt, crs_epsg=7844)
202+
203+
def test_known_geographic_without_pyproj_still_works(self, monkeypatch):
204+
"""The hard-coded allowlist keeps working when pyproj is gone.
205+
206+
EPSG 4326 (and the rest of the vetted allowlist) lets common
207+
geographic rasters round-trip without a pyproj dependency.
208+
"""
209+
import builtins
210+
211+
real_import = builtins.__import__
212+
213+
def fake_import(name, *args, **kwargs):
214+
if name == "pyproj" or name.startswith("pyproj."):
215+
raise ImportError("simulated: pyproj not installed")
216+
return real_import(name, *args, **kwargs)
217+
218+
monkeypatch.setattr(builtins, "__import__", fake_import)
219+
220+
gt = GeoTransform(-120.0, 45.0, 0.001, -0.001)
221+
tags = build_geo_tags(gt, crs_epsg=4326)
222+
geokeys = tags[TAG_GEO_KEY_DIRECTORY]
223+
assert GEOKEY_GEOGRAPHIC_TYPE in geokeys
224+
assert GEOKEY_PROJECTED_CS_TYPE not in geokeys
225+
226+
def test_unknown_epsg_with_pyproj_raises(self, monkeypatch):
227+
"""pyproj raises on an unknown EPSG -> writer raises too.
228+
229+
Simulates the case where pyproj is installed but the EPSG code
230+
isn't in its database (e.g. the local PROJ database is out of
231+
date). The fallback set covers a small vetted allowlist;
232+
anything else should raise rather than guess.
233+
"""
234+
from xrspatial.geotiff._errors import UnknownCRSModelTypeError
235+
236+
# Patch CRS.from_epsg to simulate a missing DB entry.
237+
import pyproj
238+
239+
def boom(code):
240+
raise pyproj.exceptions.CRSError(f"simulated unknown code {code}")
241+
242+
monkeypatch.setattr(pyproj.CRS, "from_epsg", staticmethod(boom))
243+
244+
gt = GeoTransform(0.0, 0.0, 30.0, -30.0)
245+
# 9999999 is outside the hard-coded allowlist.
246+
with pytest.raises(UnknownCRSModelTypeError):
247+
build_geo_tags(gt, crs_epsg=9999999)
248+
249+
# And the allowlist still rescues EPSG 4326 even when pyproj fails.
250+
tags = build_geo_tags(gt, crs_epsg=4326)
251+
assert GEOKEY_GEOGRAPHIC_TYPE in tags[TAG_GEO_KEY_DIRECTORY]
252+
253+
113254
def _build_tiff_with_transformation_tag(matrix_16: tuple) -> bytes:
114255
"""Build a tiny single-strip TIFF carrying a 4x4 ModelTransformationTag.
115256

0 commit comments

Comments
 (0)