Skip to content

Centralise VRT capability validator at both read entry points (#2371)#2376

Open
brendancol wants to merge 4 commits into
xarray-contrib:mainfrom
brendancol:issue-2371
Open

Centralise VRT capability validator at both read entry points (#2371)#2376
brendancol wants to merge 4 commits into
xarray-contrib:mainfrom
brendancol:issue-2371

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2371.

Summary

  • Extend validate_parsed_vrt to reject nested VRTs, per-source <UseMaskBand> flags, and per-source <MaskBand> children with VRTUnsupportedError.
  • Extend parse_vrt to reject dataset-level and band-level <GDALWarpOptions> blocks with UnsupportedGeoTIFFFeatureError.
  • Wire the internal _vrt.read_vrt entry point through validate_parsed_vrt so direct callers and the chunked dask path get the same capability rejections as the public _backends/vrt.read_vrt entry point.
  • Expose validate_vrt_capability as a public alias on _vrt_validation matching the epic naming.

Backend coverage

The validator is a metadata gate that runs before any backend dispatch, so numpy, cupy, dask+numpy, and dask+cupy all see the same rejections at graph-build / eager-read setup time.

Test plan

  • New file xrspatial/geotiff/tests/test_vrt_capability_validator_2371.py (22 tests) covers every new rejection path at four entry points: validator-direct, internal _vrt.read_vrt, public _backends/vrt.read_vrt, and open_geotiff.
  • test_vrt_resample_alg_1751.py accepts VRTUnsupportedError alongside the historical NotImplementedError. Both encode the same contract.
  • Full xrspatial/geotiff/tests/ suite passes (5661 passed).

Sub-task of epic #2342. Sibling PRs in flight: positive simple-mosaic tests, negative tests for unsupported VRT features, missing-source policy tests, and VRT contract docs.

…-contrib#2371)

Extend ``validate_parsed_vrt`` to cover nested VRT references, per-source
mask-band semantics, and dataset / band-level ``<GDALWarpOptions>``
blocks. Route the internal ``_vrt.read_vrt`` entry point through the
validator so direct callers and the chunked dask path get the same
capability rejections as ``_backends/vrt.read_vrt``.

Expose ``validate_vrt_capability`` as a public alias on
``_vrt_validation`` matching the epic naming (the implementation
continues under ``validate_parsed_vrt`` for backward compatibility).

Update the legacy issue xarray-contrib#1751 resample-alg regression tests to accept
the validator's ``VRTUnsupportedError`` alongside the original
``NotImplementedError``; both encode the same contract that nearest
must not be silently substituted for an unsupported algorithm.
Cover nested VRT, dataset-level and band-level <GDALWarpOptions>,
per-source <UseMaskBand> and <MaskBand> children, and the now-typed
resample-alg rejection at the internal entry point. Each rejection is
exercised at validator-direct, public _backends/vrt.read_vrt, internal
_vrt.read_vrt, and open_geotiff entry points so the wiring is covered.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 25, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Centralise VRT capability validator at both read entry points (#2371)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_vrt.py:713: the <UseMaskBand> truthy set is ('1', 'true', 'yes'). GDAL writes lowercase true, so '1' and 'yes' are spellings the parser will never see from GDAL itself. Tightening to ('1', 'true') keeps the surface narrow and matches what real VRTs contain. Not blocking; the wider set is harmless.

Nits (optional improvements)

  • xrspatial/geotiff/_vrt_validation.py:303: the nested-VRT message ends with "are not a supported feature in this release". The parse_vrt rejections point readers at xrspatial.geotiff.SUPPORTED_FEATURES. Adding the same pointer here keeps the rejection messages aligned on one discovery anchor.
  • xrspatial/geotiff/_vrt.py:1247: the validator import is lazy inside the function body. That matches the existing pattern elsewhere in the file, but a one-line comment ("lazy to avoid circular import") would help the next reader.
  • xrspatial/geotiff/tests/test_vrt_capability_validator_2371.py:339: the test_use_mask_band_truthy_spellings_rejected parametrize list includes 'yes'. If you tighten the truthy set per the suggestion above, drop that row or move it to a "must be accepted as not-truthy" test.

What looks good

  • The validator extends cleanly. Every new rejection names the offending source path and field, so the error message tells the caller where to look.
  • validate_vrt_capability as an alias keeps every existing call site working.
  • Test coverage is strong: every new rejection path runs at four entry points (validator-direct, internal _vrt.read_vrt, public _backends/vrt.read_vrt, open_geotiff).
  • The test_vrt_resample_alg_1751.py fix accepts both NotImplementedError and VRTUnsupportedError, so the legacy contract intent is preserved without breaking the new routing.
  • The parsed is None short-circuit in _vrt.read_vrt avoids double-validation when the chunked path threads a pre-validated instance in.
  • <GDALWarpOptions> rejection covers dataset-level, band-level, and the existing subClass="VRTWarpedRasterBand" marker. All three locations a warp config can appear are caught.

Checklist

  • Algorithm matches the VRT spec
  • Implemented backends produce consistent results (validator runs before backend dispatch)
  • NaN handling not applicable (metadata-only gate)
  • Edge cases covered: uppercase .VRT, false flag accepted, truthy spellings, both source tag types
  • Dask chunk boundaries handled correctly (chunked path threads a pre-validated VRT)
  • No premature materialization (metadata gate runs before any decode)
  • Benchmark not needed (cheap XML-only check)
  • README feature matrix not applicable (no new public API)
  • Docstrings present and accurate

…n SUPPORTED_FEATURES (xarray-contrib#2371)

- Narrow <UseMaskBand> truthy set to ('1', 'true') to match
  what GDAL actually emits. Tokens like yes / on / Y now
  pass through as not-mask rather than tripping the rejection.
- Anchor the nested-VRT rejection message on
  xrspatial.geotiff.SUPPORTED_FEATURES so the new path matches
  the parse_vrt rejection style.
- Add a one-line comment explaining why the validator import in
  _vrt.read_vrt is lazy (circular import with _vrt_validation).
- Replace the original yes parametrize row with a separate
  non_canonical_truthy_accepted test so the contract for outside
  the canonical set is explicit.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after addressing prior comments

Disposition of original findings

  • Suggestion (_vrt.py:713 UseMaskBand truthy set): Fixed. Narrowed to ('1', 'true'). Added test_use_mask_band_non_canonical_truthy_accepted to lock in the contract for non-canonical tokens.
  • Nit 1 (_vrt_validation.py:303 SUPPORTED_FEATURES anchor): Fixed. Nested-VRT message now references xrspatial.geotiff.SUPPORTED_FEATURES.
  • Nit 2 (lazy import comment): Fixed. Added a one-line note explaining the circular-import avoidance.
  • Nit 3 (drop 'yes' parametrize row): Fixed. Row removed; non-canonical tokens are covered by the new explicit test instead.

Verification

  • Full geotiff test suite: 5663 passed, 68 skipped, 4 xfailed, 1 xpassed.
  • New test count: 24 (was 22). The two new tests cover the narrowed truthy set.

No further findings.

…ntrib#2371)

Switch the nested-VRT rejection from {src.filename!r} to direct
interpolation so Windows paths render with single backslashes instead
of the doubled escapes repr emits, matching the path-containment
pattern in parse_vrt. Update the test assertion to compare on the
file basename so any further normalisation difference (short-name vs
long-name, symlink resolution) between str(tmp_path / name) and
the os.path.realpath form parse_vrt stores does not break the
match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VRT capability validator: centralize post-parse pre-read gate (#2342 work item)

1 participant