Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/source/reference/classification.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@
Classification
**************

.. warning::

Classification functions silently set NaN and infinite input values
to NaN in the output. Clean infinities before classifying if you
want every cell assigned to a bin.

Equal Interval
==============
.. autosummary::
Expand Down
12 changes: 12 additions & 0 deletions docs/source/reference/focal.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,18 @@
Focal
*****

.. caution::

Focal operations use ``dask.array.map_overlap``. Each chunk dimension
must be **larger than the kernel depth** (typically 1 cell for a 3x3
kernel). Chunks smaller than the depth produce wrong results.

.. note::

Focal functions output **float32**. NaN handling varies: ``mean``
excludes NaN neighbours from the average, while ``hotspots`` propagates
NaN from any neighbour.

Apply
=====
.. autosummary::
Expand Down
7 changes: 7 additions & 0 deletions docs/source/reference/hydrology.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@
Hydrology
*********

.. warning::

NaN cells act as **impassable barriers** in all hydrology functions.
Flow cannot cross them. If your DEM has NaN holes (e.g. water bodies
masked out), fill or interpolate them first, or expect disconnected
drainage networks.

Flow Direction (D8)
===================
.. autosummary::
Expand Down
7 changes: 7 additions & 0 deletions docs/source/reference/multispectral.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,13 @@
Multispectral
*************

.. note::

All spectral indices output **float32**. Division by zero (e.g.
NDVI where NIR + Red = 0) produces NaN or inf silently. Clean
the result with ``xr.where(np.isfinite(result), result, np.nan)``
if needed.

Atmospherically Resistant Vegetation Index (ARVI)
=================================================
.. autosummary::
Expand Down
11 changes: 11 additions & 0 deletions docs/source/reference/pathfinding.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,17 @@
Pathfinding
***********

.. caution::

A* allocates about 65 bytes per pixel and will raise ``MemoryError``
if the required memory exceeds 80 % of available RAM. Use Dask or
set ``search_radius`` to limit the search area for large rasters.

.. warning::

NaN and non-positive friction values are treated as impassable
barriers. Cells must have finite positive friction to be traversable.

A* Pathfinding
==============
.. autosummary::
Expand Down
14 changes: 14 additions & 0 deletions docs/source/reference/proximity.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,20 @@
Proximity
*********

.. warning::

``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
returns distances in the **coordinate units** of the DataArray.
With default integer indices this equals pixel counts. Use
``'GREAT_CIRCLE'`` for lat/lon data (returns metres).

.. caution::

With Dask, ``proximity()`` expands each chunk by ``max_distance`` cells.
If ``max_distance`` is infinite (the default), the whole array is loaded
into a single chunk. Set a finite ``max_distance`` to keep memory
bounded.

Allocation
==========
.. autosummary::
Expand Down
14 changes: 14 additions & 0 deletions docs/source/reference/surface.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,20 @@
Surface
*******

.. danger::

``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
``method='geodesic'`` assume the **WGS84 ellipsoid** and require
coordinates in **degrees** (geographic CRS). Passing projected
coordinates (metres) to the geodesic method produces wrong results.
Use ``method='planar'`` (the default) for projected data.

.. note::

All surface functions output **float32** regardless of input dtype.
Edge cells within the 3x3 kernel radius are NaN by default
(``boundary='nan'``).

Aspect
======
.. autosummary::
Expand Down
6 changes: 6 additions & 0 deletions docs/source/reference/terrain_metrics.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@
Terrain Metrics
***************

.. note::

Terrain metrics use a 3x3 neighbourhood and output **float64**. Edge
cells are NaN. These functions assume the input is on a regular grid
with uniform cell spacing.

Roughness
=========
.. autosummary::
Expand Down
5 changes: 5 additions & 0 deletions docs/source/reference/zonal.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,11 @@
Zonal
*****

.. note::

NaN values are excluded from all zonal aggregations. A zone where
every cell is NaN returns NaN (not zero) for sum, mean, etc.

Apply
=====
.. autosummary::
Expand Down
253 changes: 253 additions & 0 deletions docs/source/user_guide/caveats.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,253 @@
.. _user_guide.caveats:

***********************
Caveats & Assumptions
***********************

xarray-spatial makes several assumptions about input data that can produce
wrong or confusing results if they aren't met. This page collects them in
one place. Each section maps to a Sphinx admonition level so you can gauge
severity at a glance:

.. danger:: marks assumptions that cause **silently wrong output**.

.. warning:: marks gotchas that **change the meaning** of results.

.. caution:: marks **performance or memory** pitfalls.

.. tip:: gives practical workarounds.

.. note:: provides context that clarifies non-obvious behaviour.


.. contents:: On this page
:local:
:depth: 1


Geodesic terrain functions assume WGS84
========================================

.. danger::

``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m,
b = 6 356 752 m). There is no parameter to select a different
ellipsoid.

If your elevation data references a different ellipsoid (e.g. Clarke 1866 or
Bessel 1841), the geodesic path will still run, but the ECEF projection and
curvature correction will be slightly off. For most modern data this
doesn't matter because WGS84 and GRS80 are functionally identical, but
older survey-quality datasets on local datums can differ by tens of metres
in the geoid.


Projected coordinates + geodesic = wrong results
=================================================

.. danger::

Passing a raster whose coordinates are in metres (UTM, state-plane, etc.)
to ``method='geodesic'`` will produce garbage. The geodesic path expects
latitude/longitude in degrees.

The library does validate that coordinate values fall within geographic
ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a
``ValueError`` if they don't. But this check is not foolproof: coordinates
in small projected grids that happen to look like valid lat/lon values will
slip through.

.. tip::

Use ``method='planar'`` (the default) for projected CRS data. Use
``method='geodesic'`` only when your coordinates are in degrees on a
geographic CRS.


All output is float32
=====================

.. warning::

Every analytical function casts its output to ``float32`` regardless of
input dtype. If you pass ``float64`` elevation data, the result is
``float32``.

Float32 gives about 7 significant digits, which is more than enough for
slope, NDVI, and similar metrics. But if you're chaining many operations
or working with data that needs sub-centimetre precision (e.g. LiDAR
heights in a small range), the truncation can matter.

See :ref:`user_guide.data_types` for the full rationale.

.. tip::

If you need float64 output, cast after the function returns:

.. code-block:: python

result = slope(agg).astype('float64')


NaN is the nodata sentinel
==========================

.. warning::

xarray-spatial uses ``NaN`` as its nodata value everywhere. There is
no parameter to choose a different sentinel (e.g. -9999).

Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN,
the output cell is NaN. The exact behaviour varies by function:

- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output
cell.
- **Focal mean/std** -- NaN neighbours are excluded from the average; the
cell itself is still computed.
- **Zonal stats** -- NaN values are excluded from the aggregation. An
all-NaN zone returns NaN, not zero.
- **Hydrology (flow direction)** -- NaN cells are treated as impassable
barriers. Flow cannot cross them.
- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers.

.. note::

Edge cells (within the kernel radius of the raster boundary) default to
NaN when ``boundary='nan'``. Use ``boundary='nearest'`` or
``boundary='reflect'`` if you need values at the edges.


Proximity distances depend on coordinate units
================================================

.. warning::

``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
computes distances in whatever units the DataArray's coordinates use.
If coordinates are default integer indices (0, 1, 2, ...), the result
is in pixel counts. If coordinates are in metres (UTM), the result is
in metres. If coordinates are in degrees, the result is in degrees.

Switch to ``distance_metric='GREAT_CIRCLE'`` for lat/lon data to get
distances in **metres** (the radius used is 6 378 137 m).

.. tip::

.. code-block:: python

from xrspatial.proximity import proximity

# EUCLIDEAN with default coords = pixel-count distances
dist_px = proximity(raster)

# GREAT_CIRCLE with lat/lon coords = distances in metres
dist_m = proximity(raster, distance_metric='GREAT_CIRCLE')


Great-circle distances assume sphere radius = semi-major axis
==============================================================

.. note::

Great-circle distances in ``proximity()`` and ``surface_distance()``
model the Earth as a sphere with radius = WGS84 semi-major axis
(6 378 137 m), not the mean radius (~6 371 km).

The difference is about 0.1 %. This is negligible for most use cases but
worth knowing if you're comparing against a reference that uses the mean
radius or the IUGG value.


Dask chunk sizes and ``map_overlap``
=====================================

.. caution::

Focal, slope, aspect, and other kernel-based operations use
``dask.array.map_overlap``. If any chunk dimension is **smaller than
the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap
will fail or produce wrong results.

.. caution::

``proximity()`` with Dask expands each chunk by ``max_distance`` cells.
If ``max_distance`` is infinite (the default), the entire array is loaded
into one chunk. Set a finite ``max_distance`` to keep memory bounded.

.. tip::

A good rule of thumb for focal operations: keep chunks at least 64 cells
on each side. For proximity, set ``max_distance`` to the largest
distance you actually care about.

.. code-block:: python

import dask.array as da
import xarray as xr

# Rechunk to safe sizes before running slope
agg = xr.DataArray(
da.from_array(big_array, chunks=(512, 512))
)


GPU memory is not managed automatically
========================================

.. caution::

CuPy-backed operations allocate the full output array on the GPU.
There is no automatic tiling or fallback to CPU if the array exceeds
VRAM.

If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory
.OutOfMemoryError``, the array is too large for your GPU. Use Dask + CuPy
to process in chunks, or fall back to NumPy.

.. tip::

Complex geodesic kernels (slope, aspect with ``method='geodesic'``)
use many float64 registers, which limits thread-block occupancy.
xarray-spatial already reduces thread blocks to 16x16 for these
kernels, but very old GPUs (compute capability < 6.0) may still
hit register spill.


Dimension ordering
==================

.. note::

xarray-spatial assumes the last two dimensions of a DataArray are
``(y, x)`` in that order. It does not inspect ``attrs['crs']`` or
CF conventions to verify this.

If your data has ``(x, y)`` ordering (rare, but it happens with some
netCDF conventions), transpose before passing to any function:

.. code-block:: python

agg = agg.transpose('y', 'x')


Classification and NaN/Inf
===========================

.. warning::

``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other
classification functions set NaN and infinite input values to NaN in the
output. They do not raise an error.

This means a raster with a few stray ``inf`` values (e.g. from a division
by zero upstream) will silently produce NaN bins for those cells.

.. tip::

Clean infinities before classifying:

.. code-block:: python

import numpy as np
agg = agg.where(np.isfinite(agg))
Loading
Loading