From 646f45e00729ddf473115a97567c7e1c20fd95c1 Mon Sep 17 00:00:00 2001
From: Brendan Collins <brendancol@gmail.com>
Date: Wed, 1 Apr 2026 07:36:06 -0700
Subject: [PATCH 1/2] Add caveat and assumption admonitions to docs (#1133)

New user guide page (caveats.rst) collecting the cross-cutting
assumptions: WGS84 hardcoded in geodesic path, float32 output,
NaN-as-nodata, pixel-unit proximity, dask chunk requirements,
GPU memory, and dimension ordering.

Admonition blocks added to existing reference pages: surface,
proximity, focal, classification, hydrology, terrain_metrics,
and the data_types guide.
---
 docs/source/reference/classification.rst  |   6 +
 docs/source/reference/focal.rst           |  12 +
 docs/source/reference/hydrology.rst       |   7 +
 docs/source/reference/proximity.rst       |  14 ++
 docs/source/reference/surface.rst         |  14 ++
 docs/source/reference/terrain_metrics.rst |   6 +
 docs/source/user_guide/caveats.rst        | 253 ++++++++++++++++++++++
 docs/source/user_guide/data_types.rst     |   8 +-
 docs/source/user_guide/index.rst          |   1 +
 9 files changed, 319 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/user_guide/caveats.rst

diff --git a/docs/source/reference/classification.rst b/docs/source/reference/classification.rst
index 85c24743..5263ebb6 100644
--- a/docs/source/reference/classification.rst
+++ b/docs/source/reference/classification.rst
@@ -4,6 +4,12 @@
 Classification
 **************
 
+.. warning::
+
+   Classification functions silently set NaN and infinite input values
+   to NaN in the output.  Clean infinities before classifying if you
+   want every cell assigned to a bin.
+
 Equal Interval
 ==============
 .. autosummary::
diff --git a/docs/source/reference/focal.rst b/docs/source/reference/focal.rst
index c6ab56bf..d57b75e4 100644
--- a/docs/source/reference/focal.rst
+++ b/docs/source/reference/focal.rst
@@ -4,6 +4,18 @@
 Focal
 *****
 
+.. caution::
+
+   Focal operations use ``dask.array.map_overlap``.  Each chunk dimension
+   must be **larger than the kernel depth** (typically 1 cell for a 3x3
+   kernel).  Chunks smaller than the depth produce wrong results.
+
+.. note::
+
+   Focal functions output **float32**.  NaN handling varies: ``mean``
+   excludes NaN neighbours from the average, while ``hotspots`` propagates
+   NaN from any neighbour.
+
 Apply
 =====
 .. autosummary::
diff --git a/docs/source/reference/hydrology.rst b/docs/source/reference/hydrology.rst
index b7b0769e..3d6a512d 100644
--- a/docs/source/reference/hydrology.rst
+++ b/docs/source/reference/hydrology.rst
@@ -4,6 +4,13 @@
 Hydrology
 *********
 
+.. warning::
+
+   NaN cells act as **impassable barriers** in all hydrology functions.
+   Flow cannot cross them.  If your DEM has NaN holes (e.g. water bodies
+   masked out), fill or interpolate them first, or expect disconnected
+   drainage networks.
+
 Flow Direction (D8)
 ===================
 .. autosummary::
diff --git a/docs/source/reference/proximity.rst b/docs/source/reference/proximity.rst
index 8c68a29c..418238ab 100644
--- a/docs/source/reference/proximity.rst
+++ b/docs/source/reference/proximity.rst
@@ -4,6 +4,20 @@
 Proximity
 *********
 
+.. warning::
+
+   ``proximity()`` returns distances in **pixel units** by default
+   (``distance_metric='EUCLIDEAN'``).  Multiply by cell size to get
+   real-world units, or use ``'GREAT_CIRCLE'`` for lat/lon data
+   (returns kilometres).
+
+.. caution::
+
+   With Dask, ``proximity()`` expands each chunk by ``max_distance`` cells.
+   If ``max_distance`` is infinite (the default), the whole array is loaded
+   into a single chunk.  Set a finite ``max_distance`` to keep memory
+   bounded.
+
 Allocation
 ==========
 .. autosummary::
diff --git a/docs/source/reference/surface.rst b/docs/source/reference/surface.rst
index 24fdbbfb..04de4874 100644
--- a/docs/source/reference/surface.rst
+++ b/docs/source/reference/surface.rst
@@ -4,6 +4,20 @@
 Surface
 *******
 
+.. danger::
+
+   ``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
+   ``method='geodesic'`` assume the **WGS84 ellipsoid** and require
+   coordinates in **degrees** (geographic CRS).  Passing projected
+   coordinates (metres) to the geodesic method produces wrong results.
+   Use ``method='planar'`` (the default) for projected data.
+
+.. note::
+
+   All surface functions output **float32** regardless of input dtype.
+   Edge cells within the 3x3 kernel radius are NaN by default
+   (``boundary='nan'``).
+
 Aspect
 ======
 .. autosummary::
diff --git a/docs/source/reference/terrain_metrics.rst b/docs/source/reference/terrain_metrics.rst
index 10a5a9bf..8d807d51 100644
--- a/docs/source/reference/terrain_metrics.rst
+++ b/docs/source/reference/terrain_metrics.rst
@@ -4,6 +4,12 @@
 Terrain Metrics
 ***************
 
+.. note::
+
+   Terrain metrics use a 3x3 kernel and output **float32**.  Edge cells
+   are NaN by default.  These functions use the planar (Horn) algorithm
+   and assume the input is on a regular grid with uniform cell spacing.
+
 Roughness
 =========
 .. autosummary::
diff --git a/docs/source/user_guide/caveats.rst b/docs/source/user_guide/caveats.rst
new file mode 100644
index 00000000..f361c805
--- /dev/null
+++ b/docs/source/user_guide/caveats.rst
@@ -0,0 +1,253 @@
+..  _user_guide.caveats:
+
+***********************
+Caveats & Assumptions
+***********************
+
+xarray-spatial makes several assumptions about input data that can produce
+wrong or confusing results if they aren't met.  This page collects them in
+one place.  Each section maps to a Sphinx admonition level so you can gauge
+severity at a glance:
+
+.. danger:: marks assumptions that cause **silently wrong output**.
+
+.. warning:: marks gotchas that **change the meaning** of results.
+
+.. caution:: marks **performance or memory** pitfalls.
+
+.. tip:: gives practical workarounds.
+
+.. note:: provides context that clarifies non-obvious behaviour.
+
+
+.. contents:: On this page
+   :local:
+   :depth: 1
+
+
+Geodesic terrain functions assume WGS84
+========================================
+
+.. danger::
+
+   ``slope()``, ``aspect()``, ``curvature()``, and ``hillshade()`` with
+   ``method='geodesic'`` use the WGS84 ellipsoid (a = 6 378 137 m,
+   b = 6 356 752 m).  There is no parameter to select a different
+   ellipsoid.
+
+If your elevation data references a different ellipsoid (e.g. Clarke 1866 or
+Bessel 1841), the geodesic path will still run, but the ECEF projection and
+curvature correction will be slightly off.  For most modern data this
+doesn't matter because WGS84 and GRS80 are functionally identical, but
+older survey-quality datasets on local datums can differ by tens of metres
+in the geoid.
+
+
+Projected coordinates + geodesic = wrong results
+=================================================
+
+.. danger::
+
+   Passing a raster whose coordinates are in metres (UTM, state-plane, etc.)
+   to ``method='geodesic'`` will produce garbage.  The geodesic path expects
+   latitude/longitude in degrees.
+
+The library does validate that coordinate values fall within geographic
+ranges (latitude in [-90, 90], longitude in [-180, 360]) and will raise a
+``ValueError`` if they don't.  But this check is not foolproof: coordinates
+in small projected grids that happen to look like valid lat/lon values will
+slip through.
+
+.. tip::
+
+   Use ``method='planar'`` (the default) for projected CRS data.  Use
+   ``method='geodesic'`` only when your coordinates are in degrees on a
+   geographic CRS.
+
+
+All output is float32
+=====================
+
+.. warning::
+
+   Every analytical function casts its output to ``float32`` regardless of
+   input dtype.  If you pass ``float64`` elevation data, the result is
+   ``float32``.
+
+Float32 gives about 7 significant digits, which is more than enough for
+slope, NDVI, and similar metrics.  But if you're chaining many operations
+or working with data that needs sub-centimetre precision (e.g. LiDAR
+heights in a small range), the truncation can matter.
+
+See :ref:`user_guide.data_types` for the full rationale.
+
+.. tip::
+
+   If you need float64 output, cast after the function returns:
+
+   .. code-block:: python
+
+      result = slope(agg).astype('float64')
+
+
+NaN is the nodata sentinel
+==========================
+
+.. warning::
+
+   xarray-spatial uses ``NaN`` as its nodata value everywhere.  There is
+   no parameter to choose a different sentinel (e.g. -9999).
+
+Functions generally **propagate** NaN: if any cell in a 3x3 kernel is NaN,
+the output cell is NaN.  The exact behaviour varies by function:
+
+- **Slope, aspect, hillshade** -- any NaN neighbour produces a NaN output
+  cell.
+- **Focal mean/std** -- NaN neighbours are excluded from the average; the
+  cell itself is still computed.
+- **Zonal stats** -- NaN values are excluded from the aggregation.  An
+  all-NaN zone returns NaN, not zero.
+- **Hydrology (flow direction)** -- NaN cells are treated as impassable
+  barriers.  Flow cannot cross them.
+- **Pathfinding (A*)** -- NaN cells are impassable, same as barriers.
+
+.. note::
+
+   Edge cells (within the kernel radius of the raster boundary) default to
+   NaN when ``boundary='nan'``.  Use ``boundary='nearest'`` or
+   ``boundary='reflect'`` if you need values at the edges.
+
+
+Proximity defaults to pixel units
+==================================
+
+.. warning::
+
+   ``proximity()`` returns distances in **pixel units** when
+   ``distance_metric='EUCLIDEAN'`` (the default).  It does not automatically
+   convert to metres or kilometres.
+
+To get distances in real-world units with Euclidean mode, multiply the
+result by your cell size.  Or switch to ``distance_metric='GREAT_CIRCLE'``
+if your data is in degrees, which returns kilometres.
+
+.. tip::
+
+   .. code-block:: python
+
+      from xrspatial.proximity import proximity
+
+      # Pixel-unit result
+      dist_px = proximity(raster)
+
+      # Convert to metres (assumes square cells)
+      cell_m = 30.0  # e.g. Landsat 30 m resolution
+      dist_m = dist_px * cell_m
+
+
+Haversine uses the semi-major axis
+====================================
+
+.. note::
+
+   Great-circle distances in ``proximity()`` and ``surface_distance()``
+   use the WGS84 **semi-major axis** (6 378 137 m) as the Earth radius,
+   not the mean radius (6 371 km).
+
+The difference is about 0.1 %.  This is negligible for most use cases but
+worth knowing if you're comparing against a reference that uses the mean
+radius or the IUGG value.
+
+
+Dask chunk sizes and ``map_overlap``
+=====================================
+
+.. caution::
+
+   Focal, slope, aspect, and other kernel-based operations use
+   ``dask.array.map_overlap``.  If any chunk dimension is **smaller than
+   the kernel depth** (typically 1 cell for a 3x3 kernel), the overlap
+   will fail or produce wrong results.
+
+.. caution::
+
+   ``proximity()`` with Dask expands each chunk by ``max_distance`` cells.
+   If ``max_distance`` is infinite (the default), the entire array is loaded
+   into one chunk.  Set a finite ``max_distance`` to keep memory bounded.
+
+.. tip::
+
+   A good rule of thumb for focal operations: keep chunks at least 64 cells
+   on each side.  For proximity, set ``max_distance`` to the largest
+   distance you actually care about.
+
+   .. code-block:: python
+
+      import dask.array as da
+      import xarray as xr
+
+      # Rechunk to safe sizes before running slope
+      agg = xr.DataArray(
+          da.from_array(big_array, chunks=(512, 512))
+      )
+
+
+GPU memory is not managed automatically
+========================================
+
+.. caution::
+
+   CuPy-backed operations allocate the full output array on the GPU.
+   There is no automatic tiling or fallback to CPU if the array exceeds
+   VRAM.
+
+If you hit ``cuda.cudadrv.driver.CudaAPIError`` or ``cupy.cuda.memory
+.OutOfMemoryError``, the array is too large for your GPU.  Use Dask + CuPy
+to process in chunks, or fall back to NumPy.
+
+.. tip::
+
+   Complex geodesic kernels (slope, aspect with ``method='geodesic'``)
+   use many float64 registers, which limits thread-block occupancy.
+   xarray-spatial already reduces thread blocks to 16x16 for these
+   kernels, but very old GPUs (compute capability < 6.0) may still
+   hit register spill.
+
+
+Dimension ordering
+==================
+
+.. note::
+
+   xarray-spatial assumes the last two dimensions of a DataArray are
+   ``(y, x)`` in that order.  It does not inspect ``attrs['crs']`` or
+   CF conventions to verify this.
+
+If your data has ``(x, y)`` ordering (rare, but it happens with some
+netCDF conventions), transpose before passing to any function:
+
+.. code-block:: python
+
+   agg = agg.transpose('y', 'x')
+
+
+Classification and NaN/Inf
+===========================
+
+.. warning::
+
+   ``equal_interval()``, ``quantile()``, ``natural_breaks()``, and other
+   classification functions set NaN and infinite input values to NaN in the
+   output.  They do not raise an error.
+
+This means a raster with a few stray ``inf`` values (e.g. from a division
+by zero upstream) will silently produce NaN bins for those cells.
+
+.. tip::
+
+   Clean infinities before classifying:
+
+   .. code-block:: python
+
+      import numpy as np
+      agg = agg.where(np.isfinite(agg))
diff --git a/docs/source/user_guide/data_types.rst b/docs/source/user_guide/data_types.rst
index e6775f6c..77df9340 100644
--- a/docs/source/user_guide/data_types.rst
+++ b/docs/source/user_guide/data_types.rst
@@ -11,8 +11,12 @@ Overview
 
 xarray-spatial standardizes on **float32** (32-bit floating-point) for output data in most analytical functions. This design decision provides a balance between computational precision, memory efficiency, and performance that is well-suited for raster analysis tasks.
 
-.. note::
-   All multispectral indices, terrain analysis functions, and most other operations in xarray-spatial output float32 data, regardless of input data type.
+.. warning::
+
+   All multispectral indices, terrain analysis functions, and most other
+   operations in xarray-spatial output **float32** data, regardless of input
+   data type.  If you pass float64 data, the output is float32.  Cast the
+   result with ``.astype('float64')`` if you need higher precision.
 
 
 Why Float32?
diff --git a/docs/source/user_guide/index.rst b/docs/source/user_guide/index.rst
index e887fa78..b28978c6 100644
--- a/docs/source/user_guide/index.rst
+++ b/docs/source/user_guide/index.rst
@@ -7,6 +7,7 @@ User Guide
 .. toctree::
    :maxdepth: 1
 
+   caveats
    data_types
    classification
    fire

From e71f4c4e1619981505d34d957396f5b709551999 Mon Sep 17 00:00:00 2001
From: Brendan Collins <brendancol@gmail.com>
Date: Wed, 1 Apr 2026 07:54:40 -0700
Subject: [PATCH 2/2] Fix review findings: proximity units, terrain dtype,
 missing pages (#1133)

- Proximity EUCLIDEAN returns coordinate-unit distances, not pixels
- Great-circle returns metres, not kilometres
- Terrain metrics output float64, not float32; drop Horn attribution
- Add admonitions to multispectral, zonal, and pathfinding pages
- Clarify haversine section title
---
 docs/source/reference/multispectral.rst   |  7 +++++
 docs/source/reference/pathfinding.rst     | 11 ++++++++
 docs/source/reference/proximity.rst       |  8 +++---
 docs/source/reference/terrain_metrics.rst |  6 ++---
 docs/source/reference/zonal.rst           |  5 ++++
 docs/source/user_guide/caveats.rst        | 32 +++++++++++------------
 6 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/docs/source/reference/multispectral.rst b/docs/source/reference/multispectral.rst
index 0303bc44..7a9f101c 100644
--- a/docs/source/reference/multispectral.rst
+++ b/docs/source/reference/multispectral.rst
@@ -4,6 +4,13 @@
 Multispectral
 *************
 
+.. note::
+
+   All spectral indices output **float32**.  Division by zero (e.g.
+   NDVI where NIR + Red = 0) produces NaN or inf silently.  Clean
+   the result with ``xr.where(np.isfinite(result), result, np.nan)``
+   if needed.
+
 Atmospherically Resistant Vegetation Index (ARVI)
 =================================================
 .. autosummary::
diff --git a/docs/source/reference/pathfinding.rst b/docs/source/reference/pathfinding.rst
index d0e4075e..e50c4b5b 100644
--- a/docs/source/reference/pathfinding.rst
+++ b/docs/source/reference/pathfinding.rst
@@ -4,6 +4,17 @@
 Pathfinding
 ***********
 
+.. caution::
+
+   A* allocates about 65 bytes per pixel and will raise ``MemoryError``
+   if the required memory exceeds 80 % of available RAM.  Use Dask or
+   set ``search_radius`` to limit the search area for large rasters.
+
+.. warning::
+
+   NaN and non-positive friction values are treated as impassable
+   barriers.  Cells must have finite positive friction to be traversable.
+
 A* Pathfinding
 ==============
 .. autosummary::
diff --git a/docs/source/reference/proximity.rst b/docs/source/reference/proximity.rst
index 418238ab..e9b4530c 100644
--- a/docs/source/reference/proximity.rst
+++ b/docs/source/reference/proximity.rst
@@ -6,10 +6,10 @@ Proximity
 
 .. warning::
 
-   ``proximity()`` returns distances in **pixel units** by default
-   (``distance_metric='EUCLIDEAN'``).  Multiply by cell size to get
-   real-world units, or use ``'GREAT_CIRCLE'`` for lat/lon data
-   (returns kilometres).
+   ``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
+   returns distances in the **coordinate units** of the DataArray.
+   With default integer indices this equals pixel counts.  Use
+   ``'GREAT_CIRCLE'`` for lat/lon data (returns metres).
 
 .. caution::
 
diff --git a/docs/source/reference/terrain_metrics.rst b/docs/source/reference/terrain_metrics.rst
index 8d807d51..d0ebb29c 100644
--- a/docs/source/reference/terrain_metrics.rst
+++ b/docs/source/reference/terrain_metrics.rst
@@ -6,9 +6,9 @@ Terrain Metrics
 
 .. note::
 
-   Terrain metrics use a 3x3 kernel and output **float32**.  Edge cells
-   are NaN by default.  These functions use the planar (Horn) algorithm
-   and assume the input is on a regular grid with uniform cell spacing.
+   Terrain metrics use a 3x3 neighbourhood and output **float64**.  Edge
+   cells are NaN.  These functions assume the input is on a regular grid
+   with uniform cell spacing.
 
 Roughness
 =========
diff --git a/docs/source/reference/zonal.rst b/docs/source/reference/zonal.rst
index 9be3ad17..c151f5ec 100644
--- a/docs/source/reference/zonal.rst
+++ b/docs/source/reference/zonal.rst
@@ -4,6 +4,11 @@
 Zonal
 *****
 
+.. note::
+
+   NaN values are excluded from all zonal aggregations.  A zone where
+   every cell is NaN returns NaN (not zero) for sum, mean, etc.
+
 Apply
 =====
 .. autosummary::
diff --git a/docs/source/user_guide/caveats.rst b/docs/source/user_guide/caveats.rst
index f361c805..3888a990 100644
--- a/docs/source/user_guide/caveats.rst
+++ b/docs/source/user_guide/caveats.rst
@@ -118,18 +118,19 @@ the output cell is NaN.  The exact behaviour varies by function:
    ``boundary='reflect'`` if you need values at the edges.
 
 
-Proximity defaults to pixel units
-==================================
+Proximity distances depend on coordinate units
+================================================
 
 .. warning::
 
-   ``proximity()`` returns distances in **pixel units** when
-   ``distance_metric='EUCLIDEAN'`` (the default).  It does not automatically
-   convert to metres or kilometres.
+   ``proximity()`` with ``distance_metric='EUCLIDEAN'`` (the default)
+   computes distances in whatever units the DataArray's coordinates use.
+   If coordinates are default integer indices (0, 1, 2, ...), the result
+   is in pixel counts.  If coordinates are in metres (UTM), the result is
+   in metres.  If coordinates are in degrees, the result is in degrees.
 
-To get distances in real-world units with Euclidean mode, multiply the
-result by your cell size.  Or switch to ``distance_metric='GREAT_CIRCLE'``
-if your data is in degrees, which returns kilometres.
+Switch to ``distance_metric='GREAT_CIRCLE'`` for lat/lon data to get
+distances in **metres** (the radius used is 6 378 137 m).
 
 .. tip::
 
@@ -137,22 +138,21 @@ if your data is in degrees, which returns kilometres.
 
       from xrspatial.proximity import proximity
 
-      # Pixel-unit result
+      # EUCLIDEAN with default coords = pixel-count distances
       dist_px = proximity(raster)
 
-      # Convert to metres (assumes square cells)
-      cell_m = 30.0  # e.g. Landsat 30 m resolution
-      dist_m = dist_px * cell_m
+      # GREAT_CIRCLE with lat/lon coords = distances in metres
+      dist_m = proximity(raster, distance_metric='GREAT_CIRCLE')
 
 
-Haversine uses the semi-major axis
-====================================
+Great-circle distances assume sphere radius = semi-major axis
+==============================================================
 
 .. note::
 
    Great-circle distances in ``proximity()`` and ``surface_distance()``
-   use the WGS84 **semi-major axis** (6 378 137 m) as the Earth radius,
-   not the mean radius (6 371 km).
+   model the Earth as a sphere with radius = WGS84 semi-major axis
+   (6 378 137 m), not the mean radius (~6 371 km).
 
 The difference is about 0.1 %.  This is negligible for most use cases but
 worth knowing if you're comparing against a reference that uses the mean