Overview
Not all climate data needs to be downloaded and ingested into our own storage. Many high-quality, analysis-ready Zarr datasets are already publicly available on cloud object storage and can be read directly. We should support connecting to these external Zarr stores as first-class data sources in the Climate API.
A concrete example: dynamical.org hosts open, analysis-ready weather and climate forecasts as Zarr stores on S3 — directly queryable with xarray/zarr without any local copy.
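A minimal sketch of what such a direct read could look like, assuming `xarray`, `zarr`, and `s3fs` are installed. The store URL in the usage comment is a hypothetical placeholder (not a real dynamical.org path), and both helper names are illustrative only.

```python
from typing import Any, Dict, Optional


def s3_storage_options(anonymous: bool = True,
                       region: Optional[str] = None) -> Dict[str, Any]:
    """Build fsspec/s3fs options for a public (anonymous) or private bucket."""
    options: Dict[str, Any] = {"anon": anonymous}
    if region is not None:
        # s3fs forwards client_kwargs to the underlying S3 client.
        options["client_kwargs"] = {"region_name": region}
    return options


def open_remote_zarr(url: str, storage_options: Dict[str, Any]):
    """Open a remote Zarr store lazily; data variables are read on demand."""
    import xarray as xr  # imported here so the helper above has no dependencies

    return xr.open_zarr(url, storage_options=storage_options)


# Usage (hypothetical URL, requires network access):
#   ds = open_remote_zarr("s3://example-bucket/forecast.zarr", s3_storage_options())
#   subset = ds.isel(time=0)  # still lazy until values are actually requested
```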
Related: #40
Proposed behaviour
- The API can query an external Zarr store at its remote URL in the same way it queries locally ingested data
- No data is downloaded or stored — reads happen directly against the remote store at query time
- Two tiers of external sources:
  - Pre-configured: a curated list of well-known public datasets bundled with the API (e.g. dynamical.org ECMWF IFS ENS)
  - User-defined: users can register their own external Zarr URL, with optional credentials (e.g. private S3 bucket, authenticated endpoint)
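One possible way to model the two tiers is a single catalog that holds both kinds of source. This is a sketch only: `SourceTier`, `ExternalSource`, `SourceCatalog`, the example ID, and the example URL are all hypothetical names, not part of any existing API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Iterable, Optional


class SourceTier(Enum):
    PRECONFIGURED = "preconfigured"
    USER_DEFINED = "user_defined"


@dataclass
class ExternalSource:
    source_id: str
    url: str
    tier: SourceTier
    label: str = ""
    # Excluded from repr so credentials do not leak into logs.
    storage_options: Dict[str, Any] = field(default_factory=dict, repr=False)


class SourceCatalog:
    """Holds bundled sources and accepts user registrations."""

    def __init__(self, preconfigured: Iterable[ExternalSource] = ()):
        self._sources: Dict[str, ExternalSource] = {
            s.source_id: s for s in preconfigured
        }

    def register(self, source_id: str, url: str, label: str = "",
                 storage_options: Optional[Dict[str, Any]] = None) -> ExternalSource:
        # Assumption: user-defined IDs may not shadow existing ones.
        if source_id in self._sources:
            raise ValueError(f"source id already in use: {source_id!r}")
        source = ExternalSource(source_id, url, SourceTier.USER_DEFINED,
                                label, dict(storage_options or {}))
        self._sources[source_id] = source
        return source

    def get(self, source_id: str) -> ExternalSource:
        return self._sources[source_id]


# Hypothetical bundled entry; the URL is a placeholder.
catalog = SourceCatalog([
    ExternalSource("example-forecast", "s3://example-bucket/forecast.zarr",
                   SourceTier.PRECONFIGURED, "Example forecast", {"anon": True}),
])
```

Whether user-defined IDs should be allowed to override pre-configured ones is a design choice; the sketch rejects collisions to keep lookups unambiguous.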
Why this is valuable
- Avoids duplicating large global datasets that are already maintained upstream
- Forecast data (NWP) is updated continuously — consuming it directly removes the need for a download/ingest pipeline
- Enables access to datasets we would never host ourselves (resolution, size, licensing)
- Complements our ingested datasets: use external sources for global/forecast context, local ingest for bias-corrected or region-specific data
Implementation sketch
- Abstract the data access layer so both local Zarr stores and remote Zarr URLs implement the same interface
- For pre-configured sources, ship a registry (YAML/JSON) mapping a dataset ID to its Zarr store URL + variable/dimension metadata
- For user-defined sources, add a registration endpoint (or config block) accepting a Zarr URL, optional storage options (S3 credentials, region), and a human-readable label
- Validate on registration: open the Zarr store, check expected variables/dimensions are present, surface any access errors early
- Consider caching consolidated metadata (.zmetadata) locally to avoid repeated round-trips on every request
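The shared interface, the registry, and the validation step could be sketched roughly as follows. Everything named here is an assumption for illustration: the `ZarrSource` protocol, the registry field names, the `example-forecast` entry, its placeholder URL, and the variable name `temperature_2m`. Metadata caching is deliberately left out.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Protocol


class ZarrSource(Protocol):
    """Common interface implemented by local and remote stores."""

    def open(self) -> Any: ...


class LocalZarrSource:
    def __init__(self, path: str):
        self.path = path

    def open(self) -> Any:
        import xarray as xr  # lazy import keeps the rest of the module light
        return xr.open_zarr(self.path)


class RemoteZarrSource:
    def __init__(self, url: str, storage_options: Optional[Dict[str, Any]] = None):
        self.url = url
        self.storage_options = storage_options or {}

    def open(self) -> Any:
        import xarray as xr
        return xr.open_zarr(self.url, storage_options=self.storage_options)


# Hypothetical registry content; in practice this would live in a JSON/YAML file.
REGISTRY_JSON = """
{
  "example-forecast": {
    "url": "s3://example-bucket/forecast.zarr",
    "storage_options": {"anon": true},
    "expected_variables": ["temperature_2m"],
    "expected_dimensions": ["time", "latitude", "longitude"]
  }
}
"""


def load_registry(text: str) -> Dict[str, Dict[str, Any]]:
    return json.loads(text)


def find_problems(ds: Any, expected_variables: Iterable[str],
                  expected_dimensions: Iterable[str]) -> List[str]:
    """Compare an opened dataset against the registry's expectations."""
    missing_vars = sorted(set(expected_variables) - set(ds.data_vars))
    missing_dims = sorted(set(expected_dimensions) - set(ds.dims))
    problems = []
    if missing_vars:
        problems.append("missing variables: " + ", ".join(missing_vars))
    if missing_dims:
        problems.append("missing dimensions: " + ", ".join(missing_dims))
    return problems


def validate_source(source: ZarrSource, entry: Dict[str, Any]) -> List[str]:
    """Open the store once at registration time and report any problems."""
    try:
        ds = source.open()
    except Exception as exc:  # surface access errors early
        return [f"cannot open store: {exc}"]
    return find_problems(ds, entry.get("expected_variables", []),
                         entry.get("expected_dimensions", []))
```

Because `validate_source` depends only on the `open()` method, the same check would apply to pre-configured and user-defined sources, and it can be exercised in tests with a stub source that needs no network access.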
Open questions
Example pre-configured sources to consider
| Source | URL pattern | Notes |
| --- | --- | --- |
| dynamical.org | s3://dynamical-… | ECMWF IFS ENS: open, frequently updated |