
Support external Zarr stores as data sources (no local ingest) #46

@turban

Overview

Not all climate data needs to be downloaded and ingested into our own storage. Many high-quality, analysis-ready Zarr datasets are already publicly available on cloud object storage and can be read directly. We should support connecting to these external Zarr stores as first-class data sources in the Climate API.

A concrete example: dynamical.org hosts open, analysis-ready weather and climate forecasts as Zarr stores on S3 — directly queryable with xarray/zarr without any local copy.
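To illustrate the access pattern, a sketch of opening such a store lazily with xarray (the function name and URL below are placeholders, not an agreed API or a real store path):

```python
from typing import Optional

import xarray as xr


def open_external_store(url: str, storage_options: Optional[dict] = None) -> xr.Dataset:
    """Open a remote Zarr store lazily; no data is copied locally.

    consolidated=True reads the single .zmetadata document instead of
    probing every array in the store individually.
    """
    return xr.open_zarr(url, storage_options=storage_options, consolidated=True)


# e.g. a public bucket read anonymously (placeholder URL, not a real store):
# ds = open_external_store("s3://example-bucket/ecmwf-ifs-ens.zarr",
#                          storage_options={"anon": True})
```

Reads only pull the chunks a query actually touches, which is what makes the no-local-ingest model viable.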

Related: #40

Proposed behaviour

  • The API can query an external Zarr store at its remote URL in the same way it queries locally ingested data
  • No data is downloaded or stored — reads happen directly against the remote store at query time
  • Two tiers of external sources:
    • Pre-configured — a curated list of well-known public datasets bundled with the API (e.g. dynamical.org ECMWF IFS ENS)
    • User-defined — users can register their own external Zarr URL, with optional credentials (e.g. private S3 bucket, authenticated endpoint)
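One way to model the two tiers (a sketch; class and field names are illustrative, not an agreed API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExternalZarrSource:
    """A remote Zarr store registered as a data source."""
    id: str
    url: str
    label: str
    # e.g. fsspec-style options: credentials, region, anonymous access
    storage_options: Optional[dict] = None
    preconfigured: bool = False


# Tier 1: curated sources bundled with the API (URL is a placeholder).
PRECONFIGURED = {
    "dynamical-ecmwf-ifs-ens": ExternalZarrSource(
        id="dynamical-ecmwf-ifs-ens",
        url="s3://dynamical-…",  # placeholder, see table below
        label="dynamical.org ECMWF IFS ENS",
        preconfigured=True,
    ),
}


def register_user_source(registry: dict, source: ExternalZarrSource) -> None:
    """Tier 2: user-defined sources, rejected on an ID collision."""
    if source.id in registry or source.id in PRECONFIGURED:
        raise ValueError(f"source id already registered: {source.id}")
    registry[source.id] = source
```

Keeping both tiers behind the same `ExternalZarrSource` shape means the query path never has to care which tier a source came from.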

Why this is valuable

  • Avoids duplicating large global datasets that are already maintained upstream
  • Numerical weather prediction (NWP) forecast data is updated continuously; consuming it directly removes the need for a recurring download/ingest pipeline
  • Enables access to datasets we would never host ourselves (resolution, size, licensing)
  • Complements our ingested datasets: use external sources for global/forecast context, local ingest for bias-corrected or region-specific data

Implementation sketch

  • Abstract the data access layer so both local Zarr stores and remote Zarr URLs implement the same interface
  • For pre-configured sources, ship a registry (YAML/JSON) mapping a dataset ID to its Zarr store URL + variable/dimension metadata
  • For user-defined sources, add a registration endpoint (or config block) accepting a Zarr URL, optional storage options (S3 credentials, region), and a human-readable label
  • Validate on registration: open the Zarr store, check expected variables/dimensions are present, surface any access errors early
  • Consider caching consolidated metadata (.zmetadata) locally to avoid repeated round-trips on every request
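The pre-configured registry could be as simple as a YAML file like this (the schema and field names are a suggestion, and the URL stays a placeholder):

```yaml
# registry.yaml — curated external Zarr sources (illustrative schema)
sources:
  dynamical-ecmwf-ifs-ens:
    label: "dynamical.org ECMWF IFS ENS"
    url: "s3://dynamical-…"          # placeholder; pin the exact store path
    variables: [temperature_2m, precipitation]   # checked on startup
    dimensions: [time, latitude, longitude]
```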
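Registration-time validation can run against the store's consolidated metadata before the source is accepted. A minimal sketch over the raw `.zmetadata` JSON (the helper name is made up; in Zarr v2 consolidated metadata, each array appears as a `<name>/.zarray` key under the `metadata` mapping):

```python
import json


def validate_consolidated_metadata(zmetadata: str, expected_vars: set) -> None:
    """Raise early if expected variables are absent from a .zmetadata document.

    `zmetadata` is the raw JSON text of the store's consolidated metadata.
    """
    meta = json.loads(zmetadata)["metadata"]
    present = {key[: -len("/.zarray")] for key in meta if key.endswith("/.zarray")}
    missing = expected_vars - present
    if missing:
        raise ValueError(f"store is missing expected variables: {sorted(missing)}")
```

Because `.zmetadata` is one small JSON object, this check (and the local cache of it suggested above) costs a single round-trip rather than one per array.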

Open questions

  • How do we handle latency / availability — should we cache tiles or analysis results for external sources?
  • Do we need a STAC-based discovery step to find the right variable in a remote store, or is direct URL + variable name enough?
  • Should pre-configured sources be versioned/pinned (store URL may change upstream)?
  • How should we manage and store credentials for private external stores?

Example pre-configured sources to consider

| Source | URL pattern | Notes |
| --- | --- | --- |
| dynamical.org | `s3://dynamical-…` | ECMWF IFS ENS — open, frequently updated |
