Conversation
Remove DHIS2 connection string references from setup section, add /extents and /datasets to endpoint table, and expand STAC example to show catalog discovery before opening a dataset with xarray.
uv run uvicorn resolves the uvicorn binary via PATH, which picks up conda's uvicorn when the base environment is active. Using python -m uvicorn forces the venv's interpreter and avoids the module not found error in the reload subprocess.
Datasets use x/y dimension names, not latitude/longitude. The direct access example now reads open_kwargs from the STAC collection rather than hardcoding consolidated=False, which fails for Zarr v3 stores.
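The open_kwargs lookup described above can be sketched as a small helper. This is our sketch, assuming the asset dict carries an `open_kwargs` mapping as the commit describes; the helper name and error wording are not from the project:

```python
def zarr_open_kwargs(asset):
    """Return the open kwargs advertised by a STAC Zarr asset.

    Assumes an "open_kwargs" mapping on the asset (the convention this
    commit describes). Returns {} when absent, so callers can write
    xr.open_zarr(asset["href"], **zarr_open_kwargs(asset)) instead of
    hardcoding consolidated=False.
    """
    kwargs = asset.get("open_kwargs", {})
    if not isinstance(kwargs, dict):
        raise ValueError(f"open_kwargs must be a mapping, got {type(kwargs).__name__}")
    return dict(kwargs)
```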
Step-by-step guide covering extent configuration, environment setup, first ingestion, and ERA5-Land DestinE authentication. Links added from README and user_guide.md.
- datasets: validate id is a non-empty string before using it as a dict key
- datasets: require ingestion.eo_function for all sync_kind values, including static — static datasets still need an initial ingestion, so the download path cannot safely omit the block
- config: validate the YAML root is a mapping and raise a clear ValueError if not, rather than letting dict() crash with a low-signal TypeError
- tests: add ingestion.eo_function to static template fixtures in test_config.py and test_dataset_registry.py to match the now-enforced contract
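The YAML-root validation above amounts to one isinstance check with a pointed error. A minimal sketch; the function name and message text are illustrative, not the project's exact code:

```python
def validate_config_root(raw):
    """Reject non-mapping YAML roots with a clear error.

    yaml.safe_load() happily returns lists, strings, or None for
    malformed config files; failing here gives a better signal than
    letting dict() raise a bare TypeError later.
    """
    if not isinstance(raw, dict):
        raise ValueError(
            f"Config root must be a mapping, got {type(raw).__name__}; "
            "check the file referenced by CLIMATE_API_CONFIG"
        )
    return raw
```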
- datasets: validate datasets_dir is a str/Path before Path concatenation
- extents: validate extent.id and extent.bbox in get_extent() and raise clear ValueErrors pointing to CLIMATE_API_CONFIG rather than letting callers hit KeyError/TypeError with no context
- client: use catalog.get('links') and validate it is a list before iterating, so a non-STAC response raises a clear ValueError instead of a KeyError
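The client-side shape check above can be sketched as follows; the helper name is ours and the exact STAC fields beyond `links`/`rel` are assumptions:

```python
def stac_child_links(catalog):
    """Extract child links from a parsed STAC catalog response.

    Validates the payload shape first, so a non-STAC response (an HTML
    error page parsed as JSON, a {"detail": ...} body) fails with a
    ValueError rather than a KeyError deep in iteration.
    """
    links = catalog.get("links")
    if not isinstance(links, list):
        raise ValueError("Not a STAC catalog: 'links' is missing or not a list")
    return [link for link in links if isinstance(link, dict) and link.get("rel") == "child"]
```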
…stance

- Add 30s timeout to both httpx.get() calls in client.py to prevent indefinite hangs on network issues
- Set allow_credentials=False in CORSMiddleware; combining allow_origins=["*"] with allow_credentials=True is a CORS spec violation and a security footgun
- Use isinstance(x, (str, Path)) instead of str | Path union syntax for broader clarity (the tuple form is unambiguous across all Python versions)
…x plural in docs
- Validate href in each STAC child link before slicing the id from it
- Check that assets is a dict before calling .get("zarr") to avoid AttributeError on malformed STAC responses
- Fix "Confirm configured extents" heading to singular in managed data guide
Previously, built-in dataset YAMLs were located by walking four directory levels up from datasets.py and appending data/datasets/. This works in a source checkout or editable install but fails silently in a wheel install: the package lands in site-packages/, the project-root data/ directory is never included in the wheel, and list_datasets() crashes with "Path is not a directory".

Move the YAMLs into the package at src/climate_api/data/datasets/ and load them via importlib.resources.files(). importlib.resources is package-aware and resolves correctly whether the package is an unpacked directory or a zip inside a wheel.

User-provided datasets_dir (from CLIMATE_API_CONFIG) continues to use regular Path objects via _load_from_dir() — that path is always on disk.
…ts, safer conftest teardown

- Raise ValueError (not KeyError) when the Zarr asset is missing or not a dict — all other error paths in open_dataset raise ValueError, so callers catch one exception type
- Inject id into a copy of the link dict instead of mutating the parsed JSON object in-place
- Use os.environ.pop() instead of del in conftest session fixture teardown to avoid KeyError if the env var was already removed by a test's monkeypatch
- Replace next() generator in setup guide with an explicit list so an empty catalog gives an IndexError with clear context rather than StopIteration
…ative path

Walking __file__ four levels up to find data/downloads/ fails when the package is installed with pip, because __file__ lands in site-packages/ and the project root is not accessible. The directory may also be non-writable.

Default to $XDG_DATA_HOME/climate-api/downloads (~/.local/share/climate-api/downloads if XDG_DATA_HOME is unset), which is always user-writable. The existing CACHE_OVERRIDE env var continues to work and takes precedence, keeping Docker and dev deployments unchanged.
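The precedence described above (CACHE_OVERRIDE, then XDG_DATA_HOME, then the ~/.local/share fallback) fits in a few lines. A sketch; the function name is ours:

```python
import os
from pathlib import Path


def default_downloads_dir():
    """Resolve the downloads directory with the precedence this commit
    describes: CACHE_OVERRIDE wins, then $XDG_DATA_HOME/climate-api/downloads,
    then ~/.local/share/climate-api/downloads."""
    override = os.environ.get("CACHE_OVERRIDE")
    if override:
        return Path(override)
    xdg = os.environ.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / "climate-api" / "downloads"
```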
## Why

The Climate API was designed from the start for a single deployment scenario: clone the repo, edit files in place, run. This worked for early development but creates real problems as we move toward production deployments and want users to be able to install the package with `pip install climate-api`: the package then lands in `site-packages/`, and the project root is no longer accessible. This PR addresses all of these issues to make the package usable outside a source checkout.
## What changed
### Instance configuration via `CLIMATE_API_CONFIG` (closes #61)

A new `CLIMATE_API_CONFIG` environment variable points to a YAML file that lives outside the repository. This separates instance-specific configuration from the package itself, so the package can be upgraded without overwriting local config.

The extent is a single block per instance (not a list). The `GET /extent` endpoint returns it, or 404 if not configured. Dataset templates from `datasets_dir` are merged with the built-ins — a custom template with the same `id` overrides the built-in one.
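A minimal instance config might look like this. The keys `extent.id`, `extent.bbox`, and `datasets_dir` are named in this PR; the exact schema and values below are a sketch:

```yaml
# Referenced via: export CLIMATE_API_CONFIG=/etc/climate-api/climate-api.yaml
extent:
  id: kenya                        # illustrative extent id
  bbox: [33.9, -4.7, 41.9, 5.5]    # illustrative bounds
datasets_dir: /srv/climate-api/datasets   # optional: custom templates merged over built-ins
```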
### Built-in dataset templates bundled inside the package

Previously, the built-in YAML templates (`chirps3.yaml`, `era5_land.yaml`, `worldpop.yaml`) lived in `data/datasets/` at the project root and were located by walking four directory levels up from the source file. This breaks when the package is installed with `pip install`, because the package ends up in `site-packages/` with no path to the original project root.

The YAMLs are now bundled inside the package at `src/climate_api/data/datasets/` and loaded via `importlib.resources`, which resolves the correct location regardless of how the package was installed.
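The `importlib.resources` pattern described above, as a generic sketch. For this PR the call would be `iter_bundled("climate_api", "data/datasets", ".yaml")`; the helper itself is ours, not the project's code:

```python
from importlib.resources import files


def iter_bundled(package, subdir, suffix):
    """List resource files bundled inside an installed package.

    files() returns a Traversable that resolves correctly whether the
    package is an unpacked directory in site-packages or zipped inside
    a wheel, which plain __file__ arithmetic cannot do.
    """
    root = files(package).joinpath(subdir)
    return sorted(entry.name for entry in root.iterdir() if entry.name.endswith(suffix))
```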
### Coordinate normalisation at write time

All Zarr datasets are now written with canonical coordinate names (`time`, `latitude`, `longitude`) regardless of what the upstream source uses (`valid_time`, `lat`/`lon`, `x`/`y`). This is enforced in `build_dataset_zarr()` for both flat and pyramid outputs.

Every downstream consumer — the client, the user guide, the OGC API — can now use `ds.latitude`, `ds.longitude`, `ds.time` without dataset-specific branching.
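The normalisation amounts to a rename mapping fed to xarray's `ds.rename(...)`. The alias table below mirrors the names listed in this PR; the standalone helper is our sketch, not `build_dataset_zarr()` itself:

```python
# Aliases seen upstream, mapped to the canonical names the API writes.
CANONICAL = {
    "valid_time": "time",
    "lat": "latitude",
    "lon": "longitude",
    "y": "latitude",
    "x": "longitude",
}


def coord_rename_map(names):
    """Build the mapping to pass to ds.rename(...); names already
    canonical are left out so the rename is a no-op for them."""
    return {name: CANONICAL[name] for name in names if name in CANONICAL}
```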
### Python client for dataset discovery and access (closes #60)

A new `climate_api.client` module makes it possible to discover and open datasets without constructing URLs manually. Module-level functions (`list_datasets`, `open_dataset`) fall back to the `CLIMATE_API_BASE_URL` environment variable, so scripts work without hardcoding a URL.
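The fallback behaviour can be sketched as a small resolver; the function name is ours, only the `CLIMATE_API_BASE_URL` variable comes from this PR:

```python
import os


def resolve_base_url(explicit=None):
    """Resolve the API base URL the way the module-level functions do:
    an explicit argument wins, otherwise fall back to the
    CLIMATE_API_BASE_URL environment variable."""
    url = explicit or os.environ.get("CLIMATE_API_BASE_URL")
    if not url:
        raise ValueError("Pass a base URL or set CLIMATE_API_BASE_URL")
    return url.rstrip("/")
```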
### `create_app()` factory function

The FastAPI application is now created via a `create_app()` factory, making it straightforward to embed the API in a larger application.
### CORS credentials flag corrected

`allow_credentials` was incorrectly set to `True` alongside `allow_origins=["*"]`. This combination violates the CORS specification and is rejected by browsers. It is now set to `False`, which is correct for a public data API that does not use cookies or session tokens.
### Dataset template field renamed: `cache_info` → `ingestion`

The `cache_info` block in dataset template YAMLs is renamed to `ingestion`. The `ingestion.eo_function` field is now required for all sync kinds, not just temporal ones.
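In a template the rename looks like this. Only `ingestion.eo_function` is named in this PR; the function name below is a placeholder:

```yaml
# Old spelling (pre-PR):
# cache_info:
#   eo_function: fetch_chirps3

# New spelling:
ingestion:
  eo_function: fetch_chirps3   # placeholder name; field now required for every sync kind
```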
### Documentation

- `docs/setup_guide.md` — step-by-step instance setup from install to first ingestion
- `docs/user_guide.md` — consumer guide: STAC discovery, opening with xarray, subsetting
- `docs/adding_custom_datasets.md` — how to write a custom dataset template and wire it up
- `examples/stac_discover_and_open.py` and `examples/zarr_direct_access.py` — runnable examples using the client

## Migration note
Existing datasets must be deleted and re-ingested. Coordinate normalisation only applies to newly written Zarr stores. Zarr files written before this PR will retain their original source coordinate names.
Rename `cache_info:` to `ingestion:` in any custom dataset YAML templates.

## Test plan
- `make run` starts the API without errors
- `uv run examples/stac_discover_and_open.py` lists published datasets and prints dataset info
- `uv run examples/zarr_direct_access.py` opens a Zarr store and prints a spatial mean time series
- `from climate_api.client import Client; print(Client("http://127.0.0.1:8000").catalog())` works in a Python session
- `climate-api.yaml` serves the correct extent and built-in datasets
- `datasets_dir` with a custom YAML adds that dataset alongside the built-ins
- `make test` passes