Centralize sample-dataset metadata (load_sample / tests / docs / MANIFEST.in)

### Description

In #771 (review of #745), @henrydingliu observed that sample-dataset column names are duplicated across three places:

- the `load_sample` branch in `chainladder/utils/utility_functions.py`
- the test that exercises the load (e.g. `test_load_sample_clrd2025`)
- the dataset table in `docs/library/sample_data.md`

He suggested storing the metadata of available sample datasets and fields in one place, and using it to:

- run `load_sample()`
- drive the tests
- generate `MANIFEST.in`
- generate `sample_data.md`
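
As a rough illustration of the last two bullets, here is a minimal sketch of how a docs table and `MANIFEST.in` lines could both be rendered from one metadata dict. The names `SAMPLE_MANIFEST`, `render_docs_table`, and `render_manifest_in` are hypothetical, and the file names and column values are placeholders, not the real dataset schemas.

```python
# Hypothetical manifest: the sample names come from this issue, but the
# file names and column values are placeholders for illustration only.
SAMPLE_MANIFEST = {
    "clrd": {"file": "clrd.csv", "columns": ["ColA", "ColB"]},
    "xyz": {"file": "xyz.csv", "columns": ["ColC"]},
}


def render_docs_table(manifest):
    """Return a Markdown table with one row per sample dataset."""
    lines = ["| Dataset | Columns |", "| --- | --- |"]
    for name in sorted(manifest):
        columns = ", ".join(manifest[name]["columns"])
        lines.append(f"| {name} | {columns} |")
    return "\n".join(lines)


def render_manifest_in(manifest, data_dir="chainladder/utils/data"):
    """Return MANIFEST.in `include` lines, one per listed data file."""
    return "\n".join(
        f"include {data_dir}/{manifest[name]['file']}"
        for name in sorted(manifest)
    )


if __name__ == "__main__":
    print(render_docs_table(SAMPLE_MANIFEST))
    print(render_manifest_in(SAMPLE_MANIFEST))
```

Both outputs come from the same dict, so a new sample would only need one new entry.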

This applies to every sample dataset in `chainladder/utils/data/`, not just `clrd2025`: the same triple duplication exists today for `clrd`, `berqsherm`, `xyz`, the friedland family, etc. The fix is general:

1. Define one manifest file (YAML or a Python dict in `chainladder/utils/data/_manifest.py`) keyed by sample name with `origin`, `development`, `index`, `columns`, `cumulative`, and any per-sample flags.
2. Refactor `load_sample` to look up its config from the manifest rather than the long `if key.lower() == ...` chain.
3. Generate the `sample_data.md` table at docs-build time (or commit the generated file with a regen script under `scripts/`).
4. Generate the `MANIFEST.in` entries for `chainladder/utils/data/*.csv` from the files listed in the manifest, or just keep the existing wildcard, whichever the maintainers prefer.
5. Update tests to iterate over the manifest, so adding a new sample is a one-line change.
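
Steps 1 and 2 might look roughly like the sketch below. It is only an assumption about shape: `SampleSpec`, `SAMPLES`, `get_sample_spec`, and `triangle_kwargs` are hypothetical names, and the sample entries use placeholder values rather than the real column names, which would be copied from the existing `load_sample` branches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleSpec:
    """Hypothetical per-sample record; field names follow step 1."""
    origin: str
    development: str
    index: tuple
    columns: tuple
    cumulative: bool = True


# Placeholder entries for illustration only (not real dataset schemas).
SAMPLES = {
    "sample_a": SampleSpec("origin_col", "dev_col", ("group",), ("paid", "incurred")),
    "sample_b": SampleSpec("origin_col", "dev_col", (), ("paid",), cumulative=False),
}


def get_sample_spec(key):
    """Single dict lookup in place of the `if key.lower() == ...` chain."""
    try:
        return SAMPLES[key.lower()]
    except KeyError:
        raise ValueError(f"Unknown sample dataset: {key!r}") from None


def triangle_kwargs(key):
    """Build keyword arguments a loader could forward when constructing the triangle."""
    spec = get_sample_spec(key)
    return {
        "origin": spec.origin,
        "development": spec.development,
        "index": list(spec.index) or None,  # empty index becomes None
        "columns": list(spec.columns),
        "cumulative": spec.cumulative,
    }
```

For step 5, a test could then be parametrized over the manifest keys, for example with `pytest.mark.parametrize("name", sorted(SAMPLES))`, so that a new entry is picked up without touching the test file.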

### Is your feature request aligned with the scope of the package?

- [x] Yes, absolutely!

### Describe the solution you'd like, or your current workaround.

See above. The current workaround is the existing pattern: every new sample dataset (including #745's `clrd2025`) updates three files by hand.

### Do you have any additional supporting notes?

Filed at @henrydingliu's suggestion in https://github.com/casact/chainladder-python/pull/771#issuecomment-thread. Keeping #771 / #745 scoped to landing the data; this refactor is its own piece of work.

