Description
In #771 (review of #745), @henrydingliu observed that sample-dataset column names are duplicated across three places:
- the load_sample branch in chainladder/utils/utility_functions.py
- the test that exercises the load (e.g. test_load_sample_clrd2025)
- the dataset table in docs/library/sample_data.md
He suggested storing the metadata of available sample datasets and fields in one place, and using it to:
- run load_sample()
- key off tests
- generate MANIFEST.in
- generate sample_data.md
This applies to every sample dataset in chainladder/utils/data/, not just clrd2025 — the same triple-duplication exists today for clrd, berqsherm, xyz, the friedland family, etc. The fix is general:
- Define one manifest file (YAML or a Python dict in chainladder/utils/data/_manifest.py) keyed by sample name with origin, development, index, columns, cumulative, and any per-sample flags.
- Refactor load_sample to look up its config from the manifest rather than the long if key.lower() == ... chain.
- Generate the sample_data.md table at docs-build time (or commit the generated file with a regen script under scripts/).
- Have MANIFEST.in include chainladder/utils/data/*.csv via the manifest's listed files, or just keep the existing wildcard, whichever the maintainers prefer.
- Update tests to iterate over the manifest, so adding a new sample is a one-line change.
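To make the shape of the proposal concrete, here is a minimal sketch of what a Python-dict manifest could look like, with a lookup helper and a docs-table renderer built on it. Everything named here (SampleSpec, SAMPLE_MANIFEST, get_sample_spec, render_sample_table) is hypothetical and does not exist in chainladder today, and the field values are placeholders for illustration, not copied from the real CSVs.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SampleSpec:
    """Hypothetical per-sample metadata record (not part of chainladder)."""

    origin: str
    development: str
    columns: Tuple[str, ...]
    index: Optional[Tuple[str, ...]] = None
    cumulative: bool = True


# Placeholder entries: the keys echo sample names mentioned in this issue,
# but the field values are illustrative and must be checked against the data.
SAMPLE_MANIFEST: Dict[str, SampleSpec] = {
    "clrd": SampleSpec(
        origin="AccidentYear",
        development="DevelopmentYear",
        columns=("IncurLoss", "CumPaidLoss"),
        index=("GRNAME", "LOB"),
    ),
    "xyz": SampleSpec(
        origin="AccidentYear",
        development="DevelopmentYear",
        columns=("Incurred", "Paid"),
    ),
}


def get_sample_spec(key: str) -> SampleSpec:
    """Case-insensitive lookup that would replace the if/elif chain."""
    try:
        return SAMPLE_MANIFEST[key.lower()]
    except KeyError:
        available = ", ".join(sorted(SAMPLE_MANIFEST))
        raise ValueError(
            f"Unknown sample {key!r}; available: {available}"
        ) from None


def render_sample_table(manifest: Dict[str, SampleSpec]) -> str:
    """Render the manifest as a markdown table for sample_data.md."""
    lines = [
        "| Sample | Origin | Development | Index | Columns | Cumulative |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for name in sorted(manifest):
        spec = manifest[name]
        index = ", ".join(spec.index) if spec.index else ""
        columns = ", ".join(spec.columns)
        lines.append(
            f"| {name} | {spec.origin} | {spec.development} "
            f"| {index} | {columns} | {spec.cumulative} |"
        )
    return "\n".join(lines)
```

With something like this in place, load_sample could call get_sample_spec(key) and pass the fields on to the Triangle constructor, and a test could be parametrized over the keys of SAMPLE_MANIFEST so that a new sample only needs a new manifest entry. A YAML file would work equally well; the dict form is shown only because it needs no extra parsing step.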
Is your feature request aligned with the scope of the package?
Describe the solution you'd like, or your current workaround.
See above. Current workaround is the existing pattern — every new sample dataset (including #745's clrd2025) updates three files by hand.
Do you have any additional supporting notes?
Filed at @henrydingliu's suggestion in #771 (comment). Keeping #771 / #745 scoped to landing the data; this refactor is its own piece of work.