Gaps in referential integrity validation #103
Replies: 1 comment
-
|
Some rambling thoughts in response to these prompts - Re: Gap #3
I guess this also relates a bit to what we chatted about re: filtering for model years earlier - it’s not necessarily cross-table but something that checks that at least a tuple of unique column values exists for each model year. E.g. for Re: “A possible direction”
Re: “Another possible direction”
Ok also very keen to keep discussing all of this and will continue to collect my thoughts and reflect on this! |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Posting this to stimulate discussion rather than as a finished proposal — the framing,
gap categories, and examples below are rough cuts and might not be exactly right.
Our per-column
allowed_values_fromseems to cover basic foreign-key checks well, buta quick audit across
src/ispypsa/validation/schemassuggests there are a few classesof cross-table check it can't easily express today. Tentative grouping below, with one
real example each — happy to be told these aren't really the right categories, or that
some of them aren't worth supporting.
Gaps
1. Filtered targets — "value must be in T where T.col = v".
2. Conjunctions — "must be in A and in B".
3. Composite-key (tuple) checks — "(col_a, col_b) tuple must match same tuple in
another table".
4. Intra-table multi-column row predicates —
A possible direction
One way to address gaps 1 and 2 would be to extend
allowed_values_fromwith awherefilter and explicitany_of/all_ofkeywords for boolean composition.Coverage / reverse checks would also fall out of this for free — they're just FK
declarations sitting on the referenced column. Sketches below to give a flavour, not
to nail down syntax.
Single target (today, no change):
Filtered target:
Disjunction with explicit
any_of(replaces the current bare-list form):Conjunction with nesting:
whereoperators:{column: c, equals: v},{column: c, in: [...]},{column: c, not_null: true}.For gaps 3 and 4 — tuple FKs and intra-table row predicates — leaning toward
custom_validation, for the reasons above. If we landed somewhere like this, everycheck would live next to the table it constrains and a central referential-integrity
file probably isn't needed. But happy to be talked out of any of this.
Another possible direction
The approach above keeps every constraint declared on the column it applies to. The
flip side is that some constraints get repeated a lot.
geo_id → network_geographyisdeclared in ~8 columns across 7 tables; the
fuel_typeallowed-values list is copy-pasted across 5 schemas (with some drift between copies);
technology → costs_new_entrant_buildappears in 3 places. Compound constraints would amplify thisif they ever applied to multiple columns.
A dataset-level layer could help. Two pieces, roughly:
Named reference types — define an FK once, reference it by name:
Shared enum lists — pull recurring allowed-values lists out once:
Both feel like wins independent of the syntax-extension discussion above — they'd
tidy the existing schemas and remove the drift risk regardless of whether we add
where/any_of/all_of. Open question whether to do them together orsequence them.
Beta Was this translation helpful? Give feedback.
All reactions