[FEATURE] Added provider, resource, and version as required fields. #532
Closed
Eric Godwin (ericgodwin) wants to merge 2 commits into
Signed-off-by: ericgodwin <eric@overturemaps.org>
🗺️ Schema reference docs preview is live!
Note: ♻️ This preview updates automatically with each push to this PR.
Pull request overview
Adds source provenance identifiers to the shared source item model/schema so sources can identify provider, resource, and snapshot version alongside the existing dataset identifier. All changed files were reviewed.
Changes:
- Adds `provider`, `resource`, and `version` to source item schema/Pydantic models and generated baseline schemas.
- Updates examples, counterexamples, package TOML examples, and reference fixtures to include the new fields.
- Updates source-related README and GeoParquet schema documentation.
Reviewed changes
Copilot reviewed 106 out of 106 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| schema/defs.yaml | Adds source provenance fields to shared JSON schema definitions. |
| packages/overture-schema-common/src/overture/schema/common/sources.py | Adds required provenance fields to SourceItem. |
| packages/overture-schema-common/tests/test_models.py | Updates expected common model JSON schema. |
| packages/overture-schema-common/README.md | Updates SourceItem usage examples. |
| docs/schema/0-Schema.mdx | Updates documented GeoParquet sources struct. |
| packages/overture-schema-addresses-theme/tests/address_baseline_schema.json | Updates address baseline schema snapshot. |
| packages/overture-schema-addresses-theme/pyproject.toml | Adds source provenance to address package example. |
| packages/overture-schema-base-theme/tests/bathymetry_baseline_schema.json | Updates bathymetry baseline schema snapshot. |
| packages/overture-schema-base-theme/tests/infrastructure_baseline_schema.json | Updates infrastructure baseline schema snapshot. |
| packages/overture-schema-base-theme/tests/land_baseline_schema.json | Updates land baseline schema snapshot. |
| packages/overture-schema-base-theme/tests/land_cover_baseline_schema.json | Updates land cover baseline schema snapshot. |
| packages/overture-schema-base-theme/tests/land_use_baseline_schema.json | Updates land use baseline schema snapshot. |
| packages/overture-schema-base-theme/tests/water_baseline_schema.json | Updates water baseline schema snapshot. |
| packages/overture-schema-base-theme/pyproject.toml | Adds source provenance to base package examples. |
| packages/overture-schema-buildings-theme/tests/building_baseline_schema.json | Updates building baseline schema snapshot. |
| packages/overture-schema-buildings-theme/tests/building_part_baseline_schema.json | Updates building part baseline schema snapshot. |
| packages/overture-schema-buildings-theme/pyproject.toml | Adds source provenance to building package examples. |
| packages/overture-schema-divisions-theme/tests/division_area_baseline_schema.json | Updates division area baseline schema snapshot. |
| packages/overture-schema-divisions-theme/tests/division_baseline_schema.json | Updates division baseline schema snapshot. |
| packages/overture-schema-divisions-theme/tests/division_boundary_baseline_schema.json | Updates division boundary baseline schema snapshot. |
| packages/overture-schema-divisions-theme/pyproject.toml | Adds source provenance to division package examples. |
| packages/overture-schema-places-theme/tests/place_baseline_schema.json | Updates place baseline schema snapshot. |
| packages/overture-schema-places-theme/pyproject.toml | Adds source provenance to place package example. |
| packages/overture-schema-transportation-theme/tests/connector_baseline_schema.json | Updates connector baseline schema snapshot. |
| packages/overture-schema-transportation-theme/tests/segment_baseline_schema.json | Updates segment baseline schema snapshot. |
| packages/overture-schema-transportation-theme/pyproject.toml | Adds source provenance to transportation package examples. |
| examples/base/bathymetry-example.yaml | Adds provenance fields to base example source. |
| examples/base/infrastructure-example.yaml | Adds provenance fields to base example source. |
| examples/base/infrastructure-height-example.yaml | Adds provenance fields to base example source. |
| examples/base/land-cover-example.yaml | Adds provenance fields to base example source. |
| examples/base/land-sand-example.yaml | Adds provenance fields to base example source. |
| examples/base/land-use-example.yaml | Adds provenance fields to base example source. |
| examples/base/water-body-disputed.yaml | Adds provenance fields to base example source. |
| examples/base/water-river-example.yaml | Adds provenance fields to base example source. |
| examples/base/water-river-with-wikidata.yaml | Adds provenance fields to base example source. |
| examples/buildings/basic-sources.yaml | Adds provenance fields to building example source. |
| examples/buildings/building-part-basic.yaml | Adds provenance fields to building part example sources. |
| examples/buildings/building-part-name.yaml | Adds provenance fields to building part example source. |
| examples/buildings/building-polygon.yaml | Adds provenance fields to building example sources. |
| examples/buildings/empire-state-building.json | Adds provenance fields to JSON building example source. |
| examples/buildings/license-basic.yaml | Adds provenance fields to licensed building example source. |
| examples/buildings/osm/outline.yaml | Adds provenance fields to OSM building example source. |
| examples/buildings/osm/part1.yaml | Adds provenance fields to OSM building part example source. |
| examples/buildings/osm/part2.yaml | Adds provenance fields to OSM building part example source. |
| examples/divisions/division/capital_of.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/class.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/dependency.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/multiple_capital_division.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/population.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/prominence.yaml | Adds provenance fields to division example source. |
| examples/divisions/division/region.yaml | Adds provenance fields to division example source. |
| examples/places/place-no-emails-phones-socials-websites.yaml | Adds provenance fields to place example sources. |
| examples/places/place-with-operating-status.yaml | Adds provenance fields to place example sources. |
| examples/places/place.yaml | Adds provenance fields to place example sources. |
| examples/transportation/segment/road/road-with-lr-sources.yaml | Adds provenance fields to road source examples. |
| counterexamples/base/bathymetry/bad-depth.yaml | Adds valid provenance fields to bathymetry counterexample source. |
| counterexamples/buildings/bad-confidence-in-source.yaml | Adds valid provenance fields to building counterexample sources. |
| counterexamples/buildings/bad-license.yaml | Adds valid provenance fields to building counterexample source. |
| counterexamples/buildings/bad-time.yaml | Adds valid provenance fields to building counterexample source. |
| counterexamples/buildings/building-part-bad-name.yaml | Adds valid provenance fields to building part counterexample source. |
| counterexamples/places/bad-empty-emails.yaml | Adds valid provenance fields to place counterexample sources. |
| counterexamples/places/bad-empty-phones.yaml | Adds valid provenance fields to place counterexample sources. |
| counterexamples/places/bad-empty-socials.yaml | Adds valid provenance fields to place counterexample sources. |
| counterexamples/places/bad-empty-websites.yaml | Adds valid provenance fields to place counterexample sources. |
| counterexamples/places/bad-operating-status.yaml | Adds valid provenance fields to place counterexample sources. |
| counterexamples/transportation/segment/bad-source-between.yaml | Adds valid provenance fields to transportation counterexample source. |
| reference/examples/base/bathymetry-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/infrastructure-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/infrastructure-height-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/land-cover-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/land-sand-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/land-use-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/water-body-disputed.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/water-river-example.yaml | Adds provenance fields to reference base example source. |
| reference/examples/base/water-river-with-wikidata.yaml | Adds provenance fields to reference base example source. |
| reference/examples/buildings/basic-sources.yaml | Adds provenance fields to reference building example source. |
| reference/examples/buildings/building-part-basic.yaml | Adds provenance fields to reference building part example sources. |
| reference/examples/buildings/building-part-name.yaml | Adds provenance fields to reference building part example source. |
| reference/examples/buildings/building-polygon.yaml | Adds provenance fields to reference building example sources. |
| reference/examples/buildings/empire-state-building.json | Adds provenance fields to reference JSON building example source. |
| reference/examples/buildings/license-basic.yaml | Adds provenance fields to reference licensed building example source. |
| reference/examples/buildings/osm/outline.yaml | Adds provenance fields to reference OSM building example source. |
| reference/examples/buildings/osm/part1.yaml | Adds provenance fields to reference OSM building part example source. |
| reference/examples/buildings/osm/part2.yaml | Adds provenance fields to reference OSM building part example source. |
| reference/examples/divisions/division/capital_of.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/class.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/dependency.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/multiple_capital_division.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/population.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/prominence.yaml | Adds provenance fields to reference division example source. |
| reference/examples/divisions/division/region.yaml | Adds provenance fields to reference division example source. |
| reference/examples/places/place-no-emails-phones-socials-websites.yaml | Adds provenance fields to reference place example sources. |
| reference/examples/places/place-with-operating-status.yaml | Adds provenance fields to reference place example sources. |
| reference/examples/places/place.yaml | Adds provenance fields to reference place example sources. |
| reference/examples/transportation/segment/road/road-with-lr-sources.yaml | Adds provenance fields to reference road source examples. |
| reference/counterexamples/base/bathymetry/bad-depth.yaml | Adds valid provenance fields to reference bathymetry counterexample source. |
| reference/counterexamples/buildings/bad-confidence-in-source.yaml | Adds valid provenance fields to reference building counterexample sources. |
| reference/counterexamples/buildings/bad-license.yaml | Adds valid provenance fields to reference building counterexample source. |
| reference/counterexamples/buildings/bad-time.yaml | Adds valid provenance fields to reference building counterexample source. |
| reference/counterexamples/buildings/building-part-bad-name.yaml | Adds valid provenance fields to reference building part counterexample source. |
| reference/counterexamples/places/bad-empty-emails.yaml | Adds valid provenance fields to reference place counterexample sources. |
| reference/counterexamples/places/bad-empty-phones.yaml | Adds valid provenance fields to reference place counterexample sources. |
| reference/counterexamples/places/bad-empty-socials.yaml | Adds valid provenance fields to reference place counterexample sources. |
| reference/counterexamples/places/bad-empty-websites.yaml | Adds valid provenance fields to reference place counterexample sources. |
| reference/counterexamples/places/bad-operating-status.yaml | Adds valid provenance fields to reference place counterexample sources. |
| reference/counterexamples/transportation/segment/bad-source-between.yaml | Adds valid provenance fields to reference transportation counterexample source. |
Comments suppressed due to low confidence (2)
schema/defs.yaml:270
- The new required behavior lacks dedicated counterexamples for missing `provider`, `resource`, and `version`. This repo already validates counterexamples for required-field failures, so adding negative cases would protect the intended source provenance contract from regressions.
schema/defs.yaml:284
- The new `minLength` constraint is not covered by an empty-value counterexample for `provider` (nor by analogous cases for the other new source identifiers). Without a negative example, these constraints can be accidentally removed while the example/counterexample validation suite still passes.
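The two suppressed comments ask for negative cases. A minimal sketch of how such cases could be derived from one valid source item is shown here. It assumes a hypothetical stand-in checker (`check_source_item`) rather than the repository's actual counterexample harness, and the sample values are illustrative only:

```python
import copy

# Fields this PR adds to the `required` list in schema/defs.yaml.
NEW_FIELDS = ("provider", "resource", "version")


def check_source_item(item):
    """Hypothetical stand-in for schema validation: return a list of problems."""
    problems = []
    for name in ("property", "dataset") + NEW_FIELDS:
        if name not in item:
            problems.append("missing required field: " + name)
    for name in NEW_FIELDS:
        if name in item:
            value = item[name]
            if not isinstance(value, str) or len(value) < 1:
                problems.append(name + " must be a string with minLength 1")
    return problems


def negative_cases(valid):
    """Derive one missing-field case and one empty-value case per new field."""
    cases = {}
    for name in NEW_FIELDS:
        missing = copy.deepcopy(valid)
        del missing[name]
        cases["missing-" + name] = missing

        empty = copy.deepcopy(valid)
        empty[name] = ""
        cases["empty-" + name] = empty
    return cases


# Illustrative values only; not taken from the PR's example files.
valid_item = {
    "property": "",
    "dataset": "ExampleDataset",
    "provider": "ExampleProvider",
    "resource": "buildings",
    "version": "2025-01-01",
}
cases = negative_cases(valid_item)
```

Each generated case differs from the valid item in exactly one field, so a failure points at a single constraint.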
      version values that together identify a specific source snapshot.
      type: object
    - required: [property, dataset]
    + required: [property, dataset, provider, resource, version]
Comment on lines +47 to +73
    provider: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Name of the entity that produced the source data.
            """).strip(),
        ),
    ]
    resource: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Subject or data type produced by the provider.
            """).strip(),
        ),
    ]
    version: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Sortable source snapshot identifier, such as a date, number, or release label.
            """).strip(),
        ),
    ]
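The `version` description calls for a sortable identifier. As a small illustration (an assumption about how consumers might compare versions, not something this PR specifies), plain string comparison orders ISO 8601 dates chronologically but does not order unpadded numbers numerically:

```python
# Assumption: versions are compared as plain strings. The PR does not say
# how "sortable" is meant to be evaluated; all values here are made up.
iso_dates = ["2024-11-05", "2023-01-30", "2024-02-14"]
unpadded = ["9", "10", "2"]
padded = ["009", "010", "002"]

sorted_dates = sorted(iso_dates)      # chronological order
sorted_unpadded = sorted(unpadded)    # lexicographic, so "10" sorts before "2"
sorted_padded = sorted(padded)        # zero padding restores numeric order
```

If numeric versions are used, a fixed-width or zero-padded form is one way to keep string sorting meaningful.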
      | **type** | *string* | one of 14 Overture feature types |
      | **version** | *int32* | version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed |
    - | **sources** | *list\<element: struct\<property: string, dataset: string, record_id: string, update_time: string, confidence: double, between: list\<double\>\>\>* | array of source information for the properties of a given feature |
    + | **sources** | *list\<element: struct\<property: string, dataset: string, provider: string, resource: string, version: string, record_id: string, update_time: string, confidence: double, between: list\<double\>\>\>* | array of source information for the properties of a given feature |
Signed-off-by: ericgodwin <eric@overturemaps.org>
Author
I have changed enough functionality at this point that it makes sense to just start over.
Major change release plan
This step alone is not a major change, since it simply adds new fields. The later removal of the deprecated `dataset` field will be the breaking change.
A. Expected release date for this MAJOR change
The breaking change for this should come 6 to 12 months after the minor change is released. For this change to be rolled out we are going to have to update all data loaders to populate these required fields. That will take some time. While this pull request is out now, we may not actually be able to implement this change until the August timeframe.
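Updating a data loader during the minor step might look roughly like the sketch below. The helper name `build_source_item` and all sample values are hypothetical. The only details taken from this PR are the field names and the plan to keep `dataset` until the major change.

```python
def build_source_item(prop, dataset, provider, resource, version):
    """Hypothetical loader helper: emit the deprecated and the new fields together."""
    new_fields = (("provider", provider), ("resource", resource), ("version", version))
    for name, value in new_fields:
        # Mirrors the min-length-1 string constraint on the new fields.
        if not isinstance(value, str) or not value:
            raise ValueError(name + " must be a non-empty string")
    return {
        "property": prop,
        "dataset": dataset,  # kept until the major change removes it
        "provider": provider,
        "resource": resource,
        "version": version,
    }


# Illustrative values only.
item = build_source_item("", "ExampleDataset", "ExampleProvider", "buildings", "2025-01-01")
```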
B. Related MINOR change steps
- […] `dataset` field.
- […] `dataset` field and make the three new fields non-optional.
C. Public documentation and messaging plan
Messaging around this change is that the current method of providing provenance is not sufficient to ensure traceability. Besides documenting the deprecation of `dataset`, we will want to provide details on how `provider`, `resource`, and `version` work together to identify a data snapshot.
Description
The intent of this change is to update our source item field to include the information necessary for data provenance:
- `provider`
- `resource`
- `version`
Together with the `version_id`, these values allow a user to uniquely identify what raw input data was used to construct Overture data. Our current system of providing only a `dataset` lacks dataset version information and is also inconsistently constructed. All three new fields will be omittable: optional (absent or a non-null string), with null rejected at validation time.
Reference
Closes #530
Testing
The unit tests for the schema were run.
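The rule stated in the Description for the new fields (absent or a non-null string is accepted, null is rejected) could be exercised along these lines. This is a sketch built around a hypothetical `is_valid_omittable` helper, not the tests that were added in this PR:

```python
import unittest

_ABSENT = object()  # sentinel: distinguishes "key not present" from None


def is_valid_omittable(item, name):
    """Hypothetical check: absent is fine, a non-empty string is fine, null is not."""
    value = item.get(name, _ABSENT)
    if value is _ABSENT:
        return True
    # The length check mirrors the Field(min_length=1) constraint quoted in the review.
    return isinstance(value, str) and len(value) >= 1


class OmittableFieldTests(unittest.TestCase):
    def test_absent_is_accepted(self):
        self.assertTrue(is_valid_omittable({"property": ""}, "provider"))

    def test_string_is_accepted(self):
        self.assertTrue(is_valid_omittable({"provider": "ExampleProvider"}, "provider"))

    def test_null_is_rejected(self):
        self.assertFalse(is_valid_omittable({"provider": None}, "provider"))
```

The sentinel matters because `item.get(name)` alone would return `None` for both the absent and the null case.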
Test Results
Added 2 new tests:
Checklist
Checklist of tasks commonly associated with schema pull requests. Please review the relevant checklists and ensure you do all the tasks that are required for the change you made.
[…] `A` but is not intended to test property `A`'s validity, and you made a schema change that invalidates property `A` in that counterexample, fix the counterexample to align it with your schema change.
Documentation website
Docs preview for this PR.