[FEATURE] Added provider, resource, and version as required fields. #532

Closed
Eric Godwin (ericgodwin) wants to merge 2 commits into main from ericg/530-add-version-to-source-item

Conversation

@ericgodwin Eric Godwin (ericgodwin) commented May 19, 2026

Major change release plan

This step alone is not a major change, since it only adds new fields. The breaking change comes later, when we remove the deprecated dataset field.

A. Expected release date for this MAJOR change

The breaking change should come 6 to 12 months after the minor change is released. Before it can roll out, all data loaders must be updated to populate these fields, which will take some time. Although this pull request is open now, we may not be able to implement the change until around August.

B. Related MINOR change steps

  • Release this non-breaking change and announce a deprecation timeline for the dataset field.
  • In a later release, remove the deprecated dataset field and make the three new fields required.

C. Public documentation and messaging plan

The messaging around this change is that the current method of providing provenance is not sufficient to ensure traceability. Besides documenting the deprecation of dataset, we will want to explain how provider, resource, and version work together to identify a data snapshot.

Description

The intent of this change is to update our source item field to include the information necessary for data provenance:

- provider: The name of the entity that produced the data: meta, esri, microsoft, osm, etc.
- resource: The subject or type of data given by the provider: division-names, buildings, planet, etc.
- version: A sortable identifier, such as a date or number: 2026-02-13, 5.3, A5692
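
One way to picture how the three fields listed above combine is as a composite key. The following is a minimal sketch, not code from this PR: `SnapshotKey` and `latest_snapshot` are hypothetical names, the sample values are made up, and the plain string comparison assumes that all versions of a given resource share one format (for example ISO dates).

```python
from typing import NamedTuple


class SnapshotKey(NamedTuple):
    """Hypothetical composite key; not part of the Overture schema."""

    provider: str
    resource: str
    version: str


def latest_snapshot(keys: list[SnapshotKey], provider: str, resource: str) -> SnapshotKey:
    """Return the key with the greatest version for one provider/resource pair.

    Assumes versions of a given resource share one format, so that plain
    string comparison orders them correctly.
    """
    candidates = [k for k in keys if k.provider == provider and k.resource == resource]
    if not candidates:
        raise LookupError(f"no snapshot for {provider}/{resource}")
    return max(candidates, key=lambda k: k.version)


# Made-up sample values for illustration only.
keys = [
    SnapshotKey("osm", "planet", "2026-01-16"),
    SnapshotKey("osm", "planet", "2026-02-13"),
    SnapshotKey("microsoft", "buildings", "2026-02-13"),
]
print(latest_snapshot(keys, "osm", "planet"))
```

Mixed version formats (a date for one release, a number for the next) would not sort meaningfully this way, which is one reason the "sortable" requirement matters per resource.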

Together with the version_id, these values allow a user to uniquely identify the raw input data used to construct Overture data. Our current system, which provides only a dataset, lacks dataset version information and is also inconsistently constructed. All three new fields will be omittable: each is optional (either absent or a non-null string), and a null value is rejected at validation time.
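
The difference between "omitted" and "null" can be sketched with a small standalone check. This is an illustration using plain dictionaries, not the Pydantic model changed in this PR; `check_provenance` is a hypothetical helper, and the non-empty requirement is an assumption carried over from the `min_length=1` constraint shown in the review diff.

```python
PROVENANCE_FIELDS = ("provider", "resource", "version")


def check_provenance(source: dict) -> list[str]:
    """Return error messages for a source item given as a dict.

    Hypothetical stand-in for the real schema validation: each field may be
    absent, but if the key is present its value must be a non-empty string.
    """
    errors = []
    for name in PROVENANCE_FIELDS:
        if name not in source:
            continue  # Omitted entirely: allowed.
        value = source[name]
        if value is None:
            errors.append(f"{name}: null is not allowed; omit the field instead")
        elif not isinstance(value, str) or not value:
            errors.append(f"{name}: must be a non-empty string")
    return errors


# Made-up sample items for illustration only.
omitted = {"property": "", "dataset": "OpenStreetMap"}
explicit_null = {"property": "", "dataset": "OpenStreetMap", "provider": None}

print(check_provenance(omitted))        # []
print(check_provenance(explicit_null))  # one error for provider
```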

Reference

Closes #530

Testing

The unit tests for the schema were run.

Test Results

Added 2 new tests:

  • test_source_item_provider_resource_version_omittable (omission works, fields excluded from JSON)
  • test_source_item_provider_resource_version_not_nullable (null raises ValidationError)
 All 1995 tests passed.

 uv run pytest -W error packages/ -q --tb=short
 1995 passed in 5.18s

 Lint and formatting checks also clean (`ruff check`, `ruff format --check`).

Checklist

Checklist of tasks commonly associated with schema pull requests. Please review the relevant checklists and ensure you complete all the tasks that are required for the change you made.

  1. Add relevant examples.
  2. Add relevant counterexamples.
  3. Update any counterexamples that became obsolete. For example, if a counterexample uses property A but is not intended to test property A's validity, and you made a schema change that invalidates property A in that counterexample, fix the counterexample to align it with your schema change.
  4. Update in-schema documentation using plain English written in complete sentences, if an update is required.
  5. Update Docusaurus documentation, if an update is required.
  6. Review change with Overture technical writer to ensure any advanced documentation needs will be taken care of, unless the change is trivial and would not affect the documentation.

Documentation website

Docs preview for this PR.

Signed-off-by: ericgodwin <eric@overturemaps.org>
github-actions Bot commented May 19, 2026

🗺️ Schema reference docs preview is live!

🌍 Preview https://staging.overturemaps.org/schema/pr/532/schema/index.html
🕐 Updated May 19, 2026 03:03 UTC
📝 Commit 963fa5e
🔧 env SCHEMA_PREVIEW true

Note

♻️ This preview updates automatically with each push to this PR.

@ericgodwin Eric Godwin (ericgodwin) added the change type - minor 🤏 Minor schema change. See https://lf-overturemaps.atlassian.net/wiki/x/GgDa label May 19, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds source provenance identifiers to the shared source item model/schema so sources can identify provider, resource, and snapshot version alongside the existing dataset identifier. All changed files were reviewed.

Changes:

  • Adds provider, resource, and version to source item schema/Pydantic models and generated baseline schemas.
  • Updates examples, counterexamples, package TOML examples, and reference fixtures to include the new fields.
  • Updates source-related README and GeoParquet schema documentation.

Reviewed changes

Copilot reviewed 106 out of 106 changed files in this pull request and generated 3 comments.

File Description
schema/defs.yaml Adds source provenance fields to shared JSON schema definitions.
packages/overture-schema-common/src/overture/schema/common/sources.py Adds required provenance fields to SourceItem.
packages/overture-schema-common/tests/test_models.py Updates expected common model JSON schema.
packages/overture-schema-common/README.md Updates SourceItem usage examples.
docs/schema/0-Schema.mdx Updates documented GeoParquet sources struct.
packages/overture-schema-addresses-theme/tests/address_baseline_schema.json Updates address baseline schema snapshot.
packages/overture-schema-addresses-theme/pyproject.toml Adds source provenance to address package example.
packages/overture-schema-base-theme/tests/bathymetry_baseline_schema.json Updates bathymetry baseline schema snapshot.
packages/overture-schema-base-theme/tests/infrastructure_baseline_schema.json Updates infrastructure baseline schema snapshot.
packages/overture-schema-base-theme/tests/land_baseline_schema.json Updates land baseline schema snapshot.
packages/overture-schema-base-theme/tests/land_cover_baseline_schema.json Updates land cover baseline schema snapshot.
packages/overture-schema-base-theme/tests/land_use_baseline_schema.json Updates land use baseline schema snapshot.
packages/overture-schema-base-theme/tests/water_baseline_schema.json Updates water baseline schema snapshot.
packages/overture-schema-base-theme/pyproject.toml Adds source provenance to base package examples.
packages/overture-schema-buildings-theme/tests/building_baseline_schema.json Updates building baseline schema snapshot.
packages/overture-schema-buildings-theme/tests/building_part_baseline_schema.json Updates building part baseline schema snapshot.
packages/overture-schema-buildings-theme/pyproject.toml Adds source provenance to building package examples.
packages/overture-schema-divisions-theme/tests/division_area_baseline_schema.json Updates division area baseline schema snapshot.
packages/overture-schema-divisions-theme/tests/division_baseline_schema.json Updates division baseline schema snapshot.
packages/overture-schema-divisions-theme/tests/division_boundary_baseline_schema.json Updates division boundary baseline schema snapshot.
packages/overture-schema-divisions-theme/pyproject.toml Adds source provenance to division package examples.
packages/overture-schema-places-theme/tests/place_baseline_schema.json Updates place baseline schema snapshot.
packages/overture-schema-places-theme/pyproject.toml Adds source provenance to place package example.
packages/overture-schema-transportation-theme/tests/connector_baseline_schema.json Updates connector baseline schema snapshot.
packages/overture-schema-transportation-theme/tests/segment_baseline_schema.json Updates segment baseline schema snapshot.
packages/overture-schema-transportation-theme/pyproject.toml Adds source provenance to transportation package examples.
examples/base/bathymetry-example.yaml Adds provenance fields to base example source.
examples/base/infrastructure-example.yaml Adds provenance fields to base example source.
examples/base/infrastructure-height-example.yaml Adds provenance fields to base example source.
examples/base/land-cover-example.yaml Adds provenance fields to base example source.
examples/base/land-sand-example.yaml Adds provenance fields to base example source.
examples/base/land-use-example.yaml Adds provenance fields to base example source.
examples/base/water-body-disputed.yaml Adds provenance fields to base example source.
examples/base/water-river-example.yaml Adds provenance fields to base example source.
examples/base/water-river-with-wikidata.yaml Adds provenance fields to base example source.
examples/buildings/basic-sources.yaml Adds provenance fields to building example source.
examples/buildings/building-part-basic.yaml Adds provenance fields to building part example sources.
examples/buildings/building-part-name.yaml Adds provenance fields to building part example source.
examples/buildings/building-polygon.yaml Adds provenance fields to building example sources.
examples/buildings/empire-state-building.json Adds provenance fields to JSON building example source.
examples/buildings/license-basic.yaml Adds provenance fields to licensed building example source.
examples/buildings/osm/outline.yaml Adds provenance fields to OSM building example source.
examples/buildings/osm/part1.yaml Adds provenance fields to OSM building part example source.
examples/buildings/osm/part2.yaml Adds provenance fields to OSM building part example source.
examples/divisions/division/capital_of.yaml Adds provenance fields to division example source.
examples/divisions/division/class.yaml Adds provenance fields to division example source.
examples/divisions/division/dependency.yaml Adds provenance fields to division example source.
examples/divisions/division/multiple_capital_division.yaml Adds provenance fields to division example source.
examples/divisions/division/population.yaml Adds provenance fields to division example source.
examples/divisions/division/prominence.yaml Adds provenance fields to division example source.
examples/divisions/division/region.yaml Adds provenance fields to division example source.
examples/places/place-no-emails-phones-socials-websites.yaml Adds provenance fields to place example sources.
examples/places/place-with-operating-status.yaml Adds provenance fields to place example sources.
examples/places/place.yaml Adds provenance fields to place example sources.
examples/transportation/segment/road/road-with-lr-sources.yaml Adds provenance fields to road source examples.
counterexamples/base/bathymetry/bad-depth.yaml Adds valid provenance fields to bathymetry counterexample source.
counterexamples/buildings/bad-confidence-in-source.yaml Adds valid provenance fields to building counterexample sources.
counterexamples/buildings/bad-license.yaml Adds valid provenance fields to building counterexample source.
counterexamples/buildings/bad-time.yaml Adds valid provenance fields to building counterexample source.
counterexamples/buildings/building-part-bad-name.yaml Adds valid provenance fields to building part counterexample source.
counterexamples/places/bad-empty-emails.yaml Adds valid provenance fields to place counterexample sources.
counterexamples/places/bad-empty-phones.yaml Adds valid provenance fields to place counterexample sources.
counterexamples/places/bad-empty-socials.yaml Adds valid provenance fields to place counterexample sources.
counterexamples/places/bad-empty-websites.yaml Adds valid provenance fields to place counterexample sources.
counterexamples/places/bad-operating-status.yaml Adds valid provenance fields to place counterexample sources.
counterexamples/transportation/segment/bad-source-between.yaml Adds valid provenance fields to transportation counterexample source.
reference/examples/base/bathymetry-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/infrastructure-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/infrastructure-height-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/land-cover-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/land-sand-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/land-use-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/water-body-disputed.yaml Adds provenance fields to reference base example source.
reference/examples/base/water-river-example.yaml Adds provenance fields to reference base example source.
reference/examples/base/water-river-with-wikidata.yaml Adds provenance fields to reference base example source.
reference/examples/buildings/basic-sources.yaml Adds provenance fields to reference building example source.
reference/examples/buildings/building-part-basic.yaml Adds provenance fields to reference building part example sources.
reference/examples/buildings/building-part-name.yaml Adds provenance fields to reference building part example source.
reference/examples/buildings/building-polygon.yaml Adds provenance fields to reference building example sources.
reference/examples/buildings/empire-state-building.json Adds provenance fields to reference JSON building example source.
reference/examples/buildings/license-basic.yaml Adds provenance fields to reference licensed building example source.
reference/examples/buildings/osm/outline.yaml Adds provenance fields to reference OSM building example source.
reference/examples/buildings/osm/part1.yaml Adds provenance fields to reference OSM building part example source.
reference/examples/buildings/osm/part2.yaml Adds provenance fields to reference OSM building part example source.
reference/examples/divisions/division/capital_of.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/class.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/dependency.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/multiple_capital_division.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/population.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/prominence.yaml Adds provenance fields to reference division example source.
reference/examples/divisions/division/region.yaml Adds provenance fields to reference division example source.
reference/examples/places/place-no-emails-phones-socials-websites.yaml Adds provenance fields to reference place example sources.
reference/examples/places/place-with-operating-status.yaml Adds provenance fields to reference place example sources.
reference/examples/places/place.yaml Adds provenance fields to reference place example sources.
reference/examples/transportation/segment/road/road-with-lr-sources.yaml Adds provenance fields to reference road source examples.
reference/counterexamples/base/bathymetry/bad-depth.yaml Adds valid provenance fields to reference bathymetry counterexample source.
reference/counterexamples/buildings/bad-confidence-in-source.yaml Adds valid provenance fields to reference building counterexample sources.
reference/counterexamples/buildings/bad-license.yaml Adds valid provenance fields to reference building counterexample source.
reference/counterexamples/buildings/bad-time.yaml Adds valid provenance fields to reference building counterexample source.
reference/counterexamples/buildings/building-part-bad-name.yaml Adds valid provenance fields to reference building part counterexample source.
reference/counterexamples/places/bad-empty-emails.yaml Adds valid provenance fields to reference place counterexample sources.
reference/counterexamples/places/bad-empty-phones.yaml Adds valid provenance fields to reference place counterexample sources.
reference/counterexamples/places/bad-empty-socials.yaml Adds valid provenance fields to reference place counterexample sources.
reference/counterexamples/places/bad-empty-websites.yaml Adds valid provenance fields to reference place counterexample sources.
reference/counterexamples/places/bad-operating-status.yaml Adds valid provenance fields to reference place counterexample sources.
reference/counterexamples/transportation/segment/bad-source-between.yaml Adds valid provenance fields to reference transportation counterexample source.
Comments suppressed due to low confidence (2)

schema/defs.yaml:270

  • The new required behavior lacks dedicated counterexamples for missing provider, resource, and version. This repo already validates counterexamples for required-field failures, so adding negative cases would protect the intended source provenance contract from regressions.

schema/defs.yaml:284

  • The new minLength constraint is not covered by an empty-value counterexample for provider (nor by analogous cases for the other new source identifiers). Without a negative example, these constraints can be accidentally removed while the example/counterexample validation suite still passes.
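
The negative cases requested in the two suppressed comments above could look roughly like the following. This is a sketch only: it does not use the repository's counterexample harness (which is not shown in this thread), `violations` is a hypothetical checker that mirrors just the `required` list and a `minLength: 1` rule, and the sample source values are made up.

```python
REQUIRED = ("property", "dataset", "provider", "resource", "version")
MIN_LENGTH_ONE = ("provider", "resource", "version")


def violations(source: dict) -> list[str]:
    """List required-field and empty-string violations for a source item dict."""
    found = [f"missing {name}" for name in REQUIRED if name not in source]
    found += [f"empty {name}" for name in MIN_LENGTH_ONE if source.get(name) == ""]
    return found


# A source item that satisfies both rules (illustrative values).
valid = {
    "property": "",
    "dataset": "OpenStreetMap",
    "provider": "osm",
    "resource": "planet",
    "version": "2026-02-13",
}

# One negative case per rule and per field: drop the key, or blank the value.
negative_cases = {}
for name in MIN_LENGTH_ONE:
    negative_cases[f"missing-{name}"] = {k: v for k, v in valid.items() if k != name}
    negative_cases[f"empty-{name}"] = {**valid, name: ""}

for label, case in negative_cases.items():
    print(label, violations(case))
```

Each negative case should fail for exactly one reason, so that removing a constraint from the schema makes exactly one case start passing.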

Comment thread schema/defs.yaml Outdated
  version values that together identify a specific source snapshot.
  type: object
- required: [property, dataset]
+ required: [property, dataset, provider, resource, version]
Comment on lines +47 to +73
    provider: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Name of the entity that produced the source data.
            """).strip(),
        ),
    ]
    resource: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Subject or data type produced by the provider.
            """).strip(),
        ),
    ]
    version: Annotated[
        str,
        Field(
            min_length=1,
            description=textwrap.dedent("""
                Sortable source snapshot identifier, such as a date, number, or release label.
            """).strip(),
        ),
    ]
Comment thread docs/schema/0-Schema.mdx
| **type** | *string* | one of 14 Overture feature types |
| **version** | *int32* | version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed |
| **sources** | *list\<element: struct\<property: string, dataset: string, record_id: string, update_time: string, confidence: double, between: list\<double\>\>\>* | array of source information for the properties of a given feature |
| **sources** | *list\<element: struct\<property: string, dataset: string, provider: string, resource: string, version: string, record_id: string, update_time: string, confidence: double, between: list\<double\>\>\>* | array of source information for the properties of a given feature |
Signed-off-by: ericgodwin <eric@overturemaps.org>
@ericgodwin
Author

I have changed enough functionality at this point that it makes sense to just start over.

Labels

change type - minor 🤏 Minor schema change. See https://lf-overturemaps.atlassian.net/wiki/x/GgDa

Development

Successfully merging this pull request may close these issues.

Update sources details to provide the version

2 participants