feat: add Iceberg v3 type definitions #752

Merged: wgtmac merged 5 commits into apache:main from zhjwpku:add-iceberg-v3-types on Jun 23, 2026

@zhjwpku zhjwpku commented Jun 16, 2026

Collaborator

Introduce the Iceberg v3 types (variant, geometry, geography), including their schema/JSON serialization and type-system integration (visitors, schema projection, etc.).

Reading and writing data of these types is not implemented yet: conversion to/from Arrow, Avro, and Parquet returns an error, as do identity transform binding and scalar validation for them.
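The "not implemented yet" behavior described above can be pictured with a minimal sketch. Everything here is hypothetical and written only for illustration: `StatusSketch`, `IsV3OnlyType`, and `ConvertToArrowSketch` are not names from this PR, and only the `TypeId` enumerators `kVariant`, `kGeometry`, and `kGeography` are mentioned in the review summary.

```cpp
#include <string>

// Hypothetical stand-in for the library's type ids. Only the three v3
// enumerators are named in this PR; the rest are placeholders.
enum class TypeId { kInt, kString, kVariant, kGeometry, kGeography };

// Hypothetical, simplified status type (not the library's own error type).
struct StatusSketch {
  bool ok;
  std::string message;
};

// True for the types this PR adds to the type system.
bool IsV3OnlyType(TypeId id) {
  return id == TypeId::kVariant || id == TypeId::kGeometry ||
         id == TypeId::kGeography;
}

// Assumed shape of a conversion entry point: v3 types are recognized by the
// type system but rejected up front instead of being converted.
StatusSketch ConvertToArrowSketch(TypeId id) {
  if (IsV3OnlyType(id)) {
    return {false,
            "NotSupported: reading and writing v3 types is not implemented yet"};
  }
  return {true, ""};
}
```

Rejecting early keeps schemas that contain the new types parseable and printable, while any attempt to move data through Arrow, Avro, or Parquet fails with an explicit error rather than a silent mis-conversion.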

@zhjwpku zhjwpku force-pushed the add-iceberg-v3-types branch 2 times, most recently from 00c7860 to 2b09f55 Compare June 16, 2026 16:54
Comment thread src/iceberg/json_serde.cc Outdated

@WZhuo WZhuo left a comment

Contributor

LGTM

Comment thread src/iceberg/type.h Outdated
Comment thread src/iceberg/type.h Outdated
Comment thread src/iceberg/type.h Outdated
Comment thread src/iceberg/type.h Outdated

Copilot AI left a comment

Pull request overview

This PR introduces Iceberg v3 type-system support by adding the new types variant, geometry, and geography (plus EdgeAlgorithm for geography), and wiring them through the existing visitor/type utilities, schema/JSON parsing & serialization, and compatibility checks. Data read/write support for these types is explicitly not implemented yet (Arrow/Avro/Parquet conversions and identity transform binding return errors).

Changes:

  • Add v3 TypeIds (kVariant, kGeometry, kGeography) and corresponding Type implementations (including CRS/edge algorithm handling and stringification).
  • Integrate v3 types into visitors, schema projection/utilities, transforms, and format-version gating; return NotSupported for unsupported IO/conversions.
  • Extend and adjust unit tests to cover v3 type parsing/printing and “unsupported” behavior in conversions/transforms.
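To make the CRS and edge-algorithm handling in the list above concrete, here is a rough sketch of parsing a `geography(crs, algorithm)` type string. It is not the code in `json_serde.cc`: `GeographySketch`, `ParseGeography`, and the enumerator names are hypothetical, and the assumed defaults (`OGC:CRS84`, `spherical`) and the algorithm names come from the Iceberg v3 specification as I understand it. The real parser may accept or reject edge cases differently.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical enumerator names; the algorithm set follows the Iceberg spec.
enum class EdgeAlgorithm { kSpherical, kVincenty, kThomas, kAndoyer, kKarney };

struct GeographySketch {
  std::string crs;
  EdgeAlgorithm algorithm;
};

std::string ToLower(std::string_view s) {
  std::string out(s);
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return out;
}

std::string_view Trim(std::string_view s) {
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front())))
    s.remove_prefix(1);
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))
    s.remove_suffix(1);
  return s;
}

// Algorithm names are matched case-insensitively; unknown names are rejected.
std::optional<EdgeAlgorithm> ParseEdgeAlgorithm(std::string_view name) {
  const std::string lower = ToLower(Trim(name));
  if (lower == "spherical") return EdgeAlgorithm::kSpherical;
  if (lower == "vincenty") return EdgeAlgorithm::kVincenty;
  if (lower == "thomas") return EdgeAlgorithm::kThomas;
  if (lower == "andoyer") return EdgeAlgorithm::kAndoyer;
  if (lower == "karney") return EdgeAlgorithm::kKarney;
  return std::nullopt;
}

// Parses "geography", "geography(crs)", or "geography(crs, algorithm)".
// The CRS is kept verbatim (case preserved); missing parts use the defaults.
std::optional<GeographySketch> ParseGeography(std::string_view text) {
  constexpr std::string_view kName = "geography";
  text = Trim(text);
  if (text.size() < kName.size() ||
      ToLower(text.substr(0, kName.size())) != kName) {
    return std::nullopt;
  }
  GeographySketch result{"OGC:CRS84", EdgeAlgorithm::kSpherical};
  std::string_view rest = Trim(text.substr(kName.size()));
  if (rest.empty()) return result;
  if (rest.size() < 2 || rest.front() != '(' || rest.back() != ')') {
    return std::nullopt;
  }
  rest = rest.substr(1, rest.size() - 2);
  const std::size_t comma = rest.find(',');
  const std::string_view crs = Trim(rest.substr(0, comma));
  if (crs.empty()) return std::nullopt;
  result.crs = std::string(crs);
  if (comma != std::string_view::npos) {
    const auto algorithm = ParseEdgeAlgorithm(rest.substr(comma + 1));
    if (!algorithm) return std::nullopt;
    result.algorithm = *algorithm;
  }
  return result;
}
```

With this sketch, `ParseGeography("Geography( srid:4269 , KARNEY )")` yields the CRS `srid:4269` with the Karney algorithm, while an unknown algorithm name makes the whole parse fail.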

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `src/iceberg/util/visitor_generate.h` | Extends generated visitor action lists and adds explicit dispatch for variant in “primitive default” switch. |
| `src/iceberg/util/visit_type.h` | Updates categorical visitor docs to include a fifth category for variant. |
| `src/iceberg/util/type_util.h` | Adds VariantType overloads to schema/type utility visitors. |
| `src/iceberg/util/type_util.cc` | Implements VariantType visitor handling and adjusts projection logic to treat non-nested leaf types consistently. |
| `src/iceberg/util/struct_like_set.cc` | Returns NotSupported for scalar validation of v3 types. |
| `src/iceberg/update/update_schema.cc` | Adds VisitVariant handling in schema-update visitor. |
| `src/iceberg/type.h` | Adds VariantType, GeometryType, GeographyType, factories, and edge-algorithm APIs; updates type factory group docs. |
| `src/iceberg/type.cc` | Implements v3 type behavior, factories, TypeId/EdgeAlgorithm string conversions and parsing. |
| `src/iceberg/type_fwd.h` | Adds new TypeIds, EdgeAlgorithm, and forward declarations for new types. |
| `src/iceberg/transform.cc` | Disables identity transform for geometry/geography. |
| `src/iceberg/transform_function.cc` | Enforces identity-transform input-type restrictions for geometry/geography. |
| `src/iceberg/test/visit_type_test.cc` | Extends type test cases to include v3 types and updates nested-vs-non-nested expectations. |
| `src/iceberg/test/type_test.cc` | Extends type test cases, adjusts nested checks, and adds geography default/algorithm equality tests. |
| `src/iceberg/test/transform_test.cc` | Adds coverage ensuring identity transform rejects v3 types. |
| `src/iceberg/test/schema_test.cc` | Adds schema projection test coverage for variant fields. |
| `src/iceberg/test/schema_json_test.cc` | Adds JSON round-trip and invalid-input tests for v3 type strings (case/spacing/algorithms). |
| `src/iceberg/test/rest_json_serde_test.cc` | Updates expected error message to match new “Cannot parse type string” behavior. |
| `src/iceberg/test/arrow_test.cc` | Adds test asserting Arrow conversion rejects v3 types. |
| `src/iceberg/table_metadata.h` | Gates v3 types behind Iceberg format version >= 3. |
| `src/iceberg/schema_internal.cc` | Refactors Arrow schema conversion to return Status, improves error reporting with type paths, and rejects v3 types explicitly. |
| `src/iceberg/parquet/parquet_writer.cc` | Adds VisitVariant to metrics collector visitor. |
| `src/iceberg/parquet/parquet_schema_util.cc` | Rejects reading v3 types from Parquet schema evolution validation. |
| `src/iceberg/parquet/parquet_metrics.cc` | Adds VisitVariant to metrics visitor. |
| `src/iceberg/metrics_config.cc` | Treats variant as a non-nested leaf for metrics field-id limiting. |
| `src/iceberg/json_serde.cc` | Adds JSON serialization and parsing for v3 types (including CRS and edge algorithm); normalizes primitive parsing to be case-insensitive. |
| `src/iceberg/delete_file_index.cc` | Adjusts equality-delete bound conversion to skip any non-primitive types (avoids mis-casting variant). |
| `src/iceberg/avro/avro_schema_util.cc` | Rejects writing/reading v3 types to/from Avro with NotSupported. |
| `src/iceberg/avro/avro_schema_util_internal.h` | Declares Avro visitor overloads for v3 types. |
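The format-version gating noted for `table_metadata.h` in the file summary could look roughly like the following. This is an assumption-laden sketch, not the PR's code: `MinFormatVersionSketch` and `SchemaAllowedSketch` are hypothetical helpers, the schema is flattened to a plain list of type ids, and only the three types added in this PR are treated as requiring version 3.

```cpp
#include <vector>

// Hypothetical stand-in for the library's type ids.
enum class TypeId { kInt, kString, kVariant, kGeometry, kGeography };

// Minimum table format version assumed to be required by each type.
int MinFormatVersionSketch(TypeId id) {
  switch (id) {
    case TypeId::kVariant:
    case TypeId::kGeometry:
    case TypeId::kGeography:
      return 3;
    default:
      return 1;
  }
}

// Returns true only if every field type is allowed at `format_version`.
bool SchemaAllowedSketch(const std::vector<TypeId>& field_types,
                         int format_version) {
  for (TypeId id : field_types) {
    if (MinFormatVersionSketch(id) > format_version) return false;
  }
  return true;
}
```

Under these assumptions, a schema containing a geometry column is rejected for a version 2 table and accepted once the table is at format version 3.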

Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/type.h
Comment thread src/iceberg/test/type_test.cc Outdated
Comment thread src/iceberg/test/visit_type_test.cc Outdated
@zhjwpku zhjwpku added the "ready to merge" label Jun 22, 2026
zhjwpku and others added 5 commits June 22, 2026 23:06
Introduce the Iceberg v3 types (variant, geometry, geography), including
their schema/JSON serialization and type-system integration (visitors,
schema projection, etc.).

Reading and writing data of these types is not implemented yet: conversion
to/from Arrow, Avro, and Parquet returns an error, as do identity transform
binding and scalar validation for them.
@wgtmac wgtmac force-pushed the add-iceberg-v3-types branch from de22931 to a4eb1f6 Compare June 23, 2026 03:35
Comment thread src/iceberg/type.cc Outdated
Comment thread src/iceberg/json_serde.cc Outdated
Comment thread src/iceberg/json_serde.cc Outdated
Comment thread src/iceberg/json_serde.cc Outdated
Comment thread src/iceberg/json_serde.cc Outdated
Comment thread src/iceberg/transform_function.cc Outdated

@wgtmac wgtmac left a comment

Member

Thanks @zhjwpku for adding this and @evindj @WZhuo for the review!

@wgtmac wgtmac removed the "ready to merge" label Jun 23, 2026
@wgtmac wgtmac merged commit 959fda6 into apache:main Jun 23, 2026
21 checks passed
@zhjwpku zhjwpku deleted the add-iceberg-v3-types branch June 23, 2026 05:42
5 participants