feat: add Iceberg v3 type definitions #752
Merged
00c7860 to 2b09f55
WZhuo
reviewed
Jun 17, 2026
evindj
approved these changes
Jun 17, 2026
wgtmac
reviewed
Jun 18, 2026
Pull request overview
This PR introduces Iceberg v3 type-system support by adding the new types variant, geometry, and geography (plus EdgeAlgorithm for geography), and wiring them through the existing visitor/type utilities, schema/JSON parsing & serialization, and compatibility checks. Data read/write support for these types is explicitly not implemented yet (Arrow/Avro/Parquet conversions and identity transform binding return errors).
Changes:
- Add v3 `TypeId`s (`kVariant`, `kGeometry`, `kGeography`) and corresponding `Type` implementations (including CRS/edge algorithm handling and stringification).
- Integrate v3 types into visitors, schema projection/utilities, transforms, and format-version gating; return `NotSupported` for unsupported IO/conversions.
- Extend and adjust unit tests to cover v3 type parsing/printing and “unsupported” behavior in conversions/transforms.
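The format-version gating mentioned above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the reduced `TypeId` enum, `MinFormatVersion`, and `CheckTypeAllowed` are hypothetical stand-ins, and the real implementation reports errors through the library's own status/result types rather than `std::optional<std::string>`.

```cpp
#include <optional>
#include <string>

// Hypothetical, reduced stand-in for the library's TypeId enum.
enum class TypeId { kInt, kString, kVariant, kGeometry, kGeography };

// Minimum table format version in which a type may appear in a schema.
// Only the three types added by this PR are modeled as v3-only here.
constexpr int MinFormatVersion(TypeId id) {
  switch (id) {
    case TypeId::kVariant:
    case TypeId::kGeometry:
    case TypeId::kGeography:
      return 3;
    default:
      return 1;
  }
}

// Returns an error message when the type is not allowed at the given format
// version, or std::nullopt when it is allowed. (Assumed error shape.)
std::optional<std::string> CheckTypeAllowed(TypeId id, int format_version) {
  const int required = MinFormatVersion(id);
  if (format_version < required) {
    return "type requires format version >= " + std::to_string(required) +
           ", but table uses version " + std::to_string(format_version);
  }
  return std::nullopt;
}
```

Keeping the minimum version in one function means a schema validator can walk every field and apply the same check, instead of special-casing each new type at the call site.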
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/iceberg/util/visitor_generate.h | Extends generated visitor action lists and adds explicit dispatch for variant in “primitive default” switch. |
| src/iceberg/util/visit_type.h | Updates categorical visitor docs to include a fifth category for variant. |
| src/iceberg/util/type_util.h | Adds VariantType overloads to schema/type utility visitors. |
| src/iceberg/util/type_util.cc | Implements VariantType visitor handling and adjusts projection logic to treat non-nested leaf types consistently. |
| src/iceberg/util/struct_like_set.cc | Returns NotSupported for scalar validation of v3 types. |
| src/iceberg/update/update_schema.cc | Adds VisitVariant handling in schema-update visitor. |
| src/iceberg/type.h | Adds VariantType, GeometryType, GeographyType, factories, and edge-algorithm APIs; updates type factory group docs. |
| src/iceberg/type.cc | Implements v3 type behavior, factories, TypeId/EdgeAlgorithm string conversions and parsing. |
| src/iceberg/type_fwd.h | Adds new TypeIds, EdgeAlgorithm, and forward declarations for new types. |
| src/iceberg/transform.cc | Disables identity transform for geometry/geography. |
| src/iceberg/transform_function.cc | Enforces identity-transform input-type restrictions for geometry/geography. |
| src/iceberg/test/visit_type_test.cc | Extends type test cases to include v3 types and updates nested-vs-non-nested expectations. |
| src/iceberg/test/type_test.cc | Extends type test cases, adjusts nested checks, and adds geography default/algorithm equality tests. |
| src/iceberg/test/transform_test.cc | Adds coverage ensuring identity transform rejects v3 types. |
| src/iceberg/test/schema_test.cc | Adds schema projection test coverage for variant fields. |
| src/iceberg/test/schema_json_test.cc | Adds JSON round-trip and invalid-input tests for v3 type strings (case/spacing/algorithms). |
| src/iceberg/test/rest_json_serde_test.cc | Updates expected error message to match new “Cannot parse type string” behavior. |
| src/iceberg/test/arrow_test.cc | Adds test asserting Arrow conversion rejects v3 types. |
| src/iceberg/table_metadata.h | Gates v3 types behind Iceberg format version >= 3. |
| src/iceberg/schema_internal.cc | Refactors Arrow schema conversion to return Status, improves error reporting with type paths, and rejects v3 types explicitly. |
| src/iceberg/parquet/parquet_writer.cc | Adds VisitVariant to metrics collector visitor. |
| src/iceberg/parquet/parquet_schema_util.cc | Rejects v3 types during Parquet schema-evolution validation on read. |
| src/iceberg/parquet/parquet_metrics.cc | Adds VisitVariant to metrics visitor. |
| src/iceberg/metrics_config.cc | Treats variant as a non-nested leaf for metrics field-id limiting. |
| src/iceberg/json_serde.cc | Adds JSON serialization and parsing for v3 types (including CRS and edge algorithm); normalizes primitive parsing to be case-insensitive. |
| src/iceberg/delete_file_index.cc | Adjusts equality-delete bound conversion to skip any non-primitive types (avoids mis-casting variant). |
| src/iceberg/avro/avro_schema_util.cc | Rejects writing/reading v3 types to/from Avro with NotSupported. |
| src/iceberg/avro/avro_schema_util_internal.h | Declares Avro visitor overloads for v3 types. |
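As an illustration of the `EdgeAlgorithm` string conversion listed for `src/iceberg/type.cc`, here is a hedged sketch of parsing an algorithm name. The enumerator names and the `EdgeAlgorithmFromString` helper are hypothetical, and case-insensitive matching is an assumption (the summary only confirms it for primitive type names). The five algorithm names are the ones the Iceberg v3 spec defines for `geography`.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical enum; the PR's actual declaration lives in type_fwd.h.
enum class EdgeAlgorithm { kSpherical, kVincenty, kThomas, kAndoyer, kKarney };

// Parses an edge-interpolation algorithm name, ignoring ASCII case.
// Returns std::nullopt for unknown names. (Assumed behavior, see lead-in.)
std::optional<EdgeAlgorithm> EdgeAlgorithmFromString(std::string_view text) {
  std::string lower(text);
  std::transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (lower == "spherical") return EdgeAlgorithm::kSpherical;
  if (lower == "vincenty") return EdgeAlgorithm::kVincenty;
  if (lower == "thomas") return EdgeAlgorithm::kThomas;
  if (lower == "andoyer") return EdgeAlgorithm::kAndoyer;
  if (lower == "karney") return EdgeAlgorithm::kKarney;
  return std::nullopt;
}
```

A caller parsing a `geography` type string would treat a `std::nullopt` result as an invalid-input error, which matches the kind of case the invalid-input tests in `schema_json_test.cc` are described as covering.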
Introduce the Iceberg v3 types (variant, geometry, geography), including their schema/JSON serialization and type-system integration (visitors, schema projection, etc.). Reading and writing data of these types is not implemented yet: conversion to/from Arrow, Avro, and Parquet returns an error, as do identity transform binding and scalar validation for them.
de22931 to a4eb1f6
wgtmac
reviewed
Jun 23, 2026