[WIP] Support for 'time' type in Iceberg, reading only#1546
[WIP] Support for 'time' type in Iceberg, reading only#1546ianton-ru wants to merge 9 commits intoantalya-25.8from
Conversation
This reverts commit ddd34ac.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94a1346de1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case TypeIndex::Time: | ||
| case TypeIndex::Time64: | ||
| return dumpValue(field.safeGet<Decimal64>().getValue().value); |
There was a problem hiding this comment.
Serialize Time stats using Int64 field type
canDumpIcebergStats now allows TypeIndex::Time, but dumpFieldToBytes handles that branch with field.safeGet<Decimal64>(); Time values are stored in Field as Int64, so this path throws BAD_GET when lower/upper bounds are written. In practice, inserts into Iceberg tables that include a Time column can now fail during manifest statistics serialization instead of just skipping stats for that type.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37e67a8195
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case TypeIndex::Time: | ||
| case TypeIndex::Time64: | ||
| return dumpValue(field.safeGet<Int64>()); |
There was a problem hiding this comment.
Serialize Time64 stats via decimal payload
TypeIndex::Time64 values are stored as decimal fields, but this branch reads them with field.safeGet<Int64>(). Since canDumpIcebergStats() now explicitly allows Time64, any Iceberg write that emits stats for a Time64 column can fail at runtime with a field type mismatch when building lower/upper bounds. This should use the same decimal extraction path as DateTime64 (or an equivalent conversion) instead of the integer accessor.
Useful? React with 👍 / 👎.
37e67a8 to
0c56e3c
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c56e3c5b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case TypeIndex::Time: | ||
| return dumpValue(field.safeGet<Int64>()); | ||
| case TypeIndex::Time64: | ||
| case TypeIndex::DateTime64: | ||
| return dumpValue(field.safeGet<Decimal64>().getValue().value); |
There was a problem hiding this comment.
Serialize Iceberg time bounds in microseconds
dumpFieldToBytes now emits raw ClickHouse storage units for Time/Time64 into manifest lower_bounds/upper_bounds, but Iceberg time bounds are defined as microseconds-from-midnight. Here Time is written as integer seconds and Time64 is written as raw decimal ticks, so values are mis-scaled unless they are already microseconds; e.g. Time and Time64(3) produce smaller bounds by 1e6/1e3. Any reader that trusts bounds for predicate pruning can incorrectly skip matching files and return incomplete results. Please rescale to microseconds (or skip writing bounds for non-microsecond scales) before serializing.
Useful? React with 👍 / 👎.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Solved #1535
Support for 'time' type in Iceberg, reading only
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: