[WIP] Support for 'time' type in Iceberg, reading only #1546

Open
ianton-ru wants to merge 9 commits into antalya-25.8 from bugfix/antalya-25.8/1535_time_type_read_support_1

@ianton-ru

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Solved #1535
Support for 'time' type in Iceberg, reading only

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Mar 18, 2026

Workflow [PR], commit [0c56e3c]

@ianton-ru

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94a1346de1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +150 to 152
    case TypeIndex::Time:
    case TypeIndex::Time64:
        return dumpValue(field.safeGet<Decimal64>().getValue().value);

P1: Serialize Time stats using Int64 field type

canDumpIcebergStats now allows TypeIndex::Time, but dumpFieldToBytes handles that branch with field.safeGet<Decimal64>(); Time values are stored in Field as Int64, so this path throws BAD_GET when lower/upper bounds are written. In practice, inserts into Iceberg tables that include a Time column can now fail during manifest statistics serialization instead of just skipping stats for that type.

@ianton-ru

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37e67a8195

Comment on lines +148 to 150
    case TypeIndex::Time:
    case TypeIndex::Time64:
        return dumpValue(field.safeGet<Int64>());

P1: Serialize Time64 stats via decimal payload

TypeIndex::Time64 values are stored as decimal fields, but this branch reads them with field.safeGet<Int64>(). Since canDumpIcebergStats() now explicitly allows Time64, any Iceberg write that emits stats for a Time64 column can fail at runtime with a field type mismatch when building lower/upper bounds. This should use the same decimal extraction path as DateTime64 (or an equivalent conversion) instead of the integer accessor.
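
The type mismatch described in these two review comments can be illustrated with a small standalone model. This is only a sketch: `FieldLike`, `DecimalLike`, `TimeKind`, `rawTicks` and `mismatches` are hypothetical names, and `std::variant` stands in for ClickHouse's `Field`. It assumes, as the comments state, that Time is held as an Int64 and Time64 as a decimal payload.

```cpp
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for illustration; these are NOT ClickHouse types.
struct DecimalLike
{
    std::int64_t value;   // raw ticks
    std::uint32_t scale;  // number of fractional decimal digits
};

// A field holds exactly one of the two representations.
using FieldLike = std::variant<std::int64_t, DecimalLike>;

enum class TimeKind { Time, Time64 };

// Use the accessor that matches how each kind is assumed to be stored.
// std::get throws std::bad_variant_access when the stored alternative differs,
// which plays the role of the field type mismatch mentioned in the review.
std::int64_t rawTicks(TimeKind kind, const FieldLike & field)
{
    if (kind == TimeKind::Time)
        return std::get<std::int64_t>(field);
    return std::get<DecimalLike>(field).value;
}

// Returns true when reading `field` as `kind` fails with a type mismatch.
bool mismatches(TimeKind kind, const FieldLike & field)
{
    try
    {
        (void)rawTicks(kind, field);
        return false;
    }
    catch (const std::bad_variant_access &)
    {
        return true;
    }
}
```

In this model a single shared accessor for both kinds will always fail for one of them, which is why the two cases need separate branches.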

@ianton-ru ianton-ru force-pushed the bugfix/antalya-25.8/1535_time_type_read_support_1 branch from 37e67a8 to 0c56e3c Compare March 19, 2026 16:49
@ianton-ru

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c56e3c5b3

Comment on lines +148 to 152
    case TypeIndex::Time:
        return dumpValue(field.safeGet<Int64>());
    case TypeIndex::Time64:
    case TypeIndex::DateTime64:
        return dumpValue(field.safeGet<Decimal64>().getValue().value);

P1: Serialize Iceberg time bounds in microseconds

dumpFieldToBytes now emits raw ClickHouse storage units for Time/Time64 into the manifest lower_bounds/upper_bounds, but Iceberg defines time bounds as microseconds from midnight. Here Time is written as integer seconds and Time64 as raw decimal ticks, so the values are mis-scaled unless the scale is already microseconds; for example, Time bounds come out too small by a factor of 1e6 and Time64(3) bounds by a factor of 1e3. Any reader that trusts bounds for predicate pruning can incorrectly skip matching files and return incomplete results. Please rescale to microseconds (or skip writing bounds for non-microsecond scales) before serializing.
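
A minimal sketch of the rescaling suggested here, not the PR's actual code. `toIcebergMicros` is a hypothetical helper name. It treats Time as scale 0 (whole seconds) and Time64(N) as scale N, and it takes the "skip" option for scales finer than microseconds by returning an empty optional instead of rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: converts a tick count at 10^-scale seconds into
// microseconds from midnight. Returns std::nullopt when the caller should
// skip writing the bound (scale finer than microseconds, or overflow).
std::optional<std::int64_t> toIcebergMicros(std::int64_t ticks, std::uint32_t scale)
{
    constexpr std::uint32_t micro_scale = 6;
    assert(scale <= 9); // assumption: at most nanosecond precision

    if (scale > micro_scale)
        return std::nullopt; // rescaling would drop digits, so skip the bound

    // multiplier = 10^(6 - scale), e.g. 1'000'000 for seconds, 1'000 for milliseconds
    std::int64_t multiplier = 1;
    for (std::uint32_t i = scale; i < micro_scale; ++i)
        multiplier *= 10;

    // Guard against signed overflow before multiplying.
    if (ticks > std::numeric_limits<std::int64_t>::max() / multiplier
        || ticks < std::numeric_limits<std::int64_t>::min() / multiplier)
        return std::nullopt;

    return ticks * multiplier;
}
```

For example, one second of Time (scale 0) maps to 1000000 and 1500 ticks of Time64(3) map to 1500000. Rounding the lower bound down and the upper bound up would be an alternative to skipping for scales above 6, at the cost of looser bounds.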
