[fix](hive) handle ORC legacy calendar rebasing #64749

Draft
xylaaaaa wants to merge 2 commits into apache:master from xylaaaaa:codex/orc-legacy-calendar-master

@xylaaaaa

Summary

  • Set Doris ORC row reader target calendar to proleptic Gregorian.
  • Update contrib/apache-orc submodule to include ORC legacy hybrid calendar conversion for DATE/TIMESTAMP.
  • Keep DATE/TIMESTAMP predicate pushdown safe for legacy ORC files.
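
As an illustration of what rebasing a legacy DATE value involves, here is a minimal sketch. It is not the code in the ORC submodule. It assumes the DATE is stored as days since 1970-01-01 and that the legacy writer used the hybrid Julian/Gregorian calendar with the usual 1582-10-15 cutover. The function names are hypothetical.

```python
from datetime import date

# Julian Day Number of 1970-01-01 in the proleptic Gregorian calendar.
EPOCH_JDN = 2440588
# JDN of 1582-10-15, the first day the hybrid calendar treats as Gregorian.
GREGORIAN_CUTOVER_JDN = 2299161


def julian_ymd_from_jdn(jdn):
    """Decode a Julian Day Number into a (year, month, day) Julian-calendar label."""
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def rebase_hybrid_to_proleptic(days):
    """Map a hybrid-calendar day count to the proleptic Gregorian day count
    that carries the same year-month-day label.

    Assumption: values on or after the cutover are already Gregorian and
    pass through unchanged. Julian labels with no proleptic Gregorian
    equivalent (for example 1500-02-29) raise ValueError in this sketch,
    as do years before 1, which datetime.date cannot represent.
    """
    jdn = days + EPOCH_JDN
    if jdn >= GREGORIAN_CUTOVER_JDN:
        return days
    year, month, day = julian_ymd_from_jdn(jdn)
    return date(year, month, day).toordinal() - date(1970, 1, 1).toordinal()
```

For example, `rebase_hybrid_to_proleptic(-141428)` returns `-141438`: the stored count denotes 1582-10-04 in the Julian calendar, and the same label in the proleptic Gregorian calendar falls ten days earlier on the day-count axis. A reader that skips this step would display such a value shifted by several days.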

Dependency

Test Plan

  • git diff --check
  • git -C contrib/apache-orc diff --check
  • cmake --build contrib/apache-orc/build_codex_orc_calendar_tp2 --target orc -j4
  • contrib/apache-orc/build_codex_orc_calendar_tp2/verify_orc_calendar
  • contrib/apache-orc/build_codex_orc_calendar_tp2/verify_orc_predicate_calendar
  • contrib/apache-orc/build_codex_orc_calendar_tp2/verify_orc_writer_calendar
  • Built FE and BE locally in this worktree
  • Started local FE/BE and queried a Hive legacy ORC table through Hive Catalog; the query returned the dates 0002-01-01, 1500-01-01, 1582-10-04, 1582-11-04, and 2000-02-29, with timestamps carrying a .123000 fractional part
  • Verified predicate query for DATE '0002-01-01', DATE '1500-01-01', DATE '1582-10-04'

Notes

  • Full ORC unit test target is currently blocked by an existing unrelated TestBloomFilter private-member compile issue in the thirdparty branch.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Avoid loading every Hive bootstrap dataset for both Hive2 and Hive3 in external regress. Add bootstrap group manifests so each Hive version only prepares and loads its version-specific data while keeping shared assets in the common group.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - bash -n docker/thirdparties/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh
    - bash -n docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh
    - bash -n docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh
    - bash -n docker/thirdparties/run-thirdparties-docker.sh
    - bash -n regression-test/pipeline/common/get-hive-bootstrap-groups.sh
    - helper smoke test for hive2/hive3 bootstrap group selection
- Behavior changed: Yes (Hive bootstrap now skips version-specific data for the other Hive version)
- Does this need documentation: No
@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
