ORCA: support column-level COLLATE "C" collation propagation by yjhjstz · Pull Request #1649 · apache/cloudberry

yjhjstz · 2026-03-30T20:40:29Z

ORCA lost column-level collation at the Query→DXL entry point
(CTranslatorUtils::GetTableDescr did not pass md_col->Collation()),
causing all downstream DXL nodes to see collation=0. This made
ORDER BY, comparison, GROUP BY, and aggregates on COLLATE "C" columns
produce wrong results (using en_US locale order instead of byte order).
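
As a minimal illustration of the symptom (plain C, not Cloudberry code): COLLATE "C" compares strings byte by byte, which is what strcmp() does. Sorting with it therefore puts every uppercase ASCII letter before every lowercase one. The helper names cmp_c_collation and sort_c_collation are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: strcmp() compares the strings as sequences of
 * unsigned char, which is the byte order that COLLATE "C" uses. */
static int cmp_c_collation(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort an array of NUL-terminated strings in "C" (byte) order, in place. */
static void sort_c_collation(const char **values, size_t n)
{
    assert(values != NULL || n == 0);
    qsort(values, n, sizeof values[0], cmp_c_collation);
}
```

With the input {"b", "a", "B", "A"} this yields A, B, a, b. A typical en_US collation treats case as a lower-priority difference and interleaves the letters instead (for example a, A, b, B), which is the kind of divergence that appears once the column's collation is dropped.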

Fixes #717

Propagate column-level C collation through the full ORCA pipeline so that sort, comparison, and aggregate operations produce correct results matching the PostgreSQL planner. Key changes:

1. Query->DXL (CTranslatorUtils::GetTableDescr): pass md_col->Collation() when creating CDXLColDescr for table columns. This was the root cause of collation being lost at the very start of the pipeline.
2. Expr->DXL (CTranslatorExprToDXL::MakeDXLTableDescr, PdxlnCTAS): pass CColumnDescriptor collation to CDXLColDescr.
3. DXL->PlStmt (CMappingColIdVarPlStmt::VarFromDXLNodeScId): when the CDXLScalarIdent has no explicit collation (e.g., partial aggregate output columns), fall back to the child TargetEntry expression's collation. This lets Finalize Aggregate inherit the correct collation from Partial Aggregate.
4. Aggregate collation (CTranslatorDXLToScalar): set aggcollid from inputcollid instead of TypeCollation, so min/max/string_agg use the correct column collation.
5. Expression-level COLLATE "C" fallback (walkers.c): detect when RelabelType overrides collation (from fold_constants converting CollateExpr to RelabelType) and trigger fallback to the PostgreSQL planner, since ORCA does not yet handle expression-level COLLATE.

The existing collation infrastructure (CMDColumn, CColumnDescriptor, CColRef, CDXLColRef, CDXLScalarIdent) was already in place, but the entry point in CTranslatorUtils was not passing collation, causing all downstream stages to see collation=0.

fix regress
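
Change 3 boils down to a simple precedence rule. The sketch below models it with hypothetical, heavily simplified structs; ScalarIdent, ChildTargetEntry, and resolve_var_collation are illustrative names, not the actual CMappingColIdVarPlStmt code. Only the OID values 100 ("default") and 950 ("C") are taken from PostgreSQL's pg_collation catalog.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define InvalidOid            ((Oid) 0)
#define DEFAULT_COLLATION_OID ((Oid) 100) /* pg_collation "default" */
#define C_COLLATION_OID       ((Oid) 950) /* pg_collation "C" */

/* Hypothetical stand-in for a DXL scalar ident: InvalidOid means the
 * ident carries no explicit collation (e.g. a partial-aggregate output). */
typedef struct {
    Oid collation;
} ScalarIdent;

/* Hypothetical stand-in for the child plan's TargetEntry. */
typedef struct {
    Oid expr_collation; /* collation of the TargetEntry's expression */
} ChildTargetEntry;

/* An explicit collation on the ident wins; otherwise inherit the child
 * TargetEntry's expression collation; with no child, stay InvalidOid. */
static Oid resolve_var_collation(const ScalarIdent *ident,
                                 const ChildTargetEntry *child_te)
{
    assert(ident != NULL);
    if (ident->collation != InvalidOid)
        return ident->collation;
    if (child_te != NULL)
        return child_te->expr_collation;
    return InvalidOid;
}
```

In the Finalize Aggregate case, the ident for the partial result has no collation, so the sketch picks up the "C" collation from the Partial Aggregate's target entry instead of leaving it at 0.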

…norderbyop)

ORCA does not support amcanorderbyop (KNN ordered index scans). Queries like `ORDER BY col <-> 'value' LIMIT N` on GiST indexes cannot produce ordered index scans in ORCA, resulting in inefficient Seq Scan + Sort plans instead of a KNN-GiST Index Scan.

Previously, these queries would accidentally get correct plans because column-level COLLATE "C" caused a blanket fallback to the PostgreSQL planner, which does support amcanorderbyop. After commit 3f4ce85 added COLLATE "C" support to ORCA, these queries lost their fallback path.

Add has_orderby_ordering_op() in walkers.c to detect when a query's ORDER BY clause contains an operator registered as AMOP_ORDER in pg_amop (e.g., <-> for trigram/point distance). When detected, ORCA falls back to the PostgreSQL planner, which can generate KNN ordered index scans. The check is precise: only ORDER BY with ordering operators triggers fallback. Other queries on the same tables (WHERE with LIKE/%%, equality filters, etc.) continue to use ORCA normally.
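
The detection described above can be sketched under two stated assumptions: the pg_amop lookup (operators whose amoppurpose is AMOP_ORDER, i.e. 'o') is replaced by a plain array, and each ORDER BY key is flattened to the OID of its top-level operator. All names here, including has_orderby_ordering_op_sketch, are hypothetical and do not reflect the real walkers.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Stand-in for a pg_amop lookup: in this sketch, the set of operators
 * registered with amoppurpose = AMOP_ORDER is just an array of OIDs. */
typedef struct {
    const Oid *ops;
    size_t     nops;
} OrderingOpCatalog;

static bool is_ordering_operator(const OrderingOpCatalog *cat, Oid opno)
{
    for (size_t i = 0; i < cat->nops; i++)
        if (cat->ops[i] == opno)
            return true;
    return false;
}

/* Hypothetical flattened ORDER BY key: the operator at the top of the
 * sort expression, or 0 if the expression is not an operator call. */
typedef struct {
    Oid top_opno;
} SortKeySketch;

/* True if any ORDER BY key is an expression whose top-level operator is
 * an index ordering operator (e.g. a distance operator such as <->). */
static bool has_orderby_ordering_op_sketch(const OrderingOpCatalog *cat,
                                           const SortKeySketch *keys,
                                           size_t nkeys)
{
    assert(cat != NULL);
    for (size_t i = 0; i < nkeys; i++)
        if (keys[i].top_opno != 0 && is_ordering_operator(cat, keys[i].top_opno))
            return true;
    return false;
}
```

Keeping the check keyed on the ORDER BY clause, rather than on the table or index, is what lets WHERE-only queries on the same tables stay on ORCA.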

Only fall back to the PostgreSQL planner when ALL ordering-operator expressions in ORDER BY have at least one direct Var (column reference) argument. Expressions like "circle(p,1) <-> point(0,0)" wrap the column in a function call, which can cause "lossy distance functions are not supported in index-only scans" errors in the planner. Leave such queries for ORCA to handle via Seq Scan + Sort.
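
The refined rule can be modeled on a toy expression tree. This is a sketch with hypothetical types and names (Expr, has_direct_var_arg, should_fallback_for_orderby), not the actual walkers.c logic. It also assumes that when no ordering operator appears in ORDER BY, no fallback is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal expression tree. */
typedef enum { EXPR_VAR, EXPR_CONST, EXPR_FUNC, EXPR_OP } ExprKind;

typedef struct Expr {
    ExprKind           kind;
    bool               is_ordering_op; /* only meaningful for EXPR_OP */
    const struct Expr *args[2];        /* up to two arguments */
    size_t             nargs;
} Expr;

/* True if at least one argument of the operator is a bare column reference. */
static bool has_direct_var_arg(const Expr *op)
{
    for (size_t i = 0; i < op->nargs; i++)
        if (op->args[i]->kind == EXPR_VAR)
            return true;
    return false;
}

/* Fall back only if there is at least one ordering-operator key and every
 * such key has a direct Var argument; a wrapped column (e.g. circle(p,1))
 * in any ordering-operator key keeps the query on ORCA. */
static bool should_fallback_for_orderby(const Expr *const *keys, size_t nkeys)
{
    bool found = false;
    for (size_t i = 0; i < nkeys; i++) {
        const Expr *k = keys[i];
        if (k->kind != EXPR_OP || !k->is_ordering_op)
            continue;
        found = true;
        if (!has_direct_var_arg(k))
            return false;
    }
    return found;
}
```

In this model, `p <-> point(0,0)` triggers fallback, while `circle(p,1) <-> point(0,0)`, alone or mixed with a direct-Var key, does not.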

yjhjstz added 6 commits March 31, 2026 04:36

Revert "Fix COPY TO returning 0 rows during concurrent reorganize"

3cbd3d1

This reverts commit f979799.

fix pax regress

6dc30c2

fix btree_gist test

8f7b81e

yjhjstz wants to merge 6 commits into apache:main from yjhjstz:orca_collation
