ORCA: support column-level COLLATE "C" collation propagation#1649

Draft
yjhjstz wants to merge 6 commits into apache:main from yjhjstz:orca_collation

@yjhjstz (Member) commented Mar 30, 2026

ORCA lost column-level collation at the Query→DXL entry point
(CTranslatorUtils::GetTableDescr did not pass md_col->Collation()),
causing all downstream DXL nodes to see collation=0. This made
ORDER BY, comparison, GROUP BY, and aggregates on COLLATE "C" columns
produce wrong results (using en_US locale order instead of byte order).
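
To make "byte order" concrete, here is a minimal C++ sketch of the ordering that COLLATE "C" requests. The helper names `ByteOrderLess` and `SortByteOrder` are hypothetical and are not part of ORCA or PostgreSQL; the sketch only shows that a plain byte comparison puts every uppercase ASCII letter before every lowercase one, whereas a locale-aware collation such as en_US typically interleaves the two cases.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: byte-wise comparison, independent of the process
// locale. std::string::compare orders by character code, which is the
// behaviour COLLATE "C" asks for.
inline bool ByteOrderLess(const std::string &a, const std::string &b)
{
    return a.compare(b) < 0;
}

// Hypothetical helper: return a copy of the input sorted in byte order.
inline std::vector<std::string> SortByteOrder(std::vector<std::string> values)
{
    std::sort(values.begin(), values.end(), ByteOrderLess);
    return values;
}
```

Under these assumptions, sorting `banana, Apple, apple, Banana` yields `Apple, Banana, apple, banana`, because 'A' and 'B' have smaller byte values than 'a' and 'b'.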

Fixes #717

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


yjhjstz added 6 commits March 31, 2026 04:36
Propagate column-level C collation through the full ORCA pipeline so
that sort, comparison, and aggregate operations produce correct results
matching the PostgreSQL planner.

Key changes:

1. Query->DXL (CTranslatorUtils::GetTableDescr): pass md_col->Collation()
   when creating CDXLColDescr for table columns - this was the root cause
   of collation being lost at the very start of the pipeline.

2. Expr->DXL (CTranslatorExprToDXL::MakeDXLTableDescr, PdxlnCTAS): pass
   CColumnDescriptor collation to CDXLColDescr.

3. DXL->PlStmt (CMappingColIdVarPlStmt::VarFromDXLNodeScId): when the
   CDXLScalarIdent has no explicit collation (e.g., partial aggregate
   output columns), fall back to the child TargetEntry expression's
   collation. This fixes Finalize Aggregate inheriting the correct
   collation from Partial Aggregate.

4. Aggregate collation (CTranslatorDXLToScalar): set aggcollid from
   inputcollid instead of TypeCollation, so min/max/string_agg use the
   correct column collation.

5. Expression-level COLLATE "C" fallback (walkers.c): detect when
   RelabelType overrides collation (from fold_constants converting
   CollateExpr to RelabelType) and trigger fallback to PostgreSQL
   planner, since ORCA does not yet handle expression-level COLLATE.
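
Items 3 and 4 above can be sketched as a small standalone model. This is not the ORCA code: `ScalarIdentModel`, `TargetEntryModel`, `AggrefModel`, and both functions are hypothetical stand-ins, and the OID constants are assumed to mirror PostgreSQL's default and "C" collation OIDs. The sketch only illustrates the two rules: use the ident's own collation when present, otherwise the child target entry's, and derive the aggregate's result collation from its input collation.

```cpp
#include <cassert>
#include <cstdint>

using Oid = std::uint32_t;

// Assumed values, mirroring PostgreSQL's InvalidOid, default collation,
// and "C" collation OIDs.
constexpr Oid kInvalidOid = 0;
constexpr Oid kDefaultCollationOid = 100;
constexpr Oid kCCollationOid = 950;

// Hypothetical stand-in for a scalar ident in the DXL tree.
struct ScalarIdentModel
{
    Oid collation;  // kInvalidOid when no explicit collation was recorded
};

// Hypothetical stand-in for the child plan's target entry expression.
struct TargetEntryModel
{
    Oid expr_collation;
};

// Item 3: prefer the ident's own collation; if it has none, fall back to
// the collation of the child target entry expression (if there is one).
inline Oid ResolveVarCollation(const ScalarIdentModel &ident,
                               const TargetEntryModel *child_te)
{
    if (ident.collation != kInvalidOid)
        return ident.collation;
    if (child_te != nullptr)
        return child_te->expr_collation;
    return kInvalidOid;
}

// Hypothetical stand-in for the two collation fields of an aggregate node.
struct AggrefModel
{
    Oid inputcollid;
    Oid aggcollid;
};

// Item 4: the result collation follows the input collation rather than
// the type's default collation.
inline AggrefModel MakeAggrefModel(Oid input_collation)
{
    AggrefModel agg;
    agg.inputcollid = input_collation;
    agg.aggcollid = input_collation;
    return agg;
}
```

In this model, a Finalize Aggregate whose ident carries no collation picks up the "C" collation from the Partial Aggregate's output entry, which is the behaviour item 3 describes.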

The existing collation infrastructure (CMDColumn, CColumnDescriptor,
CColRef, CDXLColRef, CDXLScalarIdent) was already in place but the
entry point in CTranslatorUtils was not passing collation, causing
all downstream stages to see collation=0.
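
The root cause described above can be illustrated with a toy model of the entry point. The types `MDColumnModel` and `ColDescrModel` and the three functions are hypothetical and only loosely inspired by the names in this PR; the real constructors and signatures are not reproduced here. The OID for "C" is assumed to match PostgreSQL's value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using Oid = std::uint32_t;

constexpr Oid kInvalidOid = 0;       // what downstream stages saw before the fix
constexpr Oid kCCollationOid = 950;  // assumed to mirror PostgreSQL's "C" collation OID

// Hypothetical stand-in for column metadata, which already knew the collation.
struct MDColumnModel
{
    std::string name;
    Oid collation;
};

// Hypothetical stand-in for the column descriptor built at the entry point.
struct ColDescrModel
{
    std::string name;
    Oid collation;
};

// Before: the collation is never passed along, so the descriptor carries 0.
inline ColDescrModel MakeColDescrDroppingCollation(const MDColumnModel &md_col)
{
    return ColDescrModel{md_col.name, kInvalidOid};
}

// After: the metadata collation is forwarded to the descriptor.
inline ColDescrModel MakeColDescrKeepingCollation(const MDColumnModel &md_col)
{
    return ColDescrModel{md_col.name, md_col.collation};
}

// A downstream stage can only choose byte-order semantics if the
// collation actually reached it.
inline bool WouldSortInByteOrder(const ColDescrModel &descr)
{
    return descr.collation == kCCollationOid;
}
```

The design point is that every later stage was already able to carry a collation; only the first hop needed to forward it.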

fix regress
…norderbyop)

ORCA does not support amcanorderbyop (KNN ordered index scans).
Queries like `ORDER BY col <-> 'value' LIMIT N` on GiST indexes
cannot produce ordered index scans in ORCA, resulting in inefficient
Seq Scan + Sort plans instead of KNN-GiST Index Scan.

Previously, these queries would accidentally get correct plans because
column-level COLLATE "C" caused a blanket fallback to the PostgreSQL
planner, which does support amcanorderbyop. After commit 3f4ce85
added COLLATE "C" support to ORCA, these queries lost their fallback
path.

Add has_orderby_ordering_op() in walkers.c to detect when a query's
ORDER BY clause contains an operator registered as AMOP_ORDER in
pg_amop (e.g., <-> for trigram/point distance). When detected, ORCA
falls back to the PostgreSQL planner which can generate KNN ordered
index scans.
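
A rough model of this detection, written as standalone C++ rather than the actual walkers.c code: the catalogue is a toy in-memory map standing in for pg_amop, the operator OIDs used with it are made up, and `SortExprModel`, `IsOrderingOperator`, and `HasOrderByOrderingOp` are hypothetical names. The purpose codes 's' and 'o' follow PostgreSQL's search and ordering markers for access method operators.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Oid = std::uint32_t;

// Purpose of an operator within an access method operator family.
enum class AmopPurpose : char
{
    Search = 's',
    Order = 'o'
};

// Toy stand-in for pg_amop: operator OID -> registered purposes.
using AmopCatalogModel = std::multimap<Oid, AmopPurpose>;

// Hypothetical stand-in for one ORDER BY expression.
struct SortExprModel
{
    bool is_operator_expr;  // true when the expression is "a OP b"
    Oid opno;               // operator OID, meaningful only for operator expressions
};

// True when the operator is registered as an ordering operator at least once.
inline bool IsOrderingOperator(const AmopCatalogModel &catalog, Oid opno)
{
    auto range = catalog.equal_range(opno);
    for (auto it = range.first; it != range.second; ++it)
    {
        if (it->second == AmopPurpose::Order)
            return true;
    }
    return false;
}

// True when any ORDER BY expression uses an ordering operator; in this
// model that is the signal to fall back to the PostgreSQL planner.
inline bool HasOrderByOrderingOp(const AmopCatalogModel &catalog,
                                 const std::vector<SortExprModel> &order_by)
{
    for (const SortExprModel &expr : order_by)
    {
        if (expr.is_operator_expr && IsOrderingOperator(catalog, expr.opno))
            return true;
    }
    return false;
}
```

Because only the ORDER BY list is inspected, a query that uses the same operator family purely in WHERE does not trigger the fallback in this sketch.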

The check is precise: only ORDER BY with ordering operators triggers
fallback. Other queries on the same tables (WHERE with LIKE/%%,
equality filters, etc.) continue to use ORCA normally.

Only fall back to the PostgreSQL planner when ALL ordering-operator
expressions in ORDER BY have at least one direct Var (column reference)
argument.  Expressions like "circle(p,1) <-> point(0,0)" wrap the
column in a function call, which can cause "lossy distance functions
are not supported in index-only scans" errors in the planner.  Leave
such queries for ORCA to handle via Seq Scan + Sort.
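
The refined rule can be sketched as follows. This is a simplified standalone model, not the walkers.c implementation: `ArgKind`, `OrderingOpExprModel`, and `ShouldFallBackForKnn` are hypothetical names, and treating an empty list as "no fallback" is an assumption made here for completeness.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified classification of an operator argument.
enum class ArgKind
{
    Var,       // direct column reference, e.g. col
    FuncCall,  // function wrapping something, e.g. circle(p, 1)
    Const      // literal, e.g. 'value'
};

// Hypothetical stand-in for one ordering-operator expression in ORDER BY.
struct OrderingOpExprModel
{
    std::vector<ArgKind> args;
};

// True when at least one argument is a direct column reference.
inline bool HasDirectVarArg(const OrderingOpExprModel &expr)
{
    return std::any_of(expr.args.begin(), expr.args.end(),
                       [](ArgKind kind) { return kind == ArgKind::Var; });
}

// Fall back only when every ordering-operator expression has a direct Var
// argument. An empty list means there is nothing to fall back for
// (assumption of this sketch).
inline bool ShouldFallBackForKnn(const std::vector<OrderingOpExprModel> &exprs)
{
    if (exprs.empty())
        return false;
    return std::all_of(exprs.begin(), exprs.end(), HasDirectVarArg);
}
```

In this model `col <-> 'value'` (Var, Const) triggers the fallback, while `circle(p,1) <-> point(0,0)` (FuncCall, FuncCall) does not, and a single wrapped expression is enough to keep the whole query in ORCA.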

Development

Successfully merging this pull request may close these issues.

[Bug] ORCA fallbacks for collate "C"