ORCA: support column-level COLLATE "C" collation propagation #1649
Draft
yjhjstz wants to merge 6 commits into apache:main from
This reverts commit f979799.
Propagate column-level C collation through the full ORCA pipeline so that sort, comparison, and aggregate operations produce the same results as the PostgreSQL planner.

Key changes:

1. Query->DXL (CTranslatorUtils::GetTableDescr): pass md_col->Collation() when creating the CDXLColDescr for each table column. This was the root cause: collation was lost at the very start of the pipeline.
2. Expr->DXL (CTranslatorExprToDXL::MakeDXLTableDescr, PdxlnCTAS): pass the CColumnDescriptor collation to CDXLColDescr.
3. DXL->PlStmt (CMappingColIdVarPlStmt::VarFromDXLNodeScId): when the CDXLScalarIdent has no explicit collation (e.g., partial aggregate output columns), fall back to the collation of the child TargetEntry expression. This lets the Finalize Aggregate inherit the correct collation from the Partial Aggregate.
4. Aggregate collation (CTranslatorDXLToScalar): set aggcollid from inputcollid instead of TypeCollation, so min/max/string_agg use the column's collation.
5. Expression-level COLLATE "C" fallback (walkers.c): detect when a RelabelType overrides the collation (fold_constants converts CollateExpr into RelabelType) and fall back to the PostgreSQL planner, since ORCA does not yet handle expression-level COLLATE.

The existing collation infrastructure (CMDColumn, CColumnDescriptor, CColRef, CDXLColRef, CDXLScalarIdent) was already in place, but the entry point in CTranslatorUtils did not pass the collation, so every downstream stage saw collation=0.

fix regress
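The DXL->PlStmt fallback in step 3 can be sketched as a small C model. This is not ORCA code: `ScalarIdent` and `ChildTarget` are hypothetical stand-ins for CDXLScalarIdent and the child plan's TargetEntry, and `resolve_var_collation` is an illustrative name. Only the OID constants mirror PostgreSQL's `DEFAULT_COLLATION_OID` (100) and `C_COLLATION_OID` (950).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define InvalidOid            ((Oid) 0)
#define DEFAULT_COLLATION_OID ((Oid) 100)  /* PostgreSQL's database-default collation */
#define C_COLLATION_OID       ((Oid) 950)  /* PostgreSQL's "C" collation */

/* Hypothetical, simplified stand-in for a CDXLScalarIdent. */
typedef struct ScalarIdent
{
    int colid;
    Oid collation;        /* InvalidOid when the DXL node carries none */
} ScalarIdent;

/* Hypothetical, simplified stand-in for a child plan's TargetEntry. */
typedef struct ChildTarget
{
    int colid;
    Oid expr_collation;   /* collation of the TargetEntry's expression */
} ChildTarget;

/*
 * Use the ident's explicit collation when present; otherwise inherit the
 * collation of the child TargetEntry with the same column id. Returns
 * InvalidOid if neither source provides one.
 */
static Oid
resolve_var_collation(const ScalarIdent *ident,
                      const ChildTarget *child_tlist, size_t ntargets)
{
    assert(ident != NULL);

    if (ident->collation != InvalidOid)
        return ident->collation;

    for (size_t i = 0; i < ntargets; i++)
    {
        if (child_tlist[i].colid == ident->colid)
            return child_tlist[i].expr_collation;
    }
    return InvalidOid;
}
```

An explicit collation on the ident always wins; the child lookup only fills the gap left by nodes such as partial aggregate outputs, which is how a Finalize Aggregate ends up with the "C" collation of its input column.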
…norderbyop)

ORCA does not support amcanorderbyop (KNN ordered index scans). Queries like `ORDER BY col <-> 'value' LIMIT N` on GiST indexes cannot get an ordered index scan from ORCA, so they end up with an inefficient Seq Scan + Sort plan instead of a KNN-GiST Index Scan.

Previously, these queries got correct plans by accident: column-level COLLATE "C" caused a blanket fallback to the PostgreSQL planner, which does support amcanorderbyop. After commit 3f4ce85 added COLLATE "C" support to ORCA, these queries lost their fallback path.

Add has_orderby_ordering_op() in walkers.c to detect when a query's ORDER BY clause contains an operator registered as AMOP_ORDER in pg_amop (e.g., <-> for trigram/point distance). When one is found, ORCA falls back to the PostgreSQL planner, which can generate KNN ordered index scans.

The check is precise: only ORDER BY with ordering operators triggers the fallback. Other queries on the same tables (WHERE with LIKE/%%, equality filters, etc.) continue to use ORCA normally.
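The detection idea can be illustrated with a minimal sketch. It is not the PR's implementation: the real has_orderby_ordering_op() works on PostgreSQL parse trees and catalog lookups, whereas here pg_amop is modeled as a hypothetical in-memory array (`AmopRow`) and each ORDER BY item is reduced to its operator OID (`SortItem`, 0 meaning "not an operator expression"). Only the `AMOP_SEARCH`/`AMOP_ORDER` values (`'s'`/`'o'`) mirror real `pg_amop.amoppurpose` codes; the function name carries a `_sketch` suffix to mark it as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* pg_amop.amoppurpose codes used by PostgreSQL. */
#define AMOP_SEARCH 's'
#define AMOP_ORDER  'o'

/* Hypothetical in-memory stand-in for rows of pg_amop. */
typedef struct AmopRow
{
    Oid  amopopr;      /* operator OID */
    char amoppurpose;  /* AMOP_SEARCH or AMOP_ORDER */
} AmopRow;

/* Hypothetical ORDER BY item: opno is 0 when the item is not an operator. */
typedef struct SortItem
{
    Oid opno;
} SortItem;

/* True if any pg_amop row registers opno as an ordering operator. */
static bool
is_ordering_operator(Oid opno, const AmopRow *amop, size_t namop)
{
    for (size_t i = 0; i < namop; i++)
    {
        if (amop[i].amopopr == opno && amop[i].amoppurpose == AMOP_ORDER)
            return true;
    }
    return false;
}

/* True if at least one ORDER BY item uses an AMOP_ORDER operator. */
static bool
has_orderby_ordering_op_sketch(const SortItem *items, size_t nitems,
                               const AmopRow *amop, size_t namop)
{
    assert(items != NULL || nitems == 0);

    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i].opno != 0 &&
            is_ordering_operator(items[i].opno, amop, namop))
            return true;
    }
    return false;
}
```

Because the check only looks at ORDER BY items, a search-only operator such as equality in WHERE never matches, which keeps the fallback narrow.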
Only fall back to the PostgreSQL planner when ALL ordering-operator expressions in ORDER BY have at least one direct Var (column reference) argument. Expressions like "circle(p,1) <-> point(0,0)" wrap the column in a function call, which can cause "lossy distance functions are not supported in index-only scans" errors in the planner. Leave such queries for ORCA to handle via Seq Scan + Sort.
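The refinement in this commit (fall back only when every ordering-operator expression has a direct Var operand) can be sketched over a hypothetical, simplified expression tree. `Expr`, `ExprKind`, `has_direct_var_arg`, and `should_fallback_for_knn` are illustrative names, not PostgreSQL Node types or functions from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified expression tree (not PostgreSQL's Node types). */
typedef enum ExprKind
{
    EXPR_VAR,       /* bare column reference */
    EXPR_CONST,     /* constant */
    EXPR_FUNC,      /* function call, e.g. circle(p, 1) */
    EXPR_ORDER_OP   /* ordering operator, e.g. <-> */
} ExprKind;

typedef struct Expr
{
    ExprKind           kind;
    const struct Expr *args[2];  /* operands; NULL when absent */
} Expr;

/* True when at least one operand is a bare column reference. */
static bool
has_direct_var_arg(const Expr *op)
{
    for (int i = 0; i < 2; i++)
    {
        if (op->args[i] != NULL && op->args[i]->kind == EXPR_VAR)
            return true;
    }
    return false;
}

/*
 * Fall back only if ORDER BY contains at least one ordering-operator
 * expression and every such expression has a direct Var operand.
 */
static bool
should_fallback_for_knn(const Expr *const *orderby, size_t n)
{
    bool found = false;

    for (size_t i = 0; i < n; i++)
    {
        const Expr *e = orderby[i];

        assert(e != NULL);
        if (e->kind != EXPR_ORDER_OP)
            continue;
        if (!has_direct_var_arg(e))
            return false;   /* e.g. circle(p,1) <-> point(0,0): stay in ORCA */
        found = true;
    }
    return found;
}
```

A single wrapped operand is enough to keep the whole query in ORCA, which matches the "ALL expressions" condition above.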
ORCA lost column-level collation at the Query→DXL entry point (CTranslatorUtils::GetTableDescr did not pass md_col->Collation()), causing all downstream DXL nodes to see collation=0. As a result, ORDER BY, comparison, GROUP BY, and aggregates on COLLATE "C" columns produced wrong results (using en_US locale order instead of byte order).
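To illustrate the difference, here is a small C sketch (not code from this PR) that sorts strings with `strcmp`, which compares characters as `unsigned char` and therefore matches COLLATE "C" byte order for these ASCII values. The helper names `cmp_bytes` and `sort_c_collation` are hypothetical. A locale such as en_US would typically sort case-insensitively at the first level and place `apple` before `Banana`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: byte-wise comparison, i.e. the order COLLATE "C" uses. */
static int
cmp_bytes(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Sort an array of C strings in byte order. */
static void
sort_c_collation(const char **values, size_t n)
{
    assert(values != NULL || n == 0);
    qsort(values, n, sizeof(values[0]), cmp_bytes);
}
```

Sorting `{"apple", "Banana", "cherry"}` this way yields `Banana, apple, cherry`, because `'B'` (66) precedes `'a'` (97); that is the ordering a COLLATE "C" column should produce under both ORCA and the PostgreSQL planner.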
Fixes #717
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions