feat: support ST_DWithin pushdown in vortex #8625

Open
HarukiMoriarty wants to merge 2 commits into nemo/duckdb-native-geometry from nemo/geo-native-pushdown

@HarukiMoriarty

Contributor

Summary

Add a non-throwing geo predicate, vortex_dwithin, to DuckDB. The predicate is pushed down into the Vortex scan, where it calls the distance scalar function during scanning. This significantly improves Q1 and Q3 performance in SpatialBench.

What changes are included in this PR?

  1. Add a new geo predicate, vortex_dwithin, to DuckDB; it is non-throwing and can be pushed down.
  2. Add a SQL rewrite so that, for the native geometry type, ST_DWithin is textually rewritten to vortex_dwithin.
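
To make item 2 concrete, here is a minimal sketch of a textual rewrite. It is not the PR's route_pushable_dwithin: the function name `rewrite_dwithin` is hypothetical, and the sketch rewrites every call it finds, whereas the PR only routes calls it considers pushable (that criterion is not shown in this conversation). It also does not skip string literals or SQL comments.

```rust
/// Hypothetical sketch: replace each `ST_DWithin(` call (any letter case)
/// with `vortex_dwithin(`. This is an illustration, not the PR's code.
fn rewrite_dwithin(query: &str) -> String {
    const FROM: &str = "st_dwithin";
    const TO: &str = "vortex_dwithin";

    // ASCII lowercasing keeps byte offsets aligned with `query`.
    let lower = query.to_ascii_lowercase();
    let mut out = String::with_capacity(query.len());
    let mut pos = 0;

    while let Some(found) = lower[pos..].find(FROM) {
        let start = pos + found;
        let end = start + FROM.len();

        // Skip matches that are the tail of a longer identifier,
        // for example `my_st_dwithin`.
        let prev_is_ident = query[..start]
            .chars()
            .next_back()
            .map_or(false, |c| c.is_alphanumeric() || c == '_');
        // Only rewrite when the name is used as a function call.
        let is_call = query[end..].trim_start().starts_with('(');

        out.push_str(&query[pos..start]);
        if !prev_is_ident && is_call {
            out.push_str(TO);
        } else {
            // Keep the original spelling untouched.
            out.push_str(&query[start..end]);
        }
        pos = end;
    }
    out.push_str(&query[pos..]);
    out
}
```

Under these assumptions, `rewrite_dwithin("SELECT * FROM t WHERE ST_DWithin(geom, p, 10.0)")` produces the same query with `vortex_dwithin(geom, p, 10.0)` as the predicate, while identifiers such as `my_st_dwithin` are left alone.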

Performance

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ Query ┃ duckdb:parquet (base) ┃   duckdb:vortex ┃ duckdb:vortex-native ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ 1     │               179.6ms │ 132.1ms (0.74x) │       29.0ms (0.16x) │
│ 2     │               268.1ms │ 255.3ms (0.95x) │      312.5ms (1.17x) │
│ 3     │               247.8ms │ 216.4ms (0.87x) │      140.2ms (0.57x) │
│ 4     │               199.6ms │ 146.9ms (0.74x) │      144.5ms (0.72x) │
│ 5     │                 3.40s │   3.43s (1.01x) │        3.03s (0.89x) │
│ 6     │               528.1ms │ 359.2ms (0.68x) │      455.4ms (0.86x) │
│ 7     │               984.0ms │ 983.2ms (1.00x) │      887.7ms (0.90x) │
│ 8     │                 1.08s │ 945.4ms (0.87x) │      997.6ms (0.92x) │
│ 9     │                34.2ms │  33.5ms (0.98x) │       43.6ms (1.27x) │
└───────┴───────────────────────┴─────────────────┴──────────────────────┘

Takeaways: Q1 and Q3 improve significantly because the single-table geo predicate is pushed down into the Vortex scan. Q1 benefits more because it takes a hand-written fast path that computes the distance without going through the geo crate. Q3 could potentially benefit from the same manual calculation.
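
For illustration only, a hand-written fast path of this kind might look like the sketch below. It assumes point-to-point inputs and planar Euclidean distance, neither of which is confirmed in this conversation; `point_dwithin` is a hypothetical name, not a function from the PR.

```rust
/// Hypothetical fast path: is point (x1, y1) within `distance` of (x2, y2)?
/// Assumes planar Euclidean distance between two points.
fn point_dwithin(x1: f64, y1: f64, x2: f64, y2: f64, distance: f64) -> bool {
    // Non-throwing by construction: a NaN or negative distance yields false
    // instead of an error.
    if distance.is_nan() || distance < 0.0 {
        return false;
    }
    let dx = x1 - x2;
    let dy = y1 - y2;
    // Compare squared values to avoid a square root per row.
    // NaN coordinates make this comparison false.
    dx * dx + dy * dy <= distance * distance
}
```

Comparing squared distances avoids both the square root and any generic geometry dispatch, which is the kind of saving a per-row scan predicate is sensitive to.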

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Comment thread vortex-duckdb/cpp/expr.cpp Outdated
  match format {
-     Format::VortexNative => strip_wkb_wrappers(query),
+     // Native geometry is `GEOMETRY`: drop `ST_GeomFromWKB(..)`, route pushable `ST_DWithin`.
+     Format::VortexNative => route_pushable_dwithin(&strip_wkb_wrappers(query)),

Contributor

I think CREATE MACRO in init.sql is an easier approach than rewriting the query manually.

Contributor Author

The queries here call ST_GeomFromWKB and ST_DWithin, which the spatial extension already defines, and those cannot be shadowed: CREATE MACRO ST_DWithin(...) fails with: Catalog Error: Macro Function with name "ST_DWithin" already exists. The same holds for ST_GeomFromWKB; I tested both.

}
const DatabaseWrapper &wrapper = *reinterpret_cast<DatabaseWrapper *>(ffi_db);
try {
Connection conn(*wrapper.database->instance);

Contributor

Why do we need a transaction here? Can we register the function in the global catalog without it?

Contributor Author

It looks like we cannot. DuckDB's catalog is MVCC, so CreateFunction needs an active transaction (otherwise it throws "ActiveTransaction called without active transaction"). A fresh Connection only begins a transaction per query, and we are calling the catalog API directly, so RunFunctionInTransaction is just the "begin → run → commit" wrapper that supplies one.
I confirmed this by removing it: registration aborts, and queries then fail with "vortex_dwithin does not exist".
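
The "begin → run → commit" shape can be sketched with a toy model. Everything here (`ToyCatalog`, `create_function`, `run_in_transaction`) is a hypothetical stand-in written for illustration; it is not DuckDB's API and only mimics the behavior described above, where a catalog write outside a transaction is rejected.

```rust
use std::collections::HashSet;

/// Toy stand-in for a transactional catalog. Not DuckDB's API.
#[derive(Default)]
struct ToyCatalog {
    in_transaction: bool,
    pending: Vec<String>,
    functions: HashSet<String>,
}

impl ToyCatalog {
    /// Catalog writes are rejected unless a transaction is active.
    fn create_function(&mut self, name: &str) -> Result<(), String> {
        if !self.in_transaction {
            return Err("no active transaction".to_string());
        }
        self.pending.push(name.to_string());
        Ok(())
    }

    /// Wrapper that supplies the transaction: begin, run the body,
    /// then commit on success or discard pending writes on error.
    fn run_in_transaction<F>(&mut self, body: F) -> Result<(), String>
    where
        F: FnOnce(&mut ToyCatalog) -> Result<(), String>,
    {
        self.in_transaction = true; // begin
        let result = body(self); // run
        self.in_transaction = false;
        match result {
            Ok(()) => {
                // commit: make pending entries visible
                self.functions.extend(self.pending.drain(..));
                Ok(())
            }
            Err(e) => {
                // rollback: drop pending entries
                self.pending.clear();
                Err(e)
            }
        }
    }
}
```

In this model, calling `create_function("vortex_dwithin")` directly returns an error and registers nothing, while the same call inside `run_in_transaction` succeeds and the name becomes visible afterwards.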

Comment thread vortex-duckdb/src/duckdb/database.rs
- Replace non-ASCII characters in comments with ASCII.
- Document why catalog registration needs RunFunctionInTransaction.
- Reference the FFI function in register_geo_aliases doc.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
@codspeed-hq

codspeed-hq Bot commented Jun 29, 2026

Merging this PR will improve performance by 12.06%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 2 improved benchmarks
✅ 1587 untouched benchmarks
⏩ 4 skipped benchmarks [1]

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation bitwise_not_vortex_buffer_mut[128] 244.4 ns 215.3 ns +13.55%
Simulation bitwise_not_vortex_buffer_mut[1024] 304.7 ns 275.6 ns +10.58%


Comparing nemo/geo-native-pushdown (94674e0) with nemo/duckdb-native-geometry (69a4f90)

Footnotes

  1. 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.