[python] Add ReadBuilder.with_filter (predicate pushdown)#419

Merged
JingsongLi merged 3 commits into apache:main from JunRuiLee:feat/py-read-builder-pr2 on Jun 29, 2026

@JunRuiLee

Purpose

Second PR in the series exposing PyPaimon's DataFrame read path to Rust (follows #415). Refs #413.

Adds ReadBuilder.with_filter(predicate: dict), which converts a lightweight dict predicate into a Rust Predicate (resolving fields against the table schema) and pushes it into scan planning so that plan() can prune splits.

table.new_read_builder().with_filter(
    {"method": "equal", "field": "id", "literals": [1]}
).new_scan().plan()

Supports the directly-translatable subset: equal/notEqual/lessThan/lessOrEqual/greaterThan/greaterOrEqual/isNull/isNotNull/in/notIn/and/or.
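
To make the intended meaning of the operators listed above concrete, here is a minimal pure-Python sketch that evaluates a dict predicate against a single row. It is not the code in this PR (the real conversion happens in Rust); `matches`, the `children` key for and/or nodes, and the rule that a null value never satisfies a comparison are all assumptions made for illustration, since the description only shows the leaf shape `{"method", "field", "literals"}`.

```python
import operator
from typing import Any, Dict

# Hypothetical key: the PR description does not show how and/or children are spelled.
CHILDREN_KEY = "children"

# Single-literal comparison operators from the supported subset.
COMPARE = {
    "equal": operator.eq,
    "notEqual": operator.ne,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def matches(pred: Dict[str, Any], row: Dict[str, Any]) -> bool:
    """Return True if `row` (column name -> value) satisfies `pred`."""
    method = pred["method"]
    if method == "and":
        return all(matches(child, row) for child in pred[CHILDREN_KEY])
    if method == "or":
        return any(matches(child, row) for child in pred[CHILDREN_KEY])

    value = row.get(pred["field"])
    literals = pred.get("literals", [])
    if method == "isNull":
        return value is None
    if method == "isNotNull":
        return value is not None
    if value is None:
        # Assumed simplification: a null value never satisfies a comparison.
        return False
    if method in COMPARE:
        return COMPARE[method](value, literals[0])
    if method == "in":
        return value in literals
    if method == "notIn":
        return value not in literals
    raise NotImplementedError(f"unsupported method: {method}")
```

For example, `matches({"method": "equal", "field": "id", "literals": [1]}, {"id": 1})` evaluates to `True`.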

Out of scope (later PRs): read(splits) → Arrow data reading, pypaimon-side wiring.

Notes

  • Schema is authoritative: any index/data_type in the dict is ignored; fields resolve by name against the current table schema.
  • No partial pushdown: if any node (including a compound child) is unsupported, the whole with_filter call fails; nothing is partially pushed.
  • Errors: unsupported operators (like/startsWith/endsWith/contains/not) and unsupported literal types (Date/Time/Timestamp/Decimal/Bytes, complex) raise NotImplementedError. An unknown field, a wrong literal count, None misuse, a type mismatch, or empty compound children raise ValueError.
  • Filter pushdown is conservative: successful conversion does not guarantee split-count reduction for every predicate; it enables read-side pruning where supported. Exact row filtering is left to a residual layer.
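
The all-or-nothing behaviour and the error split in the notes above can be sketched as a recursive validator in plain Python. This is an illustration under stated assumptions, not the Rust conversion in this PR: `validate`, the `children` key, the accepted literal types (bool/int/float/str), the order in which checks run, and treating operators the description does not name as NotImplementedError are all hypothetical. Type-mismatch checks against the schema are left out because they need column types.

```python
from typing import Any, Dict, Set

CHILDREN_KEY = "children"  # hypothetical; the PR text does not show the compound shape

# Required literal count per leaf operator; in/notIn take a list of any length here.
LITERAL_COUNT = {
    "equal": 1, "notEqual": 1, "lessThan": 1, "lessOrEqual": 1,
    "greaterThan": 1, "greaterOrEqual": 1, "isNull": 0, "isNotNull": 0,
}
VARIADIC = {"in", "notIn"}
COMPOUND = {"and", "or"}
SUPPORTED_LITERAL_TYPES = (bool, int, float, str)  # assumed supported set


def validate(pred: Dict[str, Any], schema_fields: Set[str]) -> None:
    """Raise on the first bad node; returning None means the whole tree is accepted."""
    method = pred.get("method")
    if method in COMPOUND:
        children = pred.get(CHILDREN_KEY) or []
        if not children:
            raise ValueError(f"'{method}' needs at least one child")
        for child in children:
            # Any failing child aborts the whole filter: no partial pushdown.
            validate(child, schema_fields)
        return
    if method not in LITERAL_COUNT and method not in VARIADIC:
        # Covers like/startsWith/endsWith/contains/not; other names are assumed to land here too.
        raise NotImplementedError(f"unsupported operator: {method}")

    # Fields resolve by name only; any "index"/"data_type" keys in the dict are ignored.
    field = pred.get("field")
    if field not in schema_fields:
        raise ValueError(f"unknown field: {field}")

    literals = pred.get("literals") or []
    if method in LITERAL_COUNT and len(literals) != LITERAL_COUNT[method]:
        raise ValueError(f"'{method}' expects {LITERAL_COUNT[method]} literal(s)")
    for lit in literals:
        if lit is None:
            raise ValueError("None is not a valid literal; use isNull/isNotNull")
        if not isinstance(lit, SUPPORTED_LITERAL_TYPES):
            raise NotImplementedError(f"unsupported literal type: {type(lit).__name__}")
```

Because validation raises before anything is handed to the scan, a caller sees either a fully accepted predicate or an exception, never a half-applied filter.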

Tests

bindings/python/tests/test_read.py — filter conversion (eq/and/or/in/isNull/bool), partition pruning, and error paths (unsupported op/type, unknown field, wrong count, empty children).
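
As an illustration of the partition-pruning case the tests cover, the sketch below prunes splits by partition values only and keeps any split it cannot rule out, matching the conservative behaviour described in the notes. `Split`, `may_match`, and `prune` are hypothetical stand-ins, not types or functions from this PR, and the `children` key for and/or nodes is likewise assumed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

CHILDREN_KEY = "children"  # hypothetical; the PR text does not show the compound shape


@dataclass
class Split:
    """Hypothetical stand-in for a planned split: partition values plus a file path."""
    partition: Dict[str, Any]
    path: str


def may_match(pred: Dict[str, Any], partition: Dict[str, Any]) -> bool:
    """Return False only when the partition values prove no row can match."""
    method = pred["method"]
    if method == "and":
        return all(may_match(child, partition) for child in pred[CHILDREN_KEY])
    if method == "or":
        return any(may_match(child, partition) for child in pred[CHILDREN_KEY])

    field = pred["field"]
    if field not in partition:
        # Not a partition column: nothing can be decided here, so keep the split.
        return True
    value = partition[field]
    if method == "equal":
        return value == pred["literals"][0]
    if method == "in":
        return value in pred["literals"]
    # Other operators are kept conservatively in this sketch.
    return True


def prune(splits: List[Split], pred: Dict[str, Any]) -> List[Split]:
    """Keep every split that might contain matching rows."""
    return [s for s in splits if may_match(pred, s.partition)]
```

With two splits partitioned by `dt`, an `equal` predicate on `dt` keeps one split, while an `equal` predicate on a non-partition column such as `id` keeps both; exact row filtering would still have to happen after the read.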

Note on CI

CI will be red until #418 merges: current main does not compile (DataType::Vector not covered in arrow/format/row.rs, unrelated to this PR). This PR is built on main and will go green once #418 lands; happy to rebase then.

@JunRuiLee JunRuiLee force-pushed the feat/py-read-builder-pr2 branch from 55da015 to 79c2096 on June 29, 2026, 03:07

@JingsongLi JingsongLi left a comment

+1

@JingsongLi JingsongLi merged commit 209e4e1 into apache:main Jun 29, 2026
8 checks passed