Search before asking
Description
Motivation
Apache Paimon has a first-class VECTOR type (VECTOR<element, length>), and
tables written by Java/Python Paimon can contain VECTOR columns. paimon-rust
has no VectorType today, so it cannot read these tables.
In Paimon, VectorType is a general-purpose column type (it can be read,
projected, and written like any other column); it just happens to also be the
natural input to vector (ANN) search. So the goal here is full support for the
type, not only the read path.
Goal: give paimon-rust first-class support for Paimon VECTOR columns —
read, write, and vector (ANN) search — matching Java/Python Paimon.
Scope (incremental PRs)
Read foundation:
- PR 1: Add a native VectorType to the type system (variant, validation, JSON serde, Display).
- PR 2: Arrow conversion (VectorType ↔ FixedSizeList), plus reading vector columns inlined in ordinary data files.
- PR 3: Read dedicated .vector. files (parquet/vortex).
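
To make the PR 1 item concrete, here is a rough, std-only sketch of what a VectorType with validation and Display could look like. Everything in it is an assumption for illustration: the ElementType enum (and which element types it lists), the VectorTypeError name, the rule that a zero length is rejected, and the exact Display output including the NOT NULL suffix. It is not taken from the Java implementation or from existing paimon-rust code, and JSON serde is left out to keep it dependency-free.

```rust
use std::fmt;

/// Hypothetical, simplified element types. The real set of element types
/// permitted inside a VECTOR is defined by Paimon and is not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementType {
    Float,
    Double,
    Int,
}

impl fmt::Display for ElementType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            ElementType::Float => "FLOAT",
            ElementType::Double => "DOUBLE",
            ElementType::Int => "INT",
        };
        f.write_str(name)
    }
}

/// Hypothetical validation error for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum VectorTypeError {
    ZeroLength,
}

/// Sketch of a fixed-length vector column type: VECTOR<element, length>.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VectorType {
    element: ElementType,
    length: u32,
    nullable: bool,
}

impl VectorType {
    /// Assumed validation rule: a vector must have at least one element.
    pub fn new(
        element: ElementType,
        length: u32,
        nullable: bool,
    ) -> Result<Self, VectorTypeError> {
        if length == 0 {
            return Err(VectorTypeError::ZeroLength);
        }
        Ok(Self { element, length, nullable })
    }

    /// Fixed number of elements in every value of this column.
    pub fn length(&self) -> u32 {
        self.length
    }

    pub fn element(&self) -> ElementType {
        self.element
    }
}

impl fmt::Display for VectorType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "VECTOR<{}, {}>", self.element, self.length)?;
        // Assumed convention: non-nullable types carry a NOT NULL suffix.
        if !self.nullable {
            f.write_str(" NOT NULL")?;
        }
        Ok(())
    }
}
```

Under these assumptions, `VectorType::new(ElementType::Float, 128, true)` would display as `VECTOR<FLOAT, 128>`, and a length of 0 would be rejected at construction time rather than at read time.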
Search & write:
- PR 4: Integrate VECTOR<FLOAT> with the existing vector (ANN) search path. Today the lumina/vindex index build and VectorSearchBuilder accept only ARRAY<FLOAT>; extend them to also accept DataType::Vector, deriving the dimension from the column type (VectorType::length) instead of requiring an explicit dimension option. This matches the Java behavior, where both ARRAY<FLOAT> and VECTOR<FLOAT> are valid vector-search columns.
- PR 5+: Write VECTOR columns (inline + dedicated .vector. files).
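
The dimension handling described for PR 4 could be sketched roughly as follows. This is a self-contained illustration, not the existing paimon-rust API: the DataType enum here is a stripped-down stand-in, and resolve_dimension, DimensionError, and the choice to reject a dimension option that disagrees with the declared vector length are all assumptions made for the example.

```rust
/// Stand-in for the real type enum, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Float,
    Array(Box<DataType>),
    Vector { element: Box<DataType>, length: usize },
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DimensionError {
    /// ARRAY<FLOAT> carries no length, so the option is still required.
    MissingDimensionOption,
    /// Assumed policy: an explicit option must agree with the declared length.
    DimensionMismatch { declared: usize, configured: usize },
    UnsupportedColumnType,
}

/// Resolve the vector-search dimension for a column.
/// - VECTOR<FLOAT, n>: the dimension comes from the type itself.
/// - ARRAY<FLOAT>: the dimension must be supplied explicitly.
pub fn resolve_dimension(
    column: &DataType,
    dimension_option: Option<usize>,
) -> Result<usize, DimensionError> {
    match column {
        DataType::Vector { element, length } if **element == DataType::Float => {
            match dimension_option {
                Some(configured) if configured != *length => {
                    Err(DimensionError::DimensionMismatch {
                        declared: *length,
                        configured,
                    })
                }
                _ => Ok(*length),
            }
        }
        DataType::Array(element) if **element == DataType::Float => {
            dimension_option.ok_or(DimensionError::MissingDimensionOption)
        }
        _ => Err(DimensionError::UnsupportedColumnType),
    }
}
```

With this shape, a VECTOR<FLOAT> column of length 128 resolves to 128 with no option set, while an ARRAY<FLOAT> column keeps today's requirement of an explicit dimension.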
Willingness to contribute