Support reading Paimon VECTOR type in paimon-rust #410

@JunRuiLee

Search before asking

  • I searched in the issues and found nothing similar.

Description

Motivation

Apache Paimon has a first-class VECTOR type (VECTOR<element, length>), and
tables written by Java/Python Paimon can contain VECTOR columns. paimon-rust
has no VectorType today, so it cannot read these tables.

In Paimon, VectorType is a general-purpose column type (it can be read,
projected, and written like any other column); it just happens to also be the
natural input to vector (ANN) search. So the goal here is full support for the
type, not only the read path.

Goal: give paimon-rust first-class support for Paimon VECTOR columns —
read, write, and vector (ANN) search — matching Java/Python Paimon.

Scope (incremental PRs)

Read foundation:

  • PR 1 — Add native VectorType to the type system (variant, validation,
    JSON serde, Display).
  • PR 2 — Arrow conversion (VectorType ↔ FixedSizeList) + read vector
    columns inlined in ordinary data files.
  • PR 3 — Read dedicated .vector. files (parquet/vortex).
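
As a rough illustration of what PR 1 might cover, here is a minimal, std-only sketch of a vector type with validation and a Display impl that prints the VECTOR<element, length> form. Everything in it is an assumption made for illustration: the ElementType enum, its two variants, the field names, the constructor signature, and the "length must be positive" rule are hypothetical and are not taken from paimon-rust or from Java Paimon. JSON serde is left out to keep the sketch dependency-free.

```rust
use std::fmt;

/// Hypothetical element types for this sketch only; the real set of
/// allowed element types is defined by Paimon, not here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementType {
    Float,
    Double,
}

impl fmt::Display for ElementType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            ElementType::Float => "FLOAT",
            ElementType::Double => "DOUBLE",
        };
        f.write_str(name)
    }
}

/// Sketch of a fixed-length vector type: an element type plus a length.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VectorType {
    element: ElementType,
    length: usize,
}

impl VectorType {
    /// Validation (assumed rule): a vector must have at least one element.
    pub fn new(element: ElementType, length: usize) -> Result<Self, String> {
        if length == 0 {
            return Err("VECTOR length must be positive".to_string());
        }
        Ok(Self { element, length })
    }

    pub fn element(&self) -> ElementType {
        self.element
    }

    pub fn length(&self) -> usize {
        self.length
    }
}

impl fmt::Display for VectorType {
    /// Renders the type in the VECTOR<element, length> form used in this issue.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "VECTOR<{}, {}>", self.element, self.length)
    }
}
```

With this sketch, VectorType::new(ElementType::Float, 128) succeeds and formats as VECTOR<FLOAT, 128>, while a length of 0 is rejected at construction time, so later stages such as the Arrow conversion would not need to re-check it.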

Search & write:

  • PR 4 — Integrate VECTOR<FLOAT> with the existing vector (ANN) search
    path. Today lumina/vindex index build and VectorSearchBuilder only
    accept ARRAY<FLOAT>; extend them to also accept DataType::Vector, deriving
    the dimension from the column type (VectorType::length) instead of requiring
    an explicit dimension option. Matches the Java behavior where both
    ARRAY<FLOAT> and VECTOR<FLOAT> are valid vector-search columns.
  • PR 5+ — Writing VECTOR columns (inline + dedicated .vector. files).
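
The dimension handling described for PR 4 could look roughly like the sketch below: a VECTOR<FLOAT> column supplies its own dimension, while an ARRAY<FLOAT> column still needs the explicit option. The simplified DataType enum, the resolve_dimension function, and the choice to reject an explicit option that disagrees with the column length are all hypothetical; they are not the paimon-rust API or the Java behavior, only one way the rule could be expressed.

```rust
/// Simplified stand-in for a column type; hypothetical, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Float,
    Int,
    Array(Box<DataType>),
    Vector { element: Box<DataType>, length: usize },
}

/// Resolve the vector-search dimension for a column.
///
/// - VECTOR<FLOAT>: the dimension comes from the type's length. An explicit
///   option is tolerated only if it agrees (an assumption of this sketch).
/// - ARRAY<FLOAT>: the type carries no length, so the explicit option is required.
/// - Anything else is not a valid vector-search column.
pub fn resolve_dimension(column: &DataType, explicit: Option<usize>) -> Result<usize, String> {
    match column {
        DataType::Vector { element, length } if **element == DataType::Float => match explicit {
            Some(d) if d != *length => Err(format!(
                "explicit dimension {} conflicts with column length {}",
                d, length
            )),
            _ => Ok(*length),
        },
        DataType::Array(element) if **element == DataType::Float => explicit
            .ok_or_else(|| "ARRAY<FLOAT> requires an explicit dimension option".to_string()),
        other => Err(format!("unsupported vector-search column type: {:?}", other)),
    }
}
```

Under these assumptions, a VECTOR<FLOAT> column of length 128 resolves to 128 with no option set, an ARRAY<FLOAT> column without the option is an error, and a VECTOR with a non-FLOAT element type is rejected as unsupported.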

Willingness to contribute

  • I'm willing to submit a PR!
