Tested against the Rust reference implementation v0.74.0. For the rest of the API surface (reader, writer, scan, CLI), see reference.md.
| Item | Introduced | Java status |
|---|---|---|
| DType::Union (fbs.DType.Type.Union = 12) | Rust 0.71.0 | ❌ Decode throws VortexException("unsupported DType typeType=12"). No DType.Union variant in Java's sealed type. |
| vortex.onpair experimental string encoding | Rust 0.74.0 | ❌ Not registered. Files using it fail to decode unless Registry.allowUnknown() is enabled. |
| vortex.variant write path | Rust 0.73.0 (Allow writing Variant to files, #7945) | ❌ Java decode works; Java encode throws "encode not yet implemented". Java→Rust round-trip not possible for Variant columns. |
| Arrow extension array import affecting Variant shape | Rust 0.74.0 (#8125) | Untested. Re-run integration fixtures against v0.74.0 once published. |

| Encoding ID | Class | Decode | Encode | Notes |
|---|---|---|---|---|
| vortex.primitive | PrimitiveEncoding | ✅ | ✅ | All PType (I8–I64, U8–U64, F32, F64) |
| vortex.bool | BoolEncoding | ✅ | ✅ | Bool (bit-packed) |
| vortex.null | NullEncoding | ✅ | ✅ | Null |
| vortex.bytebool | ByteBoolEncoding | ✅ | ✅ | Bool (byte-per-element) |
| vortex.zigzag | ZigZagEncoding | ✅ | ✅ | Signed integer PTypes |
| vortex.constant | ConstantEncoding | ✅ | ✅ | Primitive, Utf8, Binary, Bool, Null, Decimal, Extension |
| vortex.ext | ExtEncoding | ✅ | ✅ | Extension |
| vortex.runend | RunEndEncoding | ✅ | ✅ | Primitive, Utf8/Binary, Bool |
| vortex.varbin | VarBinEncoding | ✅ | ✅ | Utf8, Binary |
| vortex.varbinview | VarBinViewEncoding | ✅ | ✅ | Utf8, Binary |
| vortex.alp | AlpEncoding | ✅ | ✅ | F64, F32 |
| vortex.alprd | AlpRdEncoding | ✅ | ✅ | F64, F32 |
| vortex.dict | DictEncoding | ✅ | ✅ | Primitive, Utf8/Binary |
| vortex.sparse | SparseEncoding | ✅ | ✅ | Primitive |
| vortex.sequence | SequenceEncoding | ✅ | ✅ | Primitive |
| vortex.struct | StructEncoding | ✅ | ✅ | Struct |
| vortex.chunked | ChunkedEncoding | ✅ | ✅ | Primitive + Struct concat |
| vortex.fsst | FsstEncoding | ✅ | ✅ | Utf8, Binary |
| vortex.list | ListEncoding | ✅ | ✅ | |
| vortex.listview | ListViewEncoding | ✅ | ✅ | |
| vortex.fixed_size_list | FixedSizeListEncoding | ✅ | ✅ | |
| vortex.zstd | ZstdEncoding | ✅ | ✅ | Primitive, Utf8, Binary |
| vortex.masked | MaskedEncoding | ✅ | ❌ | Encode not yet implemented |
| vortex.decimal | DecimalEncoding | ✅ | ✅ | |
| vortex.decimal_byte_parts | DecimalBytePartsEncoding | ✅ | ✅ | |
| vortex.datetimeparts | DateTimePartsEncoding | ✅ | ✅ | |
| vortex.pco | PcoEncoding | ✅ | ❌ | Decode: all modes; encode not yet implemented |
| fastlanes.bitpacked | BitpackedEncoding | ✅ | ✅ | Unsigned integer PTypes |
| fastlanes.delta | DeltaEncoding | ✅ | ✅ | Integer PTypes |
| fastlanes.for | FrameOfReferenceEncoding | ✅ | ✅ | Integer PTypes |
| fastlanes.rle | RleEncoding | ✅ | ✅ | Chunk-based RLE |
| vortex.patched | PatchedEncoding | ✅ | ❌ | Primitive PTypes; encode not yet implemented |
| vortex.variant | VariantEncoding | ✅ | ❌ | Decode (incl. shredded child); encode not yet implemented (Rust 0.73+) |
| vortex.onpair | none | ❌ | ❌ | Experimental in Rust 0.74.0; not yet ported |

Files containing unrecognised encoding IDs throw VortexException by default. Opt in to
passthrough mode to read such files without failing:

```java
Registry registry = Registry.builder()
        .registerServiceLoaded()
        .allowUnknown()
        .build();
try (VortexReader vf = VortexReader.open(path, registry)) {
    // columns with unknown encodings are returned as UnknownArray
}
```

Extension dtypes wrap a primitive storage array with a logical-id tag plus optional
metadata. The Rust catalogue lives in vortex-array/src/extension/; each subdir below
names a canonical extension id and its on-disk shape.

Extensions live in io.github.dfa1.vortex.extension. Each spec extension is a
singleton implementing the Extension interface, with typed encode/decode
methods on the concrete impl. Resolve a column to its impl via
Registry.lookup(ExtensionId), or grab the singleton directly:

```java
DType.Extension dtype = (DType.Extension) schema.field("birthdays");
List<LocalDate> values = DateExtension.INSTANCE.decodeAll(chunk.column("birthdays"));
```

End-to-end round-trip (write a List<LocalDate>, read it back):

```java
var schema = new DType.Struct(List.of("birthdays"),
        List.of(DateExtension.INSTANCE.dtype(false)), false);
writer.writeChunk(Map.of("birthdays", dates)); // Collection auto-routed
try (var iter = reader.scan(ScanOptions.all());
     Chunk chunk = iter.next()) {
    List<LocalDate> back = chunk.as("birthdays", LocalDate.class);
}
```

Chunk.as(name, Class) hides the per-extension decode dispatch for the four
spec extensions (LocalDate ↔ vortex.date, LocalTime ↔ vortex.time,
Instant ↔ vortex.timestamp, UUID ↔ vortex.uuid). Third-party
extensions still go through Registry.lookup(ExtensionId) and the impl's own
typed methods.
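
To make two of these mappings concrete, here is a minimal JDK-only sketch of the value conversions involved. It assumes the storage shapes given in the extension table in this section: a signed day count for vortex.date and 16 bytes for vortex.uuid. The class and method names are hypothetical and not part of the library. The big-endian (most significant byte first) UUID layout is also an assumption, not something this page states.

```java
import java.nio.ByteBuffer;
import java.time.LocalDate;
import java.util.UUID;

/** Hypothetical illustration only; not part of io.github.dfa1.vortex. */
final class SpecExtensionValueSketch {

    /** vortex.date storage: signed integer days since 1970-01-01. */
    static LocalDate dateFromEpochDays(long days) {
        return LocalDate.ofEpochDay(days);
    }

    /** vortex.uuid storage: 16 bytes. ASSUMPTION: most significant byte first. */
    static UUID uuidFromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes); // ByteBuffer defaults to big-endian
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        System.out.println(dateFromEpochDays(0));
        byte[] raw = new byte[16];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        System.out.println(uuidFromBytes(raw));
    }
}
```

In application code, Chunk.as or the extension singletons remain the intended route. The sketch only shows what the decoded values correspond to under the stated assumptions.
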
ExtensionId is the enum of known spec ids (VORTEX_DATE, VORTEX_TIME,
VORTEX_TIMESTAMP, VORTEX_UUID). Unknown wire ids on DType.Extension
round-trip verbatim through the raw String field — the registry simply
returns null for them and callers can read the storage column directly.
| Extension id | Impl | Storage | Metadata | Round-trip |
|---|---|---|---|---|
| vortex.date | DateExtension | Signed integer days since 1970-01-01 | none | ✅ |
| vortex.time | TimeExtension | I32 (s/ms) or I64 (μs/ns) since midnight | 1 byte: TimeUnit | ✅ |
| vortex.timestamp | TimestampExtension | I64 epoch count in the recorded TimeUnit | unit byte + u16 LE tz_len + UTF-8 tz | ✅ |
| vortex.uuid | UuidExtension | FixedSizeList(Primitive(U8), 16) | none | ✅ |
| custom ids | none | whatever the column declares | opaque bytes | passthrough |
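
The vortex.timestamp metadata layout in the table (unit byte, then a little-endian u16 tz_len, then tz_len bytes of UTF-8) can be read with plain java.nio. The following is a sketch under that stated layout only. The class, record, and method names are hypothetical, and it assumes the tz_len field is always present (zero when there is no time zone), which this page does not confirm.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the library: parses vortex.timestamp metadata bytes. */
final class TimestampMetadataSketch {

    /** Raw TimeUnit tag plus the time-zone string (empty when tz_len is 0). */
    record Parsed(int unitTag, String timeZone) {}

    static Parsed parse(byte[] metadata) {
        // ASSUMPTION: unit byte and tz_len are always present, so at least 3 bytes.
        if (metadata.length < 3) {
            throw new IllegalArgumentException(
                    "expected at least 3 metadata bytes, got " + metadata.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(metadata).order(ByteOrder.LITTLE_ENDIAN);
        int unitTag = Byte.toUnsignedInt(buf.get());      // first byte: TimeUnit tag
        int tzLen = Short.toUnsignedInt(buf.getShort());  // u16, little-endian
        if (buf.remaining() < tzLen) {
            throw new IllegalArgumentException(
                    "tz_len " + tzLen + " exceeds remaining " + buf.remaining() + " bytes");
        }
        byte[] tz = new byte[tzLen];
        buf.get(tz);
        return new Parsed(unitTag, new String(tz, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // unit tag 2, tz_len = 3 (little-endian 03 00), then "UTC"
        byte[] metadata = {2, 3, 0, 'U', 'T', 'C'};
        System.out.println(parse(metadata));
    }
}
```

The unit tag is left as a raw integer here; mapping it to a precision is a separate step driven by the TimeUnit values listed in this section.
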
TimeUnit (see extension/datetime/unit.rs)
encodes precision in the first metadata byte:
| Value | Unit |
|---|---|
| 0 | Nanoseconds |
| 1 | Microseconds |
| 2 | Milliseconds |
| 3 | Seconds |
| 4 | Days |
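
As an illustration of how the tag values above might be consumed, the sketch below mirrors the table in a small enum and converts an epoch count to a java.time.Instant. The enum name and its methods are hypothetical; the library's own TimeUnit type may look different.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical mirror of the TimeUnit tag table; not the library's own type. */
enum TimeUnitSketch {
    NANOSECONDS(0, ChronoUnit.NANOS),
    MICROSECONDS(1, ChronoUnit.MICROS),
    MILLISECONDS(2, ChronoUnit.MILLIS),
    SECONDS(3, ChronoUnit.SECONDS),
    DAYS(4, ChronoUnit.DAYS);

    final int tag;
    final ChronoUnit chronoUnit;

    TimeUnitSketch(int tag, ChronoUnit chronoUnit) {
        this.tag = tag;
        this.chronoUnit = chronoUnit;
    }

    /** Resolves the first metadata byte to a unit, rejecting unknown tags. */
    static TimeUnitSketch fromTag(int tag) {
        for (TimeUnitSketch unit : values()) {
            if (unit.tag == tag) {
                return unit;
            }
        }
        throw new IllegalArgumentException("unknown TimeUnit tag: " + tag);
    }

    /** Interprets a count since the Unix epoch in this unit. */
    Instant toInstant(long count) {
        return Instant.EPOCH.plus(count, chronoUnit);
    }

    public static void main(String[] args) {
        TimeUnitSketch unit = fromTag(2);
        System.out.println(unit + " -> " + unit.toInstant(1_500));
    }
}
```

Rejecting unknown tags with an exception, rather than defaulting to a unit, avoids silently misreading values by several orders of magnitude.
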
For unsupported extension ids the inspector falls back to a placeholder cell
(<GenericArray ext<vortex.X>>); the underlying storage array still decodes
correctly via the primitive accessors, callers just have to format the value
themselves.
Note: the fixture matrix below is locked to v0.72.0/. The Rust reference is now at v0.74.0; re-run the integration suite against v0.74.0/arrays/ once upstream publishes the corresponding fixture set, and refresh this section.

Cross-language round-trips tested against Rust-written fixture files hosted at
s3://vortex-compat-fixtures/v0.72.0/arrays/.
| Fixture | Status |
|---|---|
| primitives.vortex | ✅ |
| alp.vortex | ✅ |
| bitpacked.vortex | ✅ |
| booleans.vortex | ✅ |
| constant.vortex | ✅ |
| for.vortex | ✅ |
| fsst.vortex | ✅ |
| runend.vortex | ✅ |
| sequence.vortex | ✅ |
| varbin.vortex | ✅ |
| struct_nested.vortex | ✅ |
| null.vortex | ✅ |
| bytebool.vortex | ✅ |
| zigzag.vortex | ✅ |
| datetime.vortex | ✅ |
| dict.vortex | ✅ |
| sparse.vortex | ✅ |
| varbinview.vortex | ✅ |
| chunked.vortex | ✅ |
| rle.vortex | ✅ |
| alprd.vortex | ✅ |
| decimal.vortex | ✅ |
| decimal_byte_parts.vortex | ✅ |
| datetimeparts.vortex | ✅ |
| list.vortex | ✅ |
| listview.vortex | ✅ |
| fixed_size_list.vortex | ✅ |
| zstd.vortex | ✅ |
| tpch_lineitem.compact.vortex | ✅ |
| tpch_lineitem.regular.vortex | ✅ |
| tpch_orders.compact.vortex | ✅ |
| tpch_orders.regular.vortex | ✅ |
| pco.vortex | ✅ |
| clickbench_hits_5k.compact.vortex | ✅ |
| clickbench_hits_5k.regular.vortex | ✅ |
| masked.vortex | ❓ |
| patched.vortex | ❓ |
| variant.vortex | ❓ |