[client-v2] readVariant discards the active type discriminant — colliding Variant alternatives are indistinguishable #2901

@claude

Description

BinaryStreamReader.readVariant(ClickHouseColumn) reads the 1-byte Variant discriminant (ordNum) only to select which nested column's reader to run, then returns just the decoded value and throws the discriminant away:

// client-v2/src/main/java/com/clickhouse/client/api/data_formats/internal/BinaryStreamReader.java:892
public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte() & 0xFF;
    if (ordNum == 0xFF) {
        return null;
    }
    return readValue(column.getNestedColumns().get(ordNum)); // ordNum (the active type) is discarded
}

For many Variants the active alternative can be inferred from the returned object's runtime class, but for alternatives whose readers produce the same Java runtime type the information is lost, and the consumer cannot recover which ClickHouse subtype was actually on the wire. Examples where the decoded objects are indistinguishable:

| Variant | Both alternatives decode to | Why ambiguous |
| --- | --- | --- |
| Variant(DateTime, DateTime64(3)) | java.time.ZonedDateTime (lines 186-191 → convertDateTime, typeHint == null → returns ZonedDateTime) | same class, same instant |
| Variant(String, FixedString(N)) | java.lang.String | same class |
| Variant(Decimal32(s), Decimal64(s)) | java.math.BigDecimal | same class/scale |

Because readVariant returns a bare Object, a downstream consumer (text/JSON renderer, type-aware formatter, or any caller inspecting the value) has no way to choose the correct subtype for formatting in these cases.
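
The collision can be illustrated without the client library. The following is a minimal, JDK-only sketch, not the actual BinaryStreamReader code: the class name VariantCollisionDemo and its helpers are hypothetical, and the two readers are simplified stand-ins that assume DateTime is a little-endian UInt32 of epoch seconds and DateTime64(3) is a little-endian Int64 of epoch milliseconds, both rendered in UTC.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.ZoneOffset;

public class VariantCollisionDemo {

    // Hypothetical helper: discriminant 0 (DateTime) + UInt32 LE epoch seconds.
    static byte[] dateTimePayload(long epochSeconds) {
        return ByteBuffer.allocate(5).order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0).putInt((int) epochSeconds).array();
    }

    // Hypothetical helper: discriminant 1 (DateTime64(3)) + Int64 LE epoch millis.
    static byte[] dateTime64Payload(long epochMillis) {
        return ByteBuffer.allocate(9).order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 1).putLong(epochMillis).array();
    }

    // Same shape as readVariant in the issue: the discriminant selects a
    // reader and is then dropped, so only the decoded value escapes.
    static Object readVariant(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        int ordNum = buf.get() & 0xFF;
        if (ordNum == 0xFF) {
            return null; // NULL variant
        }
        if (ordNum == 0) {
            long seconds = Integer.toUnsignedLong(buf.getInt());
            return Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC);
        }
        return Instant.ofEpochMilli(buf.getLong()).atZone(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Object fromDateTime = readVariant(dateTimePayload(1_700_000_000L));
        Object fromDateTime64 = readVariant(dateTime64Payload(1_700_000_000_000L));

        // Both checks print true: same runtime class and equal values,
        // so the caller cannot tell which alternative was on the wire.
        System.out.println(fromDateTime.getClass() == fromDateTime64.getClass());
        System.out.println(fromDateTime.equals(fromDateTime64));
    }
}
```

With a whole-second timestamp the two results are not just the same class but equal objects, which is the worst case for any consumer that tries to infer the subtype from the value.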

This is the Java analogue of ClickHouse/clickhouse-js#910. Note the specific examples in that JS issue do not collide here — clickhouse-java maps Enum8/Enum16 to a dedicated EnumValue (lines 172-181) and Date to LocalDate (line 183), so Variant(UInt8, Enum8(...)) and Variant(Date, DateTime) are recoverable from the runtime type. The underlying limitation — the discriminant index is not surfaced — is the same, and other type pairs (above) hit it.

ClickHouse server version

26.6.1.1193 (collision behavior determined by code analysis of the type→class mapping; not exercised end-to-end against the server).

Reproduction

A Variant(DateTime, DateTime64(3)) column: whichever alternative the server picks, readVariant returns a ZonedDateTime, so the caller cannot tell DateTime from DateTime64.

// Conceptual unit-level repro against BinaryStreamReader
ClickHouseColumn col = ClickHouseColumn.of("v", "Variant(DateTime, DateTime64(3))");

// Wire bytes: discriminant 0x01 selects the DateTime64 alternative (alternatives are
// sorted by ClickHouse's global type-name ordering), followed by its encoded value.
byte[] payload = /* 0x01 ++ encoded DateTime64 value */;
BinaryStreamReader reader = new BinaryStreamReader(
        new ByteArrayInputStream(payload), null, LZ4_FACTORY, null);

Object value = reader.readVariant(col);

// EXPECTED: caller can determine the active alternative was DateTime64 (index 1)
// ACTUAL:   value is a java.time.ZonedDateTime, identical to what the DateTime
//           alternative (index 0) would have produced — the active type is unrecoverable.
assertTrue(value instanceof ZonedDateTime); // passes for BOTH alternatives

The same happens through the high-level reader path (readValue, case Variant, at lines 248-250): reading a Variant(String, FixedString(10)) column yields a String with no indication of which alternative it was.

Suggested fix

Surface the active alternative index alongside the value rather than discarding it — mirroring the upstream proposal. Options:

  • Return a small wrapper, e.g. VariantValue { int typeIndex; Object value; }, from readVariant (breaking change to the public Object readVariant(...) signature — gate behind a minor/major bump), or
  • Expose the resolved subtype (column.getNestedColumns().get(ordNum)) so a renderer can format with the exact ClickHouseColumn.
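
One possible shape for the wrapper option is sketched below. This is an illustration under stated assumptions, not a proposed patch: VariantValueSketch, VariantValue, ALTERNATIVES, payload and render are hypothetical names, the alternatives are represented by type-name strings instead of ClickHouseColumn, and the decoding uses the same simplified wire layout as a real reader would need to replace (UInt32 LE seconds for DateTime, Int64 LE milliseconds for DateTime64(3), UTC).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

public class VariantValueSketch {

    /** Hypothetical wrapper: the decoded value plus the alternative that produced it. */
    static final class VariantValue {
        final int typeIndex;   // discriminant read from the wire
        final String typeName; // resolved alternative, e.g. "DateTime64(3)"
        final Object value;    // decoded value, unchanged

        VariantValue(int typeIndex, String typeName, Object value) {
            this.typeIndex = typeIndex;
            this.typeName = typeName;
            this.value = value;
        }
    }

    // Alternatives of Variant(DateTime, DateTime64(3)) in discriminant order.
    static final List<String> ALTERNATIVES = Arrays.asList("DateTime", "DateTime64(3)");

    // Hypothetical helper: discriminant byte followed by the raw encoded value.
    static byte[] payload(int ordNum, long raw) {
        ByteBuffer buf = ByteBuffer.allocate(ordNum == 0 ? 5 : 9).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) ordNum);
        if (ordNum == 0) {
            buf.putInt((int) raw);
        } else {
            buf.putLong(raw);
        }
        return buf.array();
    }

    // Like readVariant, but the discriminant is kept instead of discarded.
    static VariantValue readVariant(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        int ordNum = buf.get() & 0xFF;
        if (ordNum == 0xFF) {
            return null; // NULL variant
        }
        if (ordNum >= ALTERNATIVES.size()) {
            throw new IllegalArgumentException("unknown discriminant " + ordNum);
        }
        Instant instant = ordNum == 0
                ? Instant.ofEpochSecond(Integer.toUnsignedLong(buf.getInt()))
                : Instant.ofEpochMilli(buf.getLong());
        return new VariantValue(ordNum, ALTERNATIVES.get(ordNum), instant.atZone(ZoneOffset.UTC));
    }

    // A renderer can now choose the format from the subtype, not from the Java class.
    static String render(VariantValue v) {
        String pattern = v.typeName.startsWith("DateTime64")
                ? "yyyy-MM-dd HH:mm:ss.SSS"
                : "yyyy-MM-dd HH:mm:ss";
        return DateTimeFormatter.ofPattern(pattern).format((ZonedDateTime) v.value);
    }

    public static void main(String[] args) {
        VariantValue a = readVariant(payload(0, 1_700_000_000L));
        VariantValue b = readVariant(payload(1, 1_700_000_000_123L));
        System.out.println(a.typeName + " -> " + render(a));
        System.out.println(b.typeName + " -> " + render(b));
    }
}
```

Carrying the resolved type next to the value covers both options at once: typeIndex is the raw discriminant, and typeName plays the role that column.getNestedColumns().get(ordNum) would play in the real reader.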

Buggy code: BinaryStreamReader.readVariant at client-v2/src/main/java/com/clickhouse/client/api/data_formats/internal/BinaryStreamReader.java:892-898 (dispatched from readValue, case Variant, at lines 248-250). The legacy clickhouse-data RowBinary processor should be checked for the same pattern.

Link

Relayed from ClickHouse/clickhouse-js#910.
