Description
BinaryStreamReader.readVariant(ClickHouseColumn) reads the 1-byte Variant discriminant (ordNum) only to select which nested column's reader to run, then returns just the decoded value and throws the discriminant away:
// client-v2/src/main/java/com/clickhouse/client/api/data_formats/internal/BinaryStreamReader.java:892
public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte() & 0xFF;
    if (ordNum == 0xFF) {
        return null;
    }
    return readValue(column.getNestedColumns().get(ordNum)); // ordNum (the active type) is discarded
}
For many Variants the active alternative can be inferred from the returned object's runtime class, but for alternatives whose readers produce the same Java runtime type the information is lost, and the consumer cannot recover which ClickHouse subtype was actually on the wire. Examples where the decoded objects are indistinguishable:
| Variant | Both alternatives decode to | Why ambiguous |
| --- | --- | --- |
| Variant(DateTime, DateTime64(3)) | java.time.ZonedDateTime (lines 186-191 → convertDateTime, typeHint == null → returns ZonedDateTime) | same class, same instant |
| Variant(String, FixedString(N)) | java.lang.String | same class |
| Variant(Decimal32(s), Decimal64(s)) | java.math.BigDecimal | same class/scale |
Because readVariant returns a bare Object, a downstream consumer (text/JSON renderer, type-aware formatter, or any caller inspecting the value) has no way to choose the correct subtype for formatting in these cases.
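The collision can be illustrated without the client library. The following is a minimal sketch, assuming the RowBinary encodings of DateTime (seconds since the epoch) and DateTime64(3) (milliseconds since the epoch). The class VariantCollisionDemo and both decode methods are hypothetical stand-ins, not the code in BinaryStreamReader.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical stand-ins for the two nested readers; not the client's actual code.
final class VariantCollisionDemo {

    // DateTime alternative: the wire value is a count of seconds since the epoch.
    static Object decodeDateTime(long epochSeconds, ZoneId zone) {
        ZonedDateTime result = Instant.ofEpochSecond(epochSeconds).atZone(zone);
        return result;
    }

    // DateTime64(3) alternative: the wire value is a count of milliseconds since the epoch.
    static Object decodeDateTime64Millis(long epochMillis, ZoneId zone) {
        ZonedDateTime result = Instant.ofEpochMilli(epochMillis).atZone(zone);
        return result;
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        Object fromDateTime = decodeDateTime(1_700_000_000L, utc);
        Object fromDateTime64 = decodeDateTime64Millis(1_700_000_000_000L, utc);

        // Both checks print true: once the discriminant is gone, nothing in the
        // returned objects says which alternative produced them.
        System.out.println(fromDateTime.getClass() == fromDateTime64.getClass());
        System.out.println(fromDateTime.equals(fromDateTime64));
    }
}
```

A DateTime64(3) value with a non-zero millisecond part would differ in value from any DateTime, but a whole-second value is indistinguishable, so the runtime object alone is not a reliable discriminator.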
This is the Java analogue of ClickHouse/clickhouse-js#910. Note the specific examples in that JS issue do not collide here — clickhouse-java maps Enum8/Enum16 to a dedicated EnumValue (lines 172-181) and Date to LocalDate (line 183), so Variant(UInt8, Enum8(...)) and Variant(Date, DateTime) are recoverable from the runtime type. The underlying limitation — the discriminant index is not surfaced — is the same, and other type pairs (above) hit it.
ClickHouse server version
26.6.1.1193 (collision behavior determined by code analysis of the type→class mapping; not exercised end-to-end against the server).
Reproduction
A Variant(DateTime, DateTime64(3)) column: whichever alternative the server picks, readVariant returns a ZonedDateTime, so the caller cannot tell DateTime from DateTime64.
// Conceptual unit-level repro against BinaryStreamReader
ClickHouseColumn col = ClickHouseColumn.of("v", "Variant(DateTime, DateTime64(3))");
// Wire bytes: discriminant 0x01 selects the DateTime64 alternative (alternatives are
// sorted by ClickHouse's global type-name ordering), followed by its encoded value.
byte[] payload = /* 0x01 ++ encoded DateTime64 value */;
BinaryStreamReader reader = new BinaryStreamReader(
new ByteArrayInputStream(payload), null, LZ4_FACTORY, null);
Object value = reader.readVariant(col);
// EXPECTED: caller can determine the active alternative was DateTime64 (index 1)
// ACTUAL: value is a java.time.ZonedDateTime, identical to what the DateTime
// alternative (index 0) would have produced — the active type is unrecoverable.
assertTrue(value instanceof ZonedDateTime); // passes for BOTH alternatives
The same happens through the high-level reader path (readValue → case Variant at line 248-250): reading a Variant(String, FixedString(10)) column yields a String with no indication of which alternative it was.
Suggested fix
Surface the active alternative index alongside the value rather than discarding it — mirroring the upstream proposal. Options:
- Return a small wrapper, e.g. VariantValue { int typeIndex; Object value; }, from readVariant (a breaking change to the public Object readVariant(...) signature, so gate it behind a minor/major bump), or
- Expose the resolved subtype (column.getNestedColumns().get(ordNum)) so a renderer can format with the exact ClickHouseColumn.
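One possible shape for the wrapper option, as a self-contained sketch rather than a patch against BinaryStreamReader. Everything here is hypothetical: the VariantValue class, its typeName field, the VariantReaderSketch class, and the decode helper do not exist in clickhouse-java. The sketch only handles Variant(String, FixedString(3)), and it assumes FixedString(3) sorts before String in the discriminant order.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class VariantReaderSketch {

    // Hypothetical wrapper: carries the discriminant next to the decoded value.
    static final class VariantValue {
        final int typeIndex;   // discriminant byte as read from the wire
        final String typeName; // alternative selected by that discriminant
        final Object value;    // the same object a bare-Object reader would return

        VariantValue(int typeIndex, String typeName, Object value) {
            this.typeIndex = typeIndex;
            this.typeName = typeName;
            this.value = value;
        }
    }

    static final int NULL_DISCRIMINANT = 0xFF;

    // Assumed discriminant order for Variant(String, FixedString(3)).
    static final List<String> ALTERNATIVES = Arrays.asList("FixedString(3)", "String");

    static VariantValue readVariant(DataInputStream in) throws IOException {
        int ordNum = in.readUnsignedByte();
        if (ordNum == NULL_DISCRIMINANT) {
            return null;
        }
        String typeName = ALTERNATIVES.get(ordNum);
        // String: length prefix (a single byte for lengths below 128), then the bytes.
        // FixedString(3): exactly 3 bytes, no prefix.
        int length = typeName.equals("String") ? in.readUnsignedByte() : 3;
        byte[] raw = new byte[length];
        in.readFully(raw);
        // Keep ordNum instead of discarding it.
        return new VariantValue(ordNum, typeName, new String(raw, StandardCharsets.UTF_8));
    }

    // Convenience helper for in-memory payloads; wraps the checked IOException.
    static VariantValue decode(byte[] payload) {
        try {
            return readVariant(new DataInputStream(new ByteArrayInputStream(payload)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        VariantValue fixed = decode(new byte[] {0x00, 'a', 'b', 'c'});
        VariantValue str = decode(new byte[] {0x01, 0x03, 'a', 'b', 'c'});
        // Same decoded String, but the active alternative is now recoverable.
        System.out.println(fixed.typeName + " -> " + fixed.value);
        System.out.println(str.typeName + " -> " + str.value);
    }
}
```

Carrying the resolved type next to the index covers the second option as well: a renderer can format from typeName (or, in the real client, from the nested ClickHouseColumn) without re-deriving it from the index.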
Buggy code: BinaryStreamReader.readVariant at client-v2/src/main/java/com/clickhouse/client/api/data_formats/internal/BinaryStreamReader.java:892-898 (dispatched from readValue case Variant at line 248-250). The legacy clickhouse-data RowBinary processor should be checked for the same pattern.
Link
Relayed from ClickHouse/clickhouse-js#910.