Skip to content

[jdbc-v2] Backtick-quoted (Nested) INSERT column names are not unescaped before schema lookup in WriterStatementImpl #2896

Description

@claude

Description

When the beta RowBinary writer is enabled (com.clickhouse.jdbc.DriverProperties.BETA_ROW_BINARY_WRITERWriterStatementImpl), the JDBC driver maps the INSERT column list parsed from the SQL to the server table schema by name. The parsed column names are stored as the raw parse-tree text, including any backticks, but are never unescaped before the schema lookup. As a result, an INSERT that quotes its column names with backticks — in particular the canonical wire form of a Nested sub-column, `directory`.`id` — fails to match the clean schema column names (directory.id) and throws NoSuchColumnException.

This is the Java sibling of the defect filed against the Python client in ClickHouse/clickhouse-connect#820 (unescape_identifier only peeling the outer backtick pair). In clickhouse-connect the fast path silently degrades; in jdbc-v2 the analogous name-matching step throws.

Mechanism (traced in source)

  1. SqlParserFacade (ANTLR4 parser), enterInsertStmt:

    List<ClickHouseParser.NestedIdentifierContext> names = columns.nestedIdentifier();
    String[] insertColumns = new String[names.size()];
    for (int i = 0; i < names.size(); i++) {
        insertColumns[i] = names.get(i).getText();   // <-- raw text, backticks preserved
    }
    parsedStatement.setInsertColumns(insertColumns);

    Grammar: nestedIdentifier : identifier (DOT identifier)?, and the backquoted-identifier lexer token includes the backtick characters, so getText() for `directory`.`id` returns the literal string `directory`.`id` (not directory.id).

    Note the asymmetry in the same method: the table and database identifiers are unescaped —

    parsedStatement.setTable(ClickHouseSqlUtils.unescape(tableId.identifier().getText()));
    ...
    .map(id -> ClickHouseSqlUtils.unescape(id.getText()))

    — but the insert column names are not.

  2. WriterStatementImpl constructor:

    for (String column : parsedStatement.getInsertColumns()) {
        insertColumns.add(tableSchema.getColumnByName(column));   // column == "`directory`.`id`"
    }
  3. TableSchema.getColumnByNamenameToIndexcolIndex.get(name), where colIndex is keyed on the clean server column names (directory.id). The lookup misses and NoSuchColumnException is thrown. (Even a simple single backtick-quoted column such as `id` exhibits the same miss, since nothing is unescaped.)

The unquoted forms work fine: INSERT INTO t (directory.id, ...) yields getText() == directory.id, which matches the schema.

ClickHouse server version

Tested server reachable at http://localhost:8123 reports 26.6.1.1193. Code analysis only; the repro below was not executed against the running server. This is a pure client-side identifier-parsing defect and is server-version independent.

Reproduction

TestNG integration test (mirrors WriterStatementImplTest), using the beta RowBinary writer path and backtick-quoted Nested sub-column names:

@Test(groups = {"integration"})
public void testBacktickQuotedNestedInsertColumns() throws SQLException {
    Properties properties = new Properties();
    properties.setProperty(DriverProperties.BETA_ROW_BINARY_WRITER.getKey(), "true");

    try (Connection connection = getJdbcConnection(properties);
         Statement st = connection.createStatement()) {
        st.execute("DROP TABLE IF EXISTS bug_nested");
        st.execute("CREATE TABLE bug_nested " +
                "(directory Nested(id UInt32, type String, path String)) " +
                "ENGINE = MergeTree ORDER BY tuple()");

        // Backtick-quoted dotted (Nested) column names — e.g. what an ORM emits.
        String sql = "INSERT INTO bug_nested " +
                "(`directory`.`id`, `directory`.`type`, `directory`.`path`) VALUES (?, ?, ?)";

        // Expected: prepareStatement returns a WriterStatementImpl whose column list
        //           resolves to directory.id / directory.type / directory.path.
        // Actual:   NoSuchColumnException ("Result has no column with name '`directory`.`id`'")
        //           is thrown while building the WriterStatementImpl, because the parsed
        //           column names keep their backticks and never match the schema.
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setObject(1, new long[]{13L, 79L});
            ps.setObject(2, new String[]{"a", "b"});
            ps.setObject(3, new String[]{"/x", "/y"});
            ps.executeUpdate();
        }
    }
}

Expected: the backtick-quoted column names are unescaped to directory.id, directory.type, directory.path and matched against the table schema; the row is written.

Actual: NoSuchColumnException (surfaced as a SQLException) for column `directory`.`id` while constructing WriterStatementImpl.

Suggested fix

In jdbc-v2/.../internal/SqlParserFacade.java, enterInsertStmt, unescape each component of the parsed insert column name before storing it — symmetric with the existing table/database handling. Because nestedIdentifier is identifier (DOT identifier)?, unescape each identifier child and rejoin with . (e.g. via ClickHouseSqlUtils.unescape(...) per part) so `directory`.`id` becomes directory.id, rather than calling unescape on the whole getText() (which would only strip the outermost pair — the exact clickhouse-connect mistake). Alternatively, normalize names inside TableSchema.getColumnByName / the colIndex lookup.

Link

Python sibling: ClickHouse/clickhouse-connect#820

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions