[draft](catalog) Master catalog spi 07 paimon #64445
run buildall |
TPC-H: Total hot run time: 29675 ms |
TPC-DS: Total hot run time: 168026 ms |
FE UT Coverage Report: Increment line coverage |
af2037c to f7114a2
|
run buildall |
TPC-H: Total hot run time: 29192 ms |
TPC-DS: Total hot run time: 167905 ms |
FE Regression Coverage Report: Increment line coverage |
|
run buildall |
TPC-H: Total hot run time: 28510 ms |
TPC-DS: Total hot run time: 168080 ms |
FE UT Coverage Report: Increment line coverage |
FE Regression Coverage Report: Increment line coverage |
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 28272 ms |
TPC-DS: Total hot run time: 168520 ms |
FE UT Coverage Report: Increment line coverage |
FE Regression Coverage Report: Increment line coverage |
…ssloader, not TCCL

CI external 973270 (commit 13d3876): all 37 paimon-family tests failed at CREATE CATALOG with 'No MetaStoreProvider supports the given properties; registered providers: []'.

Root cause: MetaStoreProviders.PROVIDERS is a static field loaded once via the 1-arg ServiceLoader.load(MetaStoreProvider.class), which resolves against the thread-context classloader (TCCL). Its first touch is PaimonConnectorProvider.validateProperties at CREATE CATALOG time, running on an FE worker thread whose TCCL is the FE app loader. fe-core does not depend on fe-connector-metastore-spi, so the 5 providers and their META-INF/services file live only inside the connector plugin's child classloader; the lookup finds nothing and caches an empty list process-wide, so every paimon CREATE CATALOG fails. (The sibling PaimonConnector.createCatalogFromContext already pins the TCCL for the same reason; the validation path did not.)

Fix: load via the 2-arg ServiceLoader.load(MetaStoreProvider.class, MetaStoreProvider.class.getClassLoader()). The interface's defining loader is the plugin loader that has the service file and impls, independent of the caller's TCCL.

Tests: fe-connector-metastore-spi 44/0/0, checkstyle 0. The classloader behavior is single-classpath-invisible (1-arg and 2-arg both pass in surefire); the real gate is the P2-T05 docker plugin-layout regression (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
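The difference between the two ServiceLoader overloads described in this commit can be sketched as follows. This is a minimal illustration, not the Doris code: `ProviderLoadingSketch` and `DemoProvider` are hypothetical stand-ins for MetaStoreProviders and MetaStoreProvider, and only the JDK `ServiceLoader` calls are real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical holder class; stands in for MetaStoreProviders in the commit message.
final class ProviderLoadingSketch {

    // Hypothetical service interface; stands in for MetaStoreProvider.
    interface DemoProvider {
        String name();
    }

    // 1-arg form: resolves META-INF/services against the calling thread's
    // context classloader (TCCL), so the result depends on who calls first.
    static List<DemoProvider> loadViaContextLoader() {
        List<DemoProvider> found = new ArrayList<>();
        for (DemoProvider provider : ServiceLoader.load(DemoProvider.class)) {
            found.add(provider);
        }
        return found;
    }

    // 2-arg form: pinned to the loader that defined the interface, so the
    // lookup no longer depends on the caller's TCCL.
    static List<DemoProvider> loadViaDefiningLoader() {
        ClassLoader definingLoader = DemoProvider.class.getClassLoader();
        List<DemoProvider> found = new ArrayList<>();
        for (DemoProvider provider : ServiceLoader.load(DemoProvider.class, definingLoader)) {
            found.add(provider);
        }
        return found;
    }
}
```

On a single flat classpath both methods see the same service files, which matches the note above that the bug is invisible in surefire; the two only diverge when the interface lives in a child plugin loader that the TCCL cannot see.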
…aining = paimon pom + gate Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
run buildall |
FE Regression Coverage Report: Increment line coverage |
TPC-H: Total hot run time: 28860 ms |
TPC-DS: Total hot run time: 176568 ms |
ClickBench: Total hot run time: 25.29 s |
…n loader

- root cause: HiveMetaStoreClient.loadFilterHooks -> Configuration.getClass resolves metastore.filter.hook via the HiveConf's own classLoader field. assembleHiveConf's `new HiveConf()` captured the TCCL at construction (parent 'app' loader, before the plugin TCCL pin in createCatalogFromContext); under child-first plugin loading that resolved DefaultMetaStoreFilterHookImpl from the parent while MetaStoreFilterHook was child-loaded -> "class DefaultMetaStoreFilterHookImpl not MetaStoreFilterHook" -> paimon-over-HMS `create database` failed (test_create_paimon_table:44).
- solution: assembleHiveConf now calls hiveConf.setClassLoader(plugin loader), mirroring buildHadoopConfiguration:257. Single chokepoint -> covers both the hms and dlf flavors. Connector-local; fe-core stays connector-agnostic.
- tests: PaimonCatalogFactoryTest.assembleHiveConfPinsPluginClassLoaderNotTccl (installs a foreign TCCL, asserts the conf is pinned to the plugin loader; RED before / GREEN after). Full class 16/16 pass; module checkstyle clean. Real gate = docker enablePaimonTest=true.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
…ropped-catalog race)

- root cause: a concurrent CATALOG DROP leaves a PluginDrivenExternalCatalog with connector=null but objectCreated=true (onClose() nulls the transient connector but does not reset objectCreated, and dropCatalog calls onClose() directly, not resetToUninitialized). A stale mv_infos()/jobs() metadata scan iterates all MTMVs and reaches such a dropped catalog's related table; materializeLatest() dereferenced the null connector -> NPE that aborted the whole metadata query, failing the unrelated test_mysql_mtmv (getJobName). Legacy onClose never nulled the field.
- solution: materializeLatest() returns a valid empty pin (snapshot id -1, empty partition maps) when getConnector() is null, mirroring the existing dropped-table (no-handle) branch. Connector-agnostic; getConnector()'s contract unchanged; cannot mask a real init failure (initLocalObjectsImpl throws when it cannot build a connector on a healthy catalog).
- tests: PluginDrivenMvccExternalTableTest.testMaterializeLatestNullConnectorDegradesToEmptyPin (null-connector catalog -> loadSnapshot returns the empty pin; the RED run threw the exact production NPE). Full class 36/36; fe-core checkstyle clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
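The null-connector degradation described in this commit might look roughly like the sketch below. All type and member names here (`EmptyPinSketch`, `Connector`, `Pin`, `latestSnapshotId`) are hypothetical simplifications; only the behavior (snapshot id -1 and empty partition maps when the connector is null) comes from the commit message.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, simplified model of the materializeLatest() guard.
final class EmptyPinSketch {

    // Hypothetical stand-in for the plugin connector.
    interface Connector {
        long latestSnapshotId();
    }

    // Hypothetical stand-in for the pinned snapshot value.
    static final class Pin {
        final long snapshotId;
        final Map<String, String> partitions;

        Pin(long snapshotId, Map<String, String> partitions) {
            this.snapshotId = snapshotId;
            this.partitions = partitions;
        }
    }

    static final long NO_SNAPSHOT = -1L;

    // A dropped catalog can expose a null connector; return an empty pin
    // instead of dereferencing it, so an unrelated metadata scan survives.
    static Pin materializeLatest(Connector connector) {
        if (connector == null) {
            return new Pin(NO_SNAPSHOT, Collections.<String, String>emptyMap());
        }
        return new Pin(connector.latestSnapshotId(), Collections.<String, String>emptyMap());
    }
}
```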
…n (no p_NULL dup)

- root cause: on the paimon null_partition table (genuine NULL + string 'null' + string 'NULL' + 'bj'), the branch's IS-NULL-prune fix marks a genuine-null partition isNull=true, so PartitionInfo.toPartitionValue renders it as the bare keyword "NULL" -> MTMV partition name "p_NULL", colliding with the literal string 'NULL' partition (also p_NULL) -> "Duplicated named partition: p_NULL" on CREATE MATERIALIZED VIEW partition by(region). Master kept isNull=false so genuine-null got a distinct name.
- solution: ListPartitionItem.toPartitionKeyDesc (both overloads) now substitutes the key's originHiveKeys sentinel (e.g. __HIVE_DEFAULT_PARTITION__) as the DISPLAY value for a genuine-null partition while keeping isNull=true. getValue() stays a NullLiteral so IS NULL pruning is unaffected; only the rendered name changes -> genuine-null becomes p_HIVEDEFAULTPARTITION (distinct). Connector-agnostic; MTMV-only blast radius (OLAP partitions have empty originHiveKeys -> no-op). Also closes the same latent collision for hive.
- tests: new ListPartitionItemTest (genuine-null vs string-'NULL' produce distinct names and the null key still resolves to a NullLiteral; OLAP null partition unaffected). The RED run reproduced "expected: not equal but was: <p_NULL>". 2/2 GREEN; MTMVPartitionUtilTest 10/10 (no regression); fe-core checkstyle clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
…schema TTL)

The SPI migration split the legacy single meta.cache.paimon.table.ttl-second knob (which covered BOTH the data snapshot AND the schema) and dropped the data cache, so test_paimon_table_meta_cache failed on two assertions: the with-cache catalog saw an external INSERT immediately (data), and the no-cache catalog served stale schema.

Axis A (data snapshot):
- new PaimonLatestSnapshotCache on the per-catalog PaimonConnector: TTL cache of the latest snapshot id keyed by Identifier(db,table), sized by meta.cache.paimon.table.ttl-second (legacy default 86400; <=0 disables -> always live = the no-cache catalog), access-based expiry. Injected into PaimonConnectorMetadata (new 5-arg ctor; the 3/4-arg ctors get a disabled cache so existing direct-construction tests are unchanged).
- beginQuerySnapshot serves the id through the cache (live read only on a miss). The pinned id reaches the scan via applySnapshot -> scan.snapshot-id -> resolveScanTable Table.copy, so an external write after the pin is invisible until expiry/refresh.
- new Connector.invalidateTable/invalidateAll SPI default no-ops; PaimonConnector overrides them; RefreshManager.refreshTableInternal invalidates any PluginDrivenExternalCatalog's connector (keyed by REMOTE names). REFRESH CATALOG already rebuilds the connector.

Axis B (schema TTL):
- new Connector.schemaCacheTtlSecondOverride() SPI default empty; PaimonConnector returns meta.cache.paimon.table.ttl-second when set.
- new generic ExternalCatalog.overlayMetaCacheConfig hook (no-op); PluginDrivenExternalCatalog overrides it to set schema.cache.ttl-second from the connector override (user value wins); ExternalMetaCacheMgr.findCatalogProperties applies it to its EPHEMERAL property copy (no SHOW CREATE leak). REFRESH TABLE already invalidates the schema cache entry. So the no-cache catalog (ttl-second=0) serves fresh schema.

fe-core stays connector-agnostic (virtual dispatch; base no-ops).

ttl-second removed from the PaimonConnectorProvider "dead keys" warning (it again takes effect); enable/capacity remain not-wired (still reported ignored).

tests: PaimonLatestSnapshotCacheTest 5/5 (cache within TTL / ttl=0 bypass / invalidate / expiry via injected clock), PaimonConnectorCacheTest 4/4 (schemaCacheTtlSecondOverride mapping); regression PaimonConnectorMetadataMvccTest 40/40, ValidateProperties 14/14, fe-core compiles + PluginDrivenMvccExternalTableTest 36/36 + ListPartitionItemTest 2/2; checkstyle clean across the 3 touched modules. Cross-query data cache + schema TTL + refresh are gated by the docker e2e (enablePaimonTest=true rerun of test_paimon_table_meta_cache).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
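The Axis A cache semantics (TTL with access-based expiry, ttl <= 0 bypasses the cache, explicit invalidation, injected clock for tests) can be sketched as below. This is an assumption-laden illustration, not PaimonLatestSnapshotCache itself: the class name `SnapshotIdTtlCache`, the string key, and the loader callback are all hypothetical, and the sketch is not atomic under concurrent misses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.ToLongFunction;

// Hypothetical TTL cache of the latest snapshot id per table.
final class SnapshotIdTtlCache {

    private static final class Entry {
        final long snapshotId;
        volatile long lastAccessNanos;

        Entry(long snapshotId, long lastAccessNanos) {
            this.snapshotId = snapshotId;
            this.lastAccessNanos = lastAccessNanos;
        }
    }

    private final long ttlNanos;
    private final LongSupplier nanoClock; // injected so tests can advance time
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    SnapshotIdTtlCache(long ttlSeconds, LongSupplier nanoClock) {
        this.ttlNanos = TimeUnit.SECONDS.toNanos(ttlSeconds);
        this.nanoClock = nanoClock;
    }

    // Returns the cached id while the entry is fresh; otherwise reads live.
    long get(String tableKey, ToLongFunction<String> liveLoader) {
        if (ttlNanos <= 0) {
            return liveLoader.applyAsLong(tableKey); // cache disabled: always live
        }
        long now = nanoClock.getAsLong();
        Entry entry = entries.get(tableKey);
        if (entry != null && now - entry.lastAccessNanos < ttlNanos) {
            entry.lastAccessNanos = now; // access-based expiry: a hit extends the entry
            return entry.snapshotId;
        }
        long fresh = liveLoader.applyAsLong(tableKey);
        entries.put(tableKey, new Entry(fresh, now));
        return fresh;
    }

    void invalidate(String tableKey) {
        entries.remove(tableKey);
    }

    void invalidateAll() {
        entries.clear();
    }
}
```

With a positive TTL, an external write stays invisible until the entry expires or is invalidated; with ttl-second=0 every call goes to the live loader, which is the no-cache catalog behavior the test expects.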
|
run buildall |
TPC-H: Total hot run time: 28837 ms |
TPC-DS: Total hot run time: 175071 ms |
ClickBench: Total hot run time: 25.21 s |
FE Regression Coverage Report: Increment line coverage |
…(revert FIX-3)

Verification run for 973411's FIX-1..4. FIX-1/FIX-2 held; FIX-3 was itself a regression and FIX-4 did not fix its target. Two root causes, four failing suites (all test .out byte-identical to master -> master is the spec).

Root causes:
- FIX-3 (26a8ecd) patched only the genuine-null partition DISPLAY name on the SHARED ListPartitionItem.toPartitionKeyDesc. It never touched the MTMV refresh predicate (so test_paimon_mtmv still failed, now 5 rows vs master's 3), and it renamed hive's default partition p_NULL -> p_HIVEDEFAULTPARTITION, breaking test_hive_default_mtmv (asserts p_NULL) and test_upgrade_downgrade_mtmv (the MTMV.calculatePartitionMappings join needs both sides to render symmetrically).
- The paimon bridge marked genuine-null isNull=true (1b0ae1e), coupling two opposite needs through one flag: WHERE IS NULL prune wants true, MTMV refresh wants false (master PaimonUtil uses isNull=false unconditionally and loses the null data; IS NULL still works via the predicate-driven scan).
- test_paimon_table_meta_cache: the SPI routes the latest schema through the generic name-keyed schema cache (no schemaId) whose TTL spec is frozen at first build (AbstractExternalMetaCache.initCatalog computeIfAbsent), so ttl-second=0 cannot bust it after an external ALTER -> stale schema.

Fix (master parity; unit-tested, checkstyle clean; docker e2e still gated):
- A1 revert FIX-3: ListPartitionItem.java restored byte-identical to master; ListPartitionItemTest now asserts the master p_NULL rendering.
- A2 paimon bridge PluginDrivenMvccExternalTable.toListPartitionItem -> new PartitionValue(value, false), matching master PaimonUtil. The genuine-null MTMV partition is a StringLiteral sentinel (no p_NULL collision) and its refresh IN-predicate drops the null rows (3 rows, master parity).
- A3 new connector capability ConnectorScanPlanProvider.ignorePartitionPruneShortCircuit() (default false; PaimonScanPlanProvider -> true), consulted by PluginDrivenScanNode.resolveRequiredPartitions: a predicate-driven connector maps a prune-to-zero to scan-all instead of the empty short-circuit. Required so WHERE col IS NULL still returns the genuine-null row under isNull=false (qt_null_partition_4) -- the branch short-circuits empty pruned sets where master's PaimonScanNode re-plans from the pushed predicate.
- B PluginDrivenMvccExternalTable.getLatestSchemaCacheValue: when the connector disables its schema cache (schemaCacheTtlSecondOverride <= 0, the no-cache catalog) read the schema FRESH via initSchema(), bypassing the frozen name-keyed cache; the cached catalog keeps the cached path. Restores master's single-knob meta.cache.paimon.table.ttl-second=0 -> always-fresh-schema.

Docker gate (enablePaimonTest/enableHiveTest=true): test_paimon_mtmv -> 3 rows; qt_null_partition_4 -> `1 \N 100.0` (must stay green); test_hive_default_mtmv -> p_NULL; test_upgrade_downgrade_mtmv -> sync true; test_paimon_table_meta_cache -> no-cache desc 3 cols.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
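The A3 rule (a prune-to-zero becomes scan-all for a predicate-driven connector) can be illustrated with a small decision function. This is a hedged sketch only: `PruneShortCircuitSketch`, the string partition names, and the use of `Optional.empty()` to mean "scan all partitions" are assumptions for illustration, not the actual PluginDrivenScanNode signature.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the resolveRequiredPartitions decision.
final class PruneShortCircuitSketch {

    // Optional.empty() means "no partition restriction, scan everything";
    // a present list restricts the scan to exactly those partitions.
    static Optional<List<String>> resolveRequiredPartitions(
            List<String> prunedPartitions, boolean ignorePartitionPruneShortCircuit) {
        if (prunedPartitions.isEmpty() && ignorePartitionPruneShortCircuit) {
            // Predicate-driven connector: let the pushed predicate decide,
            // rather than short-circuiting to an empty scan.
            return Optional.empty();
        }
        return Optional.of(prunedPartitions);
    }
}
```

Under the default (false), an empty pruned set still yields an empty scan; only a connector that opts in gets the scan-all fallback, which is what keeps WHERE col IS NULL returning the genuine-null row under isNull=false.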
|
run buildall |
FE UT Coverage Report: Increment line coverage |
FE Regression Coverage Report: Increment line coverage |
…r (not cached rowType)

The last failing test of 973480 (test_paimon_table_meta_cache:112): a no-cache catalog (meta.cache.paimon.table.ttl-second=0) returns 2 columns instead of 3 after an external ALTER TABLE ADD COLUMNS. 16b6255's "Fix B" bypassed the FE schema cache (initSchema) but the bypass still read a stale schema, so the test kept failing.

Root cause (master test .out is the spec): the connector's LATEST schema path read table.rowType(). paimon's CatalogFactory.createCatalog wraps a CachingCatalog (cache.enabled defaults true), so catalog.getTable() returns a cached Table whose rowType() is FROZEN at load time; only latestSnapshot()/schemaManager() read files fresh. That is why the no-cache DATA path was fresh (snapshot re-read) while the SCHEMA path was stale. An ALTER ADD COLUMNS bumps the schema file (new schema id) WITHOUT a new snapshot, so the latest snapshot's schemaId also stays behind; only schemaManager().latest() advances. Legacy PaimonExternalTable read the latest schema via schemaManager().latest() (never rowType()); the SPI cutover regressed to rowType().

Fix (connector-local, master parity): PaimonConnectorMetadata.getTableSchema (latest path) reads the latest schema FRESH via a new PaimonCatalogOps.latestSchema seam (((DataTable) table).schemaManager().latest()) for a non-system data table; partition and primary keys come from that resolved schema too. Dual guard: system tables (isSystemTable()) keep their synthetic rowType(), because some are not DataTable ($snapshots/$manifests) and the DataTable-backed ones ($ro/$audit_log/$binlog) need the synthetic schema; a non-DataTable backend (FormatTable) / schema-less table -> latestSchema empty -> fall back to rowType(). This also fixes the with-cache REFRESH assertion (line 117): REFRESH busts the FE schema cache but not the paimon CachingCatalog, so only schemaManager().latest() reflects the external change.

TDD: RED getTableSchemaReadsLatestSchemaNotCachedRowType (expected 3 but was 2, same symptom as CI) -> GREEN; plus sys-table guard + empty-latest fallback tests. Paimon module 318/0/0 (1 live-connectivity skip), checkstyle 0. e2e docker-gated (enablePaimonTest=true) NOT run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011mTrPcvMZtFjsxWJM5TRnG
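The dual guard described in the fix (system tables keep the synthetic row type; otherwise prefer the freshly read latest schema and fall back to the cached row type when none is available) reduces to a small selection rule. The sketch below models schemas as plain column-name lists and uses hypothetical names (`LatestSchemaSketch`, `resolveColumns`); it does not call any Paimon API.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of the getTableSchema (latest path) selection logic.
final class LatestSchemaSketch {

    static List<String> resolveColumns(
            boolean systemTable,
            Supplier<Optional<List<String>>> latestSchema, // fresh read, may be empty
            Supplier<List<String>> cachedRowType) {        // frozen at table load time
        if (systemTable) {
            // Guard 1: system tables need their synthetic schema.
            return cachedRowType.get();
        }
        // Guard 2: no latest schema (e.g. non-data backend) -> cached row type.
        return latestSchema.get().orElseGet(cachedRowType);
    }
}
```

In the failing scenario the cached row type still has 2 columns after an external ADD COLUMNS while the fresh schema has 3, so a non-system table resolves to 3 columns and a system table is left untouched.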
|
run buildall |
FE Regression Coverage Report: Increment line coverage |
TPC-H: Total hot run time: 29211 ms |
TPC-DS: Total hot run time: 175280 ms |
ClickBench: Total hot run time: 25.35 s |
TPC-H: Total hot run time: 28997 ms |
TPC-DS: Total hot run time: 175568 ms |
ClickBench: Total hot run time: 25.08 s |
only for testing