[DON'T MERGE] Relytcloud customizations on DuckLake v1.5#2
Open
qsliu2017 wants to merge 31 commits into
…ytcloud-metadata-mgr
This test ensures excessive files aren't read with more complex query plans generated by limit/offset. Co-authored-by: Tom Jakubowski <tom@crystae.net>
Update DuckDB - Main
Fixup error messages for migration
Add ExecuteCommit() virtual method to DuckLakeMetadataManager so that implementations (e.g. pg_ducklake) can override commit-path metadata writes separately from regular Execute() calls. The default implementation delegates to Execute(). FlushChanges() now calls ExecuteCommit() instead of Execute() for the batch commit write, and null-guards the connection pointer before Rollback()/BeginTransaction() in the retry catch block to support backends where the metadata connection is not a DuckDB connection (e.g. pg_ducklake uses PostgreSQL SPI).
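The hook described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual DuckLake code: `MetadataManagerSketch`, `CustomCommitManagerSketch`, `FlushBatch`, and the `log` member are hypothetical names, and the real `Execute()` signature is simplified here to a single string argument.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for DuckLakeMetadataManager.
class MetadataManagerSketch {
public:
    virtual ~MetadataManagerSketch() = default;

    // Regular metadata statement.
    virtual void Execute(const std::string &sql) {
        log.push_back("execute: " + sql);
    }

    // Commit-path write. The default simply delegates to Execute(),
    // so existing backends keep their behavior.
    virtual void ExecuteCommit(const std::string &sql) {
        Execute(sql);
    }

    std::vector<std::string> log; // records calls, for illustration only
};

// A backend that wants to treat the commit write differently
// overrides only ExecuteCommit().
class CustomCommitManagerSketch : public MetadataManagerSketch {
public:
    void ExecuteCommit(const std::string &sql) override {
        log.push_back("commit: " + sql);
    }
};

// Stand-in for the batch commit write in FlushChanges():
// it calls ExecuteCommit() rather than Execute().
inline void FlushBatch(MetadataManagerSketch &manager, const std::string &batch_sql) {
    manager.ExecuteCommit(batch_sql);
}
```

With this shape, a caller holding only a base-class reference reaches the override through virtual dispatch, and a backend that does not override `ExecuteCommit()` behaves exactly as before.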
…ushdown Fix filter pushdown
Merge 652 commits from duckdb/ducklake upstream/main. Resolved conflicts preserving relytcloud customizations:
- Keep Execute()/Query() virtual methods on DuckLakeMetadataManager so custom metadata managers (e.g. PostgresMetadataManager) can intercept all metadata queries
- Keep ExecuteCommit() hook for custom commit handling
- Keep ducklake_default_table_path config option
- Accept upstream's ListAggregation rename (from WrapWithListAggregation)
- Accept all new upstream features: macros, data inlining improvements, deletion inlining, sorted tables, variant stats, geo stats, etc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
69 additional commits on top of upstream/main merge. Only 1 code conflict (migration error message) - kept our dynamic version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…31 customizations
- Reset non-custom files to upstream v1.5-variegata (multi_file_list, test files, json config)
- Remove spurious formatting diffs in ducklake_schema_entry.cpp
- Revert upstream/main-only ExecuteMigration signature change
- Restore FillSnapshotArgs/FillSnapshotCommitArgs/FillCatalogArgs static helpers
- Restore IsInitialized virtual method
- Route flush_inlined_data queries through metadata_manager
- Simplify transaction.Query using FillCatalogArgs, remove Query(snapshot) overload
- Add connection null-guards in FlushChanges retry logic
- Use initial_schema_uuid in InitializeDuckLake for Postgres compatibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add explicit type casts to NULL literals in GetTableDeletions UNION branches (NULL::BOOLEAN, NULL::BIGINT, NULL::VARCHAR) so PostgreSQL can match column types across UNION ALL
- Override GetInlinedDeletionTableName in PostgresMetadataManager to use information_schema.tables instead of SELECT NULL FROM table, avoiding transaction abort on missing tables
- Move delete_inlined_table_cache to protected for subclass access
- Apply PR duckdb#731 initializer fix: delegate to metadata_manager.IsInitialized() instead of inline ATTACH + count
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qsliu2017
commented
Mar 13, 2026
Execute() is for DDL/DML (INSERT, UPDATE, DELETE, CREATE, ALTER). Query() is for SELECT statements that read result rows. This distinction matters in subclass overrides like PostgresMetadataManager which routes them to different stored procedures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
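To make the Execute()/Query() split concrete, here is a rough sketch of a subclass that sends the two kinds of statement down different routes. The class names, the `Row` alias, and the `routed` member are hypothetical. The route labels only echo the `postgres_execute`/`postgres_query` procedure names mentioned in the PR description, and how those procedures are actually invoked is not shown in the source.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>; // hypothetical row representation

// Hypothetical, simplified interface: two entry points instead of one.
class ManagerSketch {
public:
    virtual ~ManagerSketch() = default;
    // DDL/DML (INSERT, UPDATE, DELETE, CREATE, ALTER): no result rows expected.
    virtual void Execute(std::string sql) = 0;
    // SELECT: the caller reads the returned rows.
    virtual std::vector<Row> Query(std::string sql) = 0;
};

// A backend that routes each kind of statement differently.
class RoutingManagerSketch : public ManagerSketch {
public:
    void Execute(std::string sql) override {
        routed.emplace_back("postgres_execute", std::move(sql));
    }

    std::vector<Row> Query(std::string sql) override {
        routed.emplace_back("postgres_query", std::move(sql));
        return {}; // a real backend would return the result rows here
    }

    // (route, statement) pairs, recorded for illustration only.
    std::vector<std::pair<std::string, std::string>> routed;
};
```

If callers used a single entry point for both kinds of statement, an override like this would have no reliable way to pick the route, which is presumably why the commit separates the two.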
…licate ExecuteQuery
- Replace full GetInlinedDeletionTableName override in PostgresMetadataManager with a narrow InlinedDeletionTableExists override, keeping cache management and table creation logic in the base class
- Move delete_inlined_table_cache back to private (no longer exposed)
- Deduplicate snapshot/commit arg replacement in ExecuteQuery by calling FillSnapshotArgs and FillSnapshotCommitArgs static helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs caused inlined file deletion metadata to silently disappear when a concurrent commit forced a retry in FlushChanges:
1. GetNewInlinedFileDeletes() used std::move on the source map, emptying it on the first attempt. On retry the data was gone.
2. The delete_inlined_table_cache was not cleared after rollback, so the retry skipped CREATE TABLE IF NOT EXISTS for the inlined deletion table, and the subsequent INSERT failed on a non-existent table.
Fix: copy instead of move in GetNewInlinedFileDeletes, and clear the inlined table caches before each retry. Also removes the unused inlined_table_name_cache field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
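The first bug (moving out of the source map before a retry) can be reproduced in isolation. The sketch below is a simplified model, not the DuckLake code: `PendingChangesSketch`, `DeleteMap`, `TakeByMove`, `GetByCopy`, and `CommitWithOneRetry` are hypothetical names, and the simulated conflict always hits the first attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical shape: file name -> deleted row ids.
using DeleteMap = std::map<std::string, std::set<long>>;

struct PendingChangesSketch {
    DeleteMap inlined_file_deletes;

    // Buggy variant: hands the data out by move, leaving nothing for a retry.
    DeleteMap TakeByMove() {
        DeleteMap result = std::move(inlined_file_deletes);
        // A moved-from map is valid but unspecified; clear() makes the
        // emptied state explicit for this sketch.
        inlined_file_deletes.clear();
        return result;
    }

    // Fixed variant: returns a copy, so the source survives a failed attempt.
    DeleteMap GetByCopy() const {
        return inlined_file_deletes;
    }
};

// Simulates a commit loop: the first attempt hits a conflict and is rolled
// back, the second succeeds. Returns how many delete entries the successful
// attempt had available to write.
template <typename Getter>
std::size_t CommitWithOneRetry(Getter get_deletes) {
    for (int attempt = 0; attempt < 2; attempt++) {
        DeleteMap deletes = get_deletes();
        const bool conflict = (attempt == 0);
        if (conflict) {
            continue; // rollback, then retry with freshly fetched changes
        }
        return deletes.size();
    }
    return 0;
}
```

Passing `TakeByMove` as the getter leaves the retry with zero entries, while `GetByCopy` still sees the original entry. The copy costs an extra allocation per attempt, which seems an acceptable trade for correctness on a retry path.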
chore: merge upstream v1.5-variegata for DuckDB v1.5.1
Replaced by a new PR tracking the |
Summary
Custom changes on top of upstream DuckLake v1.5 (upstream/v1.5-variegata). See also: duckdb#731

Extensible metadata manager
Refactored DuckLakeMetadataManager so all metadata queries go through virtual methods, enabling pluggable backends (e.g. PostgresMetadataManager):
- Execute() / Query(): Virtual methods for DDL/DML vs SELECT statements. Signature changed from string & to string (by value). Added no-snapshot overloads.
- ExecuteCommit(): Hook for custom commit handling (retry/conflict resolution).
- IsInitialized(): Encapsulates ATTACH and metadata table detection (moved from DuckLakeInitializer).
- FillSnapshotArgs() / FillSnapshotCommitArgs() / FillCatalogArgs(): Static helpers for query placeholder substitution (moved from DuckLakeTransaction).
- InlinedDeletionTableExists(): Virtual method for backend-specific table existence checks.
- GetActiveFiles(): Extracted helper for orphan file cleanup.
- ListAggregation() / CastStatsToTarget() / CastColumnToTarget(): Made virtual for DBMS-specific SQL syntax.
- LoadTags() / LoadInlinedDataTables(): Made virtual for DBMS-specific JSON parsing.
- TransformInlinedData(): Separated from read for cross-DB type conversion.
- DuckLakeMetadataManager::Register(): Pluggable metadata manager registration.

PostgresMetadataManager
PostgreSQL-specific overrides:
- Execute() / Query() route to postgres_execute / postgres_query stored procedures
- InlinedDeletionTableExists() uses information_schema.tables instead of a direct table query (avoids aborting PG transactions)
- ListAggregation() uses json_agg(json_build_object(...)) instead of DuckDB LIST()
- CastStatsToTarget() / CastColumnToTarget() use PG cast syntax (no TRY_CAST)
- LoadTags() / LoadInlinedDataTables() parse JSON via yyjson (PG returns JSON strings, not DuckDB Values)
- TransformInlinedData() casts VARCHAR columns back to expected DuckDB types

SQL compatibility fixes
- Explicit type casts on NULL literals (NULL::VARCHAR AS path, NULL::BOOLEAN AS path_is_relative, etc.)
- Changed UUID() to explicit 'uuid'::UUID with GenerateUUID()

New config option
- ducklake_default_table_path (SESSION): default directory path for new DuckLake tables

Other changes
- Connection null-guards in FlushChanges() (Commit/Rollback/BeginTransaction)
- GetFilesDeletedOrDroppedAfterSnapshot() changed from const to non-const
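
One way to picture the pluggable registration mentioned under DuckLakeMetadataManager::Register() is a name-to-factory registry. The sketch below is an assumption about the general pattern only: the actual Register() signature is not shown in this PR description, and `ManagerBase`, `DefaultManager`, `PostgresLikeManager`, `ManagerRegistry`, and `Create` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base type standing in for the metadata manager interface.
class ManagerBase {
public:
    virtual ~ManagerBase() = default;
    virtual std::string Name() const = 0;
};

class DefaultManager : public ManagerBase {
public:
    std::string Name() const override { return "default"; }
};

class PostgresLikeManager : public ManagerBase {
public:
    std::string Name() const override { return "postgres"; }
};

using ManagerFactory = std::function<std::unique_ptr<ManagerBase>()>;

// Maps a backend type name to a factory; unknown names fall back to the default.
class ManagerRegistry {
public:
    void Register(const std::string &type, ManagerFactory factory) {
        factories[type] = std::move(factory);
    }

    std::unique_ptr<ManagerBase> Create(const std::string &type) const {
        auto it = factories.find(type);
        if (it == factories.end()) {
            return std::unique_ptr<ManagerBase>(new DefaultManager());
        }
        return it->second();
    }

private:
    std::map<std::string, ManagerFactory> factories;
};
```

An extension would call `Register("postgres", ...)` once at load time, and code that needs a manager would ask the registry by backend type without depending on the concrete class.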