[DON'T MERGE] Relytcloud customizations on DuckLake v1.5#2

Open
qsliu2017 wants to merge 31 commits into upstream/v1.5-variegata from main

qsliu2017 (Collaborator) commented Mar 12, 2026

Summary

Custom changes on top of upstream DuckLake v1.5 (upstream/v1.5-variegata).
See also: duckdb#731

Extensible metadata manager

Refactored DuckLakeMetadataManager so all metadata queries go through virtual methods, enabling pluggable backends (e.g. PostgresMetadataManager):

  • Execute() / Query(): Virtual methods for DDL/DML vs SELECT statements. Signature changed from string & to string (by value). Added no-snapshot overloads.
  • ExecuteCommit(): Hook for custom commit handling (retry/conflict resolution).
  • IsInitialized(): Encapsulates ATTACH and metadata table detection (moved from DuckLakeInitializer).
  • FillSnapshotArgs() / FillSnapshotCommitArgs() / FillCatalogArgs(): Static helpers for query placeholder substitution (moved from DuckLakeTransaction).
  • InlinedDeletionTableExists(): Virtual method for backend-specific table existence checks.
  • GetActiveFiles(): Extracted helper for orphan file cleanup.
  • ListAggregation() / CastStatsToTarget() / CastColumnToTarget(): Made virtual for DBMS-specific SQL syntax.
  • LoadTags() / LoadInlinedDataTables(): Made virtual for DBMS-specific JSON parsing.
  • TransformInlinedData(): Separated from read for cross-DB type conversion.
  • DuckLakeMetadataManager::Register(): Pluggable metadata manager registration.
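
To make the virtual-method layout above concrete, here is a minimal sketch of how a base manager with overridable Execute() / Query() / ExecuteCommit() could look. The class names MetadataManagerSketch and RetryingManagerSketch and the `log` member are hypothetical stand-ins, not the real DuckLakeMetadataManager API; only the by-value string parameters and the "ExecuteCommit() delegates to Execute() by default" behaviour are taken from this PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DuckLakeMetadataManager; the real class has many more members.
class MetadataManagerSketch {
public:
    virtual ~MetadataManagerSketch() = default;

    // DDL/DML path: no result rows expected. The statement is taken by value.
    virtual void Execute(std::string sql) {
        assert(!sql.empty());
        log.push_back("execute: " + std::move(sql));
    }

    // SELECT path: returns result rows (modelled here as plain strings).
    virtual std::vector<std::string> Query(std::string sql) {
        assert(!sql.empty());
        log.push_back("query: " + std::move(sql));
        return {};
    }

    // Commit hook: the default simply delegates to Execute().
    virtual void ExecuteCommit(std::string sql) {
        Execute(std::move(sql));
    }

    std::vector<std::string> log; // illustration only: records what was run
};

// A backend that wants custom commit handling overrides only the hook.
class RetryingManagerSketch : public MetadataManagerSketch {
public:
    void ExecuteCommit(std::string sql) override {
        log.push_back("commit-hook"); // e.g. conflict handling would go here
        MetadataManagerSketch::ExecuteCommit(std::move(sql));
    }
};
```

Calling ExecuteCommit() through a base-class reference reaches the subclass hook first and then the shared Execute() path, which is the property the pluggable backends rely on.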

PostgresMetadataManager

PostgreSQL-specific overrides:

  • Execute() / Query() route to postgres_execute / postgres_query stored procedures
  • InlinedDeletionTableExists() looks the table up in information_schema.tables instead of querying the table directly, because a failed query on a missing table aborts the enclosing PG transaction
  • ListAggregation() uses json_agg(json_build_object(...)) instead of DuckDB LIST()
  • CastStatsToTarget() / CastColumnToTarget() use PG cast syntax (no TRY_CAST)
  • LoadTags() / LoadInlinedDataTables() parse JSON via yyjson (PG returns JSON strings, not DuckDB Values)
  • TransformInlinedData() casts VARCHAR columns back to expected DuckDB types
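
As an illustration of the ListAggregation() override, the sketch below generates the two SQL shapes named above: a DuckDB LIST() over a struct literal and a PostgreSQL json_agg(json_build_object(...)). The Field alias, the two dialect structs, and the method signature are assumptions made for this example; the real method may take different arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input shape: (field name, SQL expression) pairs.
using Field = std::pair<std::string, std::string>;

struct AggregationDialect {
    virtual ~AggregationDialect() = default;

    // DuckDB flavour: aggregate a struct literal into a list.
    virtual std::string ListAggregation(const std::vector<Field> &fields) const {
        assert(!fields.empty());
        std::string out = "LIST({";
        for (std::size_t i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out += ", ";
            }
            out += "'" + fields[i].first + "': " + fields[i].second;
        }
        return out + "})";
    }
};

struct PostgresAggregationDialect : AggregationDialect {
    // PostgreSQL flavour: aggregate JSON objects into a JSON array.
    std::string ListAggregation(const std::vector<Field> &fields) const override {
        assert(!fields.empty());
        std::string out = "json_agg(json_build_object(";
        for (std::size_t i = 0; i < fields.size(); i++) {
            if (i > 0) {
                out += ", ";
            }
            out += "'" + fields[i].first + "', " + fields[i].second;
        }
        return out + "))";
    }
};
```

Because the PostgreSQL variant returns a JSON string rather than a nested DuckDB value, the loading side has to parse it, which is why LoadTags() and LoadInlinedDataTables() are overridden alongside it.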

SQL compatibility fixes

  • NULL columns in UNION queries explicitly typed (NULL::VARCHAR AS path, NULL::BOOLEAN AS path_is_relative, etc.)
  • UUID generation changed from the SQL function UUID() to an explicit '<uuid>'::UUID literal whose value comes from GenerateUUID()
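
A small sketch of the typed-NULL fix: instead of emitting bare NULL placeholders in a UNION branch, each placeholder carries an explicit cast and alias so every branch agrees on column types. NullColumn and TypedNullProjection are hypothetical helper names for this example; the PR itself edits the SQL text directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one placeholder column in a UNION branch.
struct NullColumn {
    std::string type;  // e.g. "VARCHAR"
    std::string alias; // e.g. "path"
};

// Builds "NULL::<type> AS <alias>, ..." for a branch that has no real value
// for these columns, so that every UNION branch exposes the same types.
std::string TypedNullProjection(const std::vector<NullColumn> &columns) {
    assert(!columns.empty());
    std::string out;
    for (std::size_t i = 0; i < columns.size(); i++) {
        if (i > 0) {
            out += ", ";
        }
        out += "NULL::" + columns[i].type + " AS " + columns[i].alias;
    }
    return out;
}
```

For the two columns named in the bullet above, this yields `NULL::VARCHAR AS path, NULL::BOOLEAN AS path_is_relative`.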

New config option

  • ducklake_default_table_path (SESSION): default directory path for new DuckLake tables

Other changes

  • Connection null-safety checks in FlushChanges() (Commit/Rollback/BeginTransaction)
  • GetFilesDeletedOrDroppedAfterSnapshot() changed from const to non-const

qsliu2017 and others added 19 commits January 14, 2026 10:27
This test ensures excessive files aren't read with more complex query plans generated by limit/offset.

Co-authored-by: Tom Jakubowski <tom@crystae.net>
Fixup error messages for migration
Add ExecuteCommit() virtual method to DuckLakeMetadataManager so that
implementations (e.g. pg_ducklake) can override commit-path metadata
writes separately from regular Execute() calls.  The default
implementation delegates to Execute().

FlushChanges() now calls ExecuteCommit() instead of Execute() for the
batch commit write, and null-guards the connection pointer before
Rollback()/BeginTransaction() in the retry catch block to support
backends where the metadata connection is not a DuckDB connection
(e.g. pg_ducklake uses PostgreSQL SPI).
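
The retry path described in this commit message can be sketched as follows. ConnectionSketch and FlushWithRetry are hypothetical names, and the exception type and attempt limit are assumptions; the point is only that Rollback() / BeginTransaction() are skipped when there is no DuckDB connection to call them on.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a DuckDB metadata connection.
struct ConnectionSketch {
    int rollbacks = 0;
    int begins = 0;
    void Rollback() { rollbacks++; }
    void BeginTransaction() { begins++; }
};

// Runs `commit` (standing in for ExecuteCommit()) until it succeeds or the
// attempt limit is hit. `connection` may be null for backends whose metadata
// connection is not a DuckDB connection. Returns the number of attempts used.
template <class CommitFn>
int FlushWithRetry(ConnectionSketch *connection, int max_attempts, CommitFn commit) {
    assert(max_attempts > 0);
    for (int attempt = 1;; attempt++) {
        try {
            commit();
            return attempt;
        } catch (const std::runtime_error &) {
            if (attempt >= max_attempts) {
                throw; // give up: propagate the last conflict
            }
            if (connection) { // null-guard: nothing to reset without a connection
                connection->Rollback();
                connection->BeginTransaction();
            }
        }
    }
}
```

With a null connection the loop still retries; it just leaves transaction handling to whatever the backend does inside its own commit hook.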
Merge 652 commits from duckdb/ducklake upstream/main.

Resolved conflicts preserving relytcloud customizations:
- Keep Execute()/Query() virtual methods on DuckLakeMetadataManager
  so custom metadata managers (e.g. PostgresMetadataManager) can
  intercept all metadata queries
- Keep ExecuteCommit() hook for custom commit handling
- Keep ducklake_default_table_path config option
- Accept upstream's ListAggregation rename (from WrapWithListAggregation)
- Accept all new upstream features: macros, data inlining improvements,
  deletion inlining, sorted tables, variant stats, geo stats, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
69 additional commits on top of upstream/main merge.
Only 1 code conflict (migration error message) - kept our dynamic version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qsliu2017 qsliu2017 changed the title from "Relytcloud customizations on DuckLake v1.5" to "[DON'T MERGE] Relytcloud customizations on DuckLake v1.5" on Mar 12, 2026
qsliu2017 and others added 2 commits March 12, 2026 22:19
…31 customizations

- Reset non-custom files to upstream v1.5-variegata (multi_file_list, test files, json config)
- Remove spurious formatting diffs in ducklake_schema_entry.cpp
- Revert upstream/main-only ExecuteMigration signature change
- Restore FillSnapshotArgs/FillSnapshotCommitArgs/FillCatalogArgs static helpers
- Restore IsInitialized virtual method
- Route flush_inlined_data queries through metadata_manager
- Simplify transaction.Query using FillCatalogArgs, remove Query(snapshot) overload
- Add connection null-guards in FlushChanges retry logic
- Use initial_schema_uuid in InitializeDuckLake for Postgres compatibility
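
For readers unfamiliar with the FillSnapshotArgs-style helpers restored in this commit, the following is a rough sketch of placeholder substitution as a free function. SnapshotSketch, ReplaceAll, FillSnapshotArgsSketch, and the placeholder names `{SNAPSHOT_ID}` and `{SCHEMA_VERSION}` are assumptions for illustration and may not match the real helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical snapshot descriptor; the real snapshot type carries more fields.
struct SnapshotSketch {
    long long snapshot_id;
    long long schema_version;
};

// Replaces every occurrence of `from` in `text` with `to`.
std::string ReplaceAll(std::string text, const std::string &from, const std::string &to) {
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // skip past the inserted text
    }
    return text;
}

// Stateless helper: needs no manager or transaction instance, so any backend
// can call it before handing the final SQL to its own Execute()/Query().
std::string FillSnapshotArgsSketch(std::string query, const SnapshotSketch &snapshot) {
    query = ReplaceAll(std::move(query), "{SNAPSHOT_ID}", std::to_string(snapshot.snapshot_id));
    query = ReplaceAll(std::move(query), "{SCHEMA_VERSION}", std::to_string(snapshot.schema_version));
    return query;
}
```

Keeping the substitution static means it can be shared by the transaction code and by every metadata manager subclass without duplicating the replacement logic.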

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add explicit type casts to NULL literals in GetTableDeletions UNION
  branches (NULL::BOOLEAN, NULL::BIGINT, NULL::VARCHAR) so PostgreSQL
  can match column types across UNION ALL
- Override GetInlinedDeletionTableName in PostgresMetadataManager to
  use information_schema.tables instead of SELECT NULL FROM table,
  avoiding transaction abort on missing tables
- Move delete_inlined_table_cache to protected for subclass access
- Apply PR duckdb#731 initializer fix: delegate to
  metadata_manager.IsInitialized() instead of inline ATTACH + count

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/storage/ducklake_metadata_manager.cpp Outdated
Comment thread src/storage/ducklake_metadata_manager.cpp Outdated
Comment thread src/storage/ducklake_metadata_manager.cpp Outdated
Comment thread src/include/storage/ducklake_metadata_manager.hpp
Comment thread src/metadata_manager/postgres_metadata_manager.cpp Outdated
qsliu2017 and others added 2 commits March 13, 2026 18:02
Execute() is for DDL/DML (INSERT, UPDATE, DELETE, CREATE, ALTER).
Query() is for SELECT statements that read result rows.
This distinction matters in subclass overrides like PostgresMetadataManager
which routes them to different stored procedures.
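
A rough sketch of why the split matters for routing: a statement with no result rows and a row-returning SELECT get wrapped differently. The wrapper shapes below are modelled on the DuckDB postgres extension's `postgres_execute` and `postgres_query` functions; whether PostgresMetadataManager emits exactly these strings is an assumption, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Quotes a value as a SQL string literal, doubling embedded single quotes.
std::string QuoteSqlLiteral(const std::string &value) {
    std::string out = "'";
    for (char c : value) {
        if (c == '\'') {
            out += "''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// DDL/DML: run for its side effects, no rows read back.
std::string WrapExecute(const std::string &attached_db, const std::string &sql) {
    assert(!attached_db.empty());
    return "CALL postgres_execute(" + QuoteSqlLiteral(attached_db) + ", " + QuoteSqlLiteral(sql) + ")";
}

// SELECT: the remote result set is exposed as a table to read from.
std::string WrapQuery(const std::string &attached_db, const std::string &sql) {
    assert(!attached_db.empty());
    return "SELECT * FROM postgres_query(" + QuoteSqlLiteral(attached_db) + ", " + QuoteSqlLiteral(sql) + ")";
}
```

Sending a SELECT through the execute wrapper would discard its rows, and sending DDL through the query wrapper would expect a result set that never arrives, so the caller has to pick the right entry point.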

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…licate ExecuteQuery

- Replace full GetInlinedDeletionTableName override in PostgresMetadataManager
  with a narrow InlinedDeletionTableExists override, keeping cache management
  and table creation logic in the base class
- Move delete_inlined_table_cache back to private (no longer exposed)
- Deduplicate snapshot/commit arg replacement in ExecuteQuery by calling
  FillSnapshotArgs and FillSnapshotCommitArgs static helpers
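
The narrow-override design in the first bullet can be sketched like this: the base class keeps the cache and the create-if-missing logic, and a backend overrides only the existence probe. All names here are hypothetical, including the `ducklake_inlined_delete_<id>` naming pattern and the probe SQL; table creation is simulated with an in-memory set.

```cpp
#include <cassert>
#include <set>
#include <string>

class InlinedDeletionSketch {
public:
    virtual ~InlinedDeletionSketch() = default;

    // Cache management and creation stay in the base class.
    std::string GetInlinedDeletionTableName(long long table_id) {
        std::string name = "ducklake_inlined_delete_" + std::to_string(table_id);
        if (cache_.count(name) == 0) {
            if (!InlinedDeletionTableExists(name)) {
                created_.insert(name); // stands in for CREATE TABLE IF NOT EXISTS
            }
            cache_.insert(name);
        }
        return name;
    }

    void ClearCache() { cache_.clear(); }
    bool WasCreated(const std::string &name) const { return created_.count(name) > 0; }

protected:
    // The only backend-specific piece: how to ask whether the table exists.
    virtual bool InlinedDeletionTableExists(const std::string &name) {
        return created_.count(name) > 0;
    }

private:
    std::set<std::string> cache_;   // private again: subclasses never touch it
    std::set<std::string> created_; // simulated catalog
};

class PostgresInlinedDeletionSketch : public InlinedDeletionSketch {
public:
    int probes = 0;
    std::string last_probe_sql;

protected:
    bool InlinedDeletionTableExists(const std::string &name) override {
        probes++;
        // A catalog lookup never errors on a missing table.
        last_probe_sql = "SELECT 1 FROM information_schema.tables WHERE table_name = '" + name + "'";
        return false; // pretend the lookup found nothing
    }
};
```

The second call for the same table id is served from the cache, so the probe runs once until the cache is cleared.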

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
qsliu2017 and others added 2 commits March 16, 2026 15:41
Two bugs caused inlined file deletion metadata to silently disappear
when a concurrent commit forced a retry in FlushChanges:

1. GetNewInlinedFileDeletes() used std::move on the source map, emptying
   it on the first attempt. On retry the data was gone.
2. The delete_inlined_table_cache was not cleared after rollback, so the
   retry skipped CREATE TABLE IF NOT EXISTS for the inlined deletion
   table — the subsequent INSERT failed on a non-existent table.

Fix: copy instead of move in GetNewInlinedFileDeletes, and clear the
inlined table caches before each retry. Also removes the unused
inlined_table_name_cache field.
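
The first bug is easy to reproduce in isolation. In the sketch below, PendingDeletesSketch and DeleteMap are hypothetical stand-ins for the transaction state; TakeByMove() mimics the old behaviour (with an explicit clear() so the moved-from state is deterministic), and GetByCopy() mimics the fix.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical shape: data file path -> row ids deleted from that file.
using DeleteMap = std::map<std::string, std::set<long long>>;

class PendingDeletesSketch {
public:
    void Add(const std::string &file, long long row_id) {
        pending_[file].insert(row_id);
    }

    // Old shape: hands the map away, so a retry finds nothing left to write.
    DeleteMap TakeByMove() {
        DeleteMap out = std::move(pending_);
        pending_.clear(); // make the emptied state explicit
        return out;
    }

    // Fixed shape: every attempt sees the same pending deletes.
    DeleteMap GetByCopy() const {
        return pending_;
    }

private:
    DeleteMap pending_;
};
```

The copy costs an extra allocation per attempt, which is a small price compared with silently dropping deletion metadata when a concurrent commit forces a retry.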

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore: merge upstream v1.5-variegata for DuckDB v1.5.1
@qsliu2017 qsliu2017 closed this Mar 27, 2026
@qsliu2017 qsliu2017 reopened this Mar 27, 2026
@qsliu2017
Collaborator Author

Replaced by a new PR tracking the pg_ducklake branch (auto-synced from pg_ducklake subtree).

@qsliu2017 qsliu2017 closed this Mar 27, 2026
@qsliu2017 qsliu2017 reopened this Mar 27, 2026
4 participants