feat(connectors): Delta Lake Sink Connector #2889

Open
kriti-sc wants to merge 21 commits into apache:master from kriti-sc:delta-sink

kriti-sc commented Mar 7, 2026

Which issue does this PR close?

Closes #1852

Rationale

Delta Lake is an open table format and storage layer, and it is very popular in modern streaming analytics architectures.

What changed?

Introduces a Delta Lake Sink Connector that enables writing data from Iggy to Delta Lake.

The Delta Lake writing logic is heavily inspired by the kafka-delta-ingest project, to have a proven starting ground for writing to Delta Lake.

Local Execution

  1. Produced 32632 messages with the schema user_id: String, user_type: u8, email: String, source: String, state: String, created_at: DateTime<Utc>, message: String using the sample data producer.
  2. Consumed the messages with the Delta Lake sink, which created a Delta table on the local filesystem.
  3. Verified the row count and the schema of the Delta table.
  4. Added unit tests and e2e tests; both pass.

[Image: left, messages produced; top right, messages consumed by the Delta sink; bottom right, inspecting the Delta table in Python]

AI Usage

If AI tools were used, please answer:

  1. Which tools? Claude Code
  2. Scope of usage? generated functions
  3. How did you verify the generated code works correctly? Manual testing by producing data into Iggy and then running the sink and verifying local Delta Lake creation, unit tests and e2e tests for local Delta Lake and Delta Lake on S3.
  4. Can you explain every line of the code if asked? Yes

codecov bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 74.89824% with 185 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.09%. Comparing base (7fca0f2) to head (de5df91).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/connectors/sinks/delta_sink/src/coercions.rs | 75.68% | 92 Missing and 6 partials ⚠️ |
| core/connectors/sinks/delta_sink/src/storage.rs | 74.35% | 56 Missing and 4 partials ⚠️ |
| core/connectors/sinks/delta_sink/src/sink.rs | 15.38% | 22 Missing ⚠️ |
| core/connectors/sdk/src/convert.rs | 92.53% | 5 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #2889       +/-   ##
=============================================
- Coverage     70.58%   60.09%   -10.49%     
  Complexity      943      943               
=============================================
  Files          1113     1117        +4     
  Lines         94590    95327      +737     
  Branches      71790    72543      +753     
=============================================
- Hits          66763    57287     -9476     
- Misses        25353    35587    +10234     
+ Partials       2474     2453       -21     

| Components | Coverage Δ |
|---|---|
| Rust Core | 56.70% <74.89%> (-13.93%) ⬇️ |
| Java SDK | 62.30% <ø> (ø) |
| C# SDK | 69.10% <ø> (-0.29%) ⬇️ |
| Python SDK | 81.43% <ø> (ø) |
| Node SDK | 91.44% <ø> (-0.10%) ⬇️ |
| Go SDK | 38.97% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| core/connectors/sinks/delta_sink/src/lib.rs | 100.00% <100.00%> (ø) |
| core/connectors/sdk/src/convert.rs | 94.25% <92.53%> (-5.75%) ⬇️ |
| core/connectors/sinks/delta_sink/src/sink.rs | 15.38% <15.38%> (ø) |
| core/connectors/sinks/delta_sink/src/storage.rs | 74.35% <74.35%> (ø) |
| core/connectors/sinks/delta_sink/src/coercions.rs | 75.68% <75.68%> (ø) |

... and 259 files with indirect coverage changes


hubcio left a comment

overall a good direction, but it needs a bit of refining. thanks for the contribution @kriti-sc

hubcio commented Mar 26, 2026

@kriti-sc I will finish review after both #2925 and #2933 get merged and you rebase against these.

github-actions bot commented Apr 3, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

github-actions bot added the stale label (Inactive issue or pull request) Apr 3, 2026

*value = Value::String(value.to_string());
}
}
CoercionNode::Coercion(Coercion::ToTimestamp) => {

  1. arrow's string_to_datetime (arrow-cast/parse.rs:195) accepts ' ' as a timestamp separator, so "2021-11-11 22:11:58" passes Arrow but fails chrono's DateTime::from_str. for these strings the coercion is a no-op and you get a spurious warn! log, but the write succeeds.
  2. for strings that BOTH parsers reject (e.g., "This definitely is not a timestamp"), Arrow fails at timestamp_array.rs:63-68 with ArrowError::JsonError, which makes record_batch_from_message fail and rejects the entire batch. these strings are left as-is by the coercion layer and then blow up downstream.
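
The two cases above, plus the happy path, can be sketched as a small classification. This is only an illustration under stated assumptions: `strict_accepts` and `lenient_accepts` are hypothetical toy predicates that differ only in the date/time separator they allow. They are not chrono's or Arrow's real parsers, and `Outcome` and `classify` are names invented for this sketch.

```rust
/// What happens to one string value, following the review comment above.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Both parsers accept the string: the coercion converts it.
    Coerced,
    /// Only the lenient parser accepts it: the coercion is a no-op,
    /// a warning is logged, and the write still succeeds.
    NoOpWithWarning,
    /// Neither parser accepts it: the value is left as-is and the
    /// whole batch is rejected downstream.
    BatchRejected,
}

/// Toy shape check for "YYYY-MM-DD?HH:MM:SS" (19 bytes), where `?` must be
/// one of `seps`. Hypothetical helper, far simpler than a real parser.
fn has_shape(s: &str, seps: &[char]) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 19 {
        return false;
    }
    bytes.iter().enumerate().all(|(i, &c)| match i {
        4 | 7 => c == b'-',
        10 => seps.contains(&(c as char)),
        13 | 16 => c == b':',
        _ => c.is_ascii_digit(),
    })
}

/// Stand-in for the stricter parser: only 'T' separates date and time.
fn strict_accepts(s: &str) -> bool {
    has_shape(s, &['T'])
}

/// Stand-in for the more lenient parser: 'T' or ' ' is allowed.
fn lenient_accepts(s: &str) -> bool {
    has_shape(s, &['T', ' '])
}

/// Combine both verdicts into the outcome the reviewer describes.
fn classify(s: &str) -> Outcome {
    match (strict_accepts(s), lenient_accepts(s)) {
        (true, _) => Outcome::Coerced,
        (false, true) => Outcome::NoOpWithWarning,
        (false, false) => Outcome::BatchRejected,
    }
}
```

With these stand-ins, `classify("2021-11-11 22:11:58")` lands in `NoOpWithWarning` and `classify("This definitely is not a timestamp")` lands in `BatchRejected`, which are the two gaps the comment points at.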

simd_json::StaticNode::Bool(b) => serde_json::Value::Bool(*b),
simd_json::StaticNode::I64(n) => serde_json::Value::Number((*n).into()),
simd_json::StaticNode::U64(n) => serde_json::Value::Number((*n).into()),
simd_json::StaticNode::F64(n) => serde_json::Number::from_f64(*n)

don't drop the message - use the SDK's owned_value_to_serde_json from iggy_connector_sdk::convert (convert.rs:29). it maps non-finite floats to null instead of returning None. if the column is nullable, null is correct. if non-nullable, the Delta writer will reject it with a clear error, which is better than silently losing the entire message. IMHO it's better this way
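
The difference between the two behaviours can be sketched without any crates. This is a minimal illustration, not the SDK's code: `Json` is a hypothetical stand-in for `serde_json::Value`, and `float_or_drop` / `float_or_null` are names invented here. The only library fact relied on is that `serde_json::Number::from_f64` returns `None` for NaN and infinities.

```rust
/// Hypothetical minimal JSON value, standing in for serde_json::Value.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Number(f64),
}

/// Drop-style conversion: a non-finite float yields None, so a caller that
/// propagates the None ends up discarding the whole message.
fn float_or_drop(n: f64) -> Option<Json> {
    if n.is_finite() {
        Some(Json::Number(n))
    } else {
        None
    }
}

/// Null-style conversion, as the review suggests: a non-finite float becomes
/// Null, so the message survives and a non-nullable column can reject it
/// explicitly at write time.
fn float_or_null(n: f64) -> Json {
    if n.is_finite() {
        Json::Number(n)
    } else {
        Json::Null
    }
}
```

With the null-style variant a NaN field no longer takes the other fields of the message down with it; whether null is acceptable is then decided by the column's nullability rather than by the converter.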

github-actions bot removed the stale label (Inactive issue or pull request) Apr 4, 2026

Development

Successfully merging this pull request may close these issues.

Implement Delta Lake connectors

4 participants