feat(connectors): Delta Lake Sink Connector #2889
kriti-sc wants to merge 21 commits into apache:master
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2889 +/- ##
=============================================
- Coverage 70.58% 60.09% -10.49%
Complexity 943 943
=============================================
Files 1113 1117 +4
Lines 94590 95327 +737
Branches 71790 72543 +753
=============================================
- Hits 66763 57287 -9476
- Misses 25353 35587 +10234
+ Partials 2474 2453 -21
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers, either @core on Discord or by mentioning them directly here on the PR. Thank you for your contribution!
        *value = Value::String(value.to_string());
    }
}
CoercionNode::Coercion(Coercion::ToTimestamp) => {
- Arrow's string_to_datetime (arrow-cast/parse.rs:195) accepts ' ' as a date/time separator, so "2021-11-11 22:11:58" passes Arrow but fails chrono's DateTime::from_str. For these strings the coercion is a no-op: you get a spurious warn! log, but the write succeeds.
- For strings that BOTH parsers reject (e.g., "This definitely is not a timestamp"), Arrow fails at timestamp_array.rs:63-68 with ArrowError::JsonError, which fails record_batch_from_message and rejects the entire batch. These strings are left as-is by the coercion layer and then blow up downstream.
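One way to close the gap for the first case, sketched here as a hypothetical pre-normalization helper (normalize_timestamp is not part of the PR): rewrite the space separator that Arrow tolerates into the 'T' that chrono requires, so both parsers agree before the coercion runs.

```rust
// Hypothetical helper (not in the PR): turn the space separator that
// Arrow's string_to_datetime accepts into the 'T' that chrono's
// DateTime::from_str requires, so "2021-11-11 22:11:58" parses in both.
fn normalize_timestamp(s: &str) -> String {
    let bytes = s.as_bytes();
    // A date prefix "YYYY-MM-DD" is exactly 10 bytes; check for a space
    // in the separator position and replace only that byte.
    if bytes.len() > 10 && bytes[10] == b' ' {
        let mut out = s.to_string();
        out.replace_range(10..11, "T");
        out
    } else {
        // Already 'T'-separated, or too short to be a datetime: leave it.
        s.to_string()
    }
}
```

Strings both parsers reject would still need an explicit error path, but this removes the spurious warn! for the common space-separated format.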
simd_json::StaticNode::Bool(b) => serde_json::Value::Bool(*b),
simd_json::StaticNode::I64(n) => serde_json::Value::Number((*n).into()),
simd_json::StaticNode::U64(n) => serde_json::Value::Number((*n).into()),
simd_json::StaticNode::F64(n) => serde_json::Number::from_f64(*n)
Don't drop the message: use the SDK's owned_value_to_serde_json from iggy_connector_sdk::convert (convert.rs:29). It maps non-finite floats to null instead of returning None. If the column is nullable, null is correct; if it is non-nullable, the Delta writer will reject it with a clear error, which is better than silently losing the entire message.
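The suggested behavior can be sketched with std only (JsonValue is a stand-in for serde_json::Value, and the function name mirrors the intent of the SDK helper rather than its real signature):

```rust
// Stand-in for serde_json::Value; the real code would call the SDK's
// owned_value_to_serde_json instead of this illustrative function.
#[derive(Debug, PartialEq)]
enum JsonValue {
    Null,
    Number(f64),
}

// Map non-finite floats (NaN, ±inf) to null instead of dropping the whole
// message: a nullable column stores null, and a non-nullable column makes
// the Delta writer fail with a clear error rather than losing data silently.
fn f64_to_json(n: f64) -> JsonValue {
    if n.is_finite() {
        JsonValue::Number(n)
    } else {
        JsonValue::Null
    }
}
```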
Which issue does this PR close?
Closes #1852
Rationale
Delta Lake is an open-source storage layer that brings ACID transactions to data lakes, and it is very popular in modern streaming analytics architectures.
What changed?
Introduces a Delta Lake Sink Connector that enables writing data from Iggy to Delta Lake.
The Delta Lake writing logic is heavily inspired by the kafka-delta-ingest project, providing a proven starting point for writing to Delta Lake.
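Purely as an illustration of how such a sink might be wired into a connector runtime, a configuration sketch follows; every key, section name, and path here is hypothetical and not the PR's actual schema:

```
# Hypothetical sink configuration sketch; all names are illustrative only.
[sinks.delta_lake]
enabled = true

[sinks.delta_lake.config]
# Target Delta table location (assumed key name)
table_uri = "file:///tmp/iggy-delta/events"
```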
Local Execution
Tested locally using the sample data producer, with records of the schema: user_id: String, user_type: u8, email: String, source: String, state: String, created_at: DateTime<Utc>, message: String.
AI Usage
If AI tools were used, please answer: