[SPARK-55347][SDP][FOLLOW-UP] Cleanup AutoCDC Flow code #56255
Open
szehon-ho wants to merge 3 commits into
Follow-up to apache#56160 (SPARK-57113) addressing post-merge review comments and reducing duplication in the AutoCDC flow code and its test suites. No behavior change.
- FlowExecution.scala: hoist org.json4s imports to the top of the file; factor the duplicated AutoCDC key-field resolution into a shared expectedAuxiliaryKeyFields helper; import scala.collection.mutable instead of using a fully-qualified inline reference.
- Tests: add a shared singleAutoCdcFlowPipeline helper to AutoCdcGraphExecutionTestMixin and use it across the AutoCDC E2E suites, removing repeated single-table/single-flow registration boilerplate.
…cd1KeyDriftSuite
Replace the repeated per-test `val session = spark; import session.implicits._` with a single class-level `import testImplicits._`, and import `classic.DataFrame` and `types.MetadataBuilder` instead of using fully-qualified names.
What changes were proposed in this pull request?
Follow-up to #56160 (SPARK-57113) addressing post-merge review comments and reducing duplication in the AutoCDC flow code and its test suites. No behavior change.
- `sql/pipelines/.../graph/FlowExecution.scala`:
  - Hoist the `org.json4s` imports out of `serializeKeyColumnNames`/`parseKeyColumnNames` to the top of the file, per Spark's import conventions.
  - Factor the duplicated key-field resolution in `auxiliaryKeyColumnNames` and `validateNoAutoCdcKeyDrift` into a single `expectedAuxiliaryKeyFields` helper.
  - Import `scala.collection.mutable` instead of using a fully-qualified inline reference.
- Tests:
  - Add a shared `singleAutoCdcFlowPipeline` helper to `AutoCdcGraphExecutionTestMixin` and use it across the AutoCDC SCD1 E2E suites (`AutoCdcScd1KeyDriftSuite`, `AutoCdcScd1MultiPipelineSuite`, `AutoCdcScd1AuxiliaryTableDurabilitySuite`, `AutoCdcScd1SchemaEvolutionSuite`), removing the repeated single-table/single-flow `TestGraphRegistrationContext` registration boilerplate.
Why are the changes needed?
Addresses non-blocking review feedback left on #56160 and reduces duplication in the AutoCDC flow code and its tests, improving readability and maintainability. The net diff removes ~300 lines.
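For readers who have not seen the diff, the helper extraction named in this description can be pictured roughly as below. This is only an illustrative sketch: the method names `expectedAuxiliaryKeyFields`, `auxiliaryKeyColumnNames`, and `validateNoAutoCdcKeyDrift` come from the description, while the `Field` type, every signature, and every method body are assumptions made up for the example and are not the code in `FlowExecution.scala`.

```scala
// Illustrative sketch only: the types, signatures, and bodies below are
// assumptions for this example, not the actual Spark implementation.
object KeyFieldResolutionSketch {
  // Hypothetical stand-in for a schema field.
  final case class Field(name: String, dataType: String)

  // Shared helper: resolve each key column name against the schema in one place.
  def expectedAuxiliaryKeyFields(schema: Seq[Field], keyNames: Seq[String]): Seq[Field] =
    keyNames.map { key =>
      schema.find(_.name == key).getOrElse(
        throw new IllegalArgumentException(s"Key column '$key' is not in the schema"))
    }

  // First caller: only needs the resolved names.
  def auxiliaryKeyColumnNames(schema: Seq[Field], keyNames: Seq[String]): Seq[String] =
    expectedAuxiliaryKeyFields(schema, keyNames).map(_.name)

  // Second caller: compares the resolved fields with previously recorded ones.
  def validateNoAutoCdcKeyDrift(
      schema: Seq[Field],
      keyNames: Seq[String],
      recorded: Seq[Field]): Unit = {
    val expected = expectedAuxiliaryKeyFields(schema, keyNames)
    if (expected != recorded) {
      throw new IllegalStateException(
        s"Key drift: expected $expected but found $recorded")
    }
  }
}
```

With both callers going through one helper, a change to how key fields are resolved only has to be made once, which is the kind of duplication this follow-up is aiming to remove.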
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pure refactor with no behavior change, covered by the existing AutoCDC suites:
- AutoCdcAuxiliaryTableSuite
- AutoCdcScd1KeyDriftSuite
- AutoCdcScd1MultiPipelineSuite
- AutoCdcScd1AuxiliaryTableDurabilitySuite
- AutoCdcScd1SchemaEvolutionSuite
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)