[SPARK-56632][CONNECT][TESTS][4.2] Add E2E test for self-join reusing a DataFrame #56287

Draft
longvu-db wants to merge 2 commits into apache:branch-4.2 from longvu-db:selfjoin-probe-42

@longvu-db
Contributor

What changes were proposed in this pull request?

Adds an end-to-end Spark Connect test (ClientE2ETestSuite) that joins two independently constructed DataFrames over the same data and selects columns from both sides of the self-join.

Why are the changes needed?

To verify that column resolution from both sides of a self-join reusing a DataFrame works correctly (post SPARK-56632).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New test in ClientE2ETestSuite.

longvu-db added 2 commits June 2, 2026 20:05
… a DataFrame

### What changes were proposed in this pull request?
Adds an end-to-end Spark Connect test that joins two independently constructed DataFrames over the same data and selects columns from both sides.

### Why are the changes needed?
To verify column resolution from both sides of a self-join reusing a DataFrame works (post SPARK-56632).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New test in `ClientE2ETestSuite`.
…ernal write between resolutions

Rewrite the test to resolve the same table twice (version X and version X+1
after an external write) and self-join the two DataFrames, matching the
reported scenario.
@longvu-db longvu-db marked this pull request as draft June 2, 2026 21:36