[FLINK-39845][tests] Fix SavepointITCase.testStopWithSavepointFailsOverToSavepoint under AdaptiveScheduler #28352

Open
MartijnVisser wants to merge 1 commit into apache:release-2.3 from MartijnVisser:backport-FLINK-39845-release-2.3

@MartijnVisser
Contributor

What is the purpose of the change

Backport of #28311 (commit c3ec569) to release-2.3.

The JUnit5 migration (FLINK-39124, #27667 — also present on release-2.3) replaced a cause-chain search in SavepointITCase.testStopWithSavepointFailsOverToSavepoint (ExceptionUtils.assertThrowable -> findThrowable) with a direct-cause assertion (assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)).

Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected StopWithSavepointStoppingException inside a FlinkException ("Stop with savepoint operation could not be completed."), so it is no longer the direct cause. The default scheduler still exposes it directly, so the test passes there but fails under the adaptive scheduler.

This restores the cause-chain search so the test passes under both schedulers.
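
For reviewers who want to see the difference in isolation, here is a minimal, dependency-free sketch. The nested exception classes are stand-ins, not the real Flink types, and the outer ExecutionException plus the savepoint path are assumptions made only for illustration. It contrasts a direct-cause check with a walk over the whole cause chain for the two shapes described above.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;

public class CauseChainSketch {

    // Stand-ins for the Flink types named above; these are NOT the real classes.
    static class FlinkException extends Exception {
        FlinkException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class StopWithSavepointStoppingException extends Exception {
        StopWithSavepointStoppingException(String message) {
            super(message);
        }
    }

    // Direct-cause check: looks only one level below the thrown exception.
    static boolean hasDirectCause(Throwable thrown, Class<? extends Throwable> type) {
        return type.isInstance(thrown.getCause());
    }

    // Chain search: follows getCause() links until a match or the end of the chain.
    static <T extends Throwable> Optional<T> findInChain(Throwable thrown, Class<T> type) {
        for (Throwable t = thrown; t != null; t = t.getCause()) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    // Assumed shape under the default scheduler: the expected exception is the direct cause.
    static Throwable defaultSchedulerFailure() {
        return new ExecutionException(
                new StopWithSavepointStoppingException(
                        "A savepoint has been created at: file:/tmp/sp-1")); // hypothetical path
    }

    // Assumed shape under the adaptive scheduler: one extra FlinkException layer in between.
    static Throwable adaptiveSchedulerFailure() {
        return new ExecutionException(
                new FlinkException(
                        "Stop with savepoint operation could not be completed.",
                        new StopWithSavepointStoppingException(
                                "A savepoint has been created at: file:/tmp/sp-1")));
    }

    public static void main(String[] args) {
        Class<StopWithSavepointStoppingException> expected =
                StopWithSavepointStoppingException.class;
        Throwable[] failures = {defaultSchedulerFailure(), adaptiveSchedulerFailure()};
        for (Throwable thrown : failures) {
            System.out.println(
                    "direct=" + hasDirectCause(thrown, expected)
                            + " chain=" + findInChain(thrown, expected).isPresent());
        }
    }
}
```

In this sketch the direct-cause check succeeds only for the first shape, while the chain search succeeds for both, which is the behavior the test needs.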

Brief change log

  • Replace the direct-cause assertion in testStopWithSavepointFailsOverToSavepoint with FlinkAssertions.anyCauseMatches(StopWithSavepointStoppingException.class, "A savepoint has been created at: "), which searches the whole cause chain (restoring the pre-migration semantics).

(Note: the shared anyCauseMatches helper matches the message with contains rather than the original startsWith. This is safe here because the phrase is the leading text of StopWithSavepointStoppingException's own message, and anyCauseMatches binds the class and the message to the same throwable in the chain.)
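
To illustrate the note above, the following is a rough, self-contained approximation of the matching rule, not the actual FlinkAssertions code. The local anyCauseMatches method, the stand-in exception class, and the example messages are all hypothetical. It requires the class and the message fragment to match on the same throwable in the chain.

```java
public class AnyCauseMatchesSketch {

    static final String PHRASE = "A savepoint has been created at: ";

    // Stand-in for the Flink exception type; NOT the real class.
    static class StopWithSavepointStoppingException extends Exception {
        StopWithSavepointStoppingException(String message) {
            super(message);
        }
    }

    // Hypothetical approximation of the helper's semantics: some throwable in the
    // chain must be of the given type AND carry a message containing the fragment.
    static boolean anyCauseMatches(
            Throwable thrown, Class<? extends Throwable> type, String fragment) {
        for (Throwable t = thrown; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (type.isInstance(t) && message != null && message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    // Expected exception buried one level down, with the phrase leading its message.
    static Throwable wrappedMatch() {
        return new RuntimeException(
                "Stop with savepoint operation could not be completed.",
                new StopWithSavepointStoppingException(PHRASE + "file:/tmp/sp-1"));
    }

    // Class on one throwable, phrase on another: must not count as a match.
    static Throwable splitAcrossThrowables() {
        return new RuntimeException(
                PHRASE + "somewhere else",
                new StopWithSavepointStoppingException("unrelated message"));
    }

    public static void main(String[] args) {
        Class<StopWithSavepointStoppingException> expected =
                StopWithSavepointStoppingException.class;
        System.out.println("wrapped: " + anyCauseMatches(wrappedMatch(), expected, PHRASE));
        System.out.println(
                "split: " + anyCauseMatches(splitAcrossThrowables(), expected, PHRASE));
    }
}
```

Because the phrase leads the exception's own message, a contains check and a startsWith check agree for the first case, and the same-throwable binding keeps the second case from matching by accident.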

Verifying this change

This change is already covered by the existing SavepointITCase.testStopWithSavepointFailsOverToSavepoint. The cherry-pick applied cleanly onto release-2.3 (which carries the same #27667 migration commit), and FlinkAssertions.anyCauseMatches(Class, String) is present on the branch.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no (test-only change)
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes (Claude Code (Claude Opus 4.8))

Generated-by: Claude Code (Claude Opus 4.8)

…erToSavepoint under AdaptiveScheduler

The JUnit5 migration (FLINK-39124, apache#27667) replaced a cause-chain search (ExceptionUtils.assertThrowable -> findThrowable) with a direct-cause assertion (assertThatThrownBy(...).hasCauseInstanceOf(StopWithSavepointStoppingException.class)). Under the AdaptiveScheduler, StopWithSavepoint.onLeave() wraps the expected StopWithSavepointStoppingException inside a FlinkException ("Stop with savepoint operation could not be completed."), so it is no longer the direct cause. This regressed the test_cron_adaptive_scheduler nightly leg on master (red every build since 2026-03-21); the default scheduler still exposes it as the direct cause and passed.

Restore the chain search via FlinkAssertions.anyCauseMatches so the test passes under both the default and adaptive schedulers, matching the pre-migration behavior still present on release-2.1.

Generated-by: Claude Code (Claude Opus 4.8)
(cherry picked from commit c3ec569)
@flinkbot
Collaborator

flinkbot commented Jun 7, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
