
fix: relax exact-batch-count assertions in interrupt/resume tests #432

Merged
driv3r merged 1 commit into get-rid-of-vendored-packages from fix-flaky-interrupt-resume-test
Apr 15, 2026
Conversation

@driv3r
Contributor

@driv3r driv3r commented Apr 15, 2026

TERM is delivered asynchronously via Go's signal channel. The signal handler calls ErrorHandler.Fatal, which panics, but there is a race window between send_signal("TERM") returning in the Ruby callback and the signal actually interrupting the data iterator goroutine. A second 200-row batch can complete before the panic propagates, so the target row count ends up as 400 instead of the asserted 200.

The real correctness invariant is already checked by the subsequent assertions: last_successful_id from the target matches the pagination key in the dumped state. The exact count check adds no additional safety and only makes the tests fragile. Relax both the integer-keyed and UUID-keyed variants to assert_operator :>= 200.
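
As a rough illustration of why the relaxed check is still safe, here is a minimal, deterministic model of the race in plain Ruby. It is a sketch, not the project's test code: `simulate_interrupted_copy`, `resume_invariants_hold?`, `delay_batches`, and the hash keys are hypothetical names invented for this example, and the asynchronous signal delivery is modelled simply as "the interrupt takes effect some number of batches late".

```ruby
# Hypothetical model of an interrupted batch copy. None of these names come
# from the project under test; they exist only to illustrate the invariant.
BATCH_SIZE = 200

# Copies ids from the source in batches of BATCH_SIZE. The interrupt is
# requested after the first batch but only takes effect `delay_batches`
# batches later, which stands in for the asynchronous delivery of TERM.
def simulate_interrupted_copy(source_ids, delay_batches:)
  target = []
  batches_allowed = 1 + delay_batches
  source_ids.each_slice(BATCH_SIZE).first(batches_allowed).each do |batch|
    target.concat(batch)
  end
  # The dumped state records the last id that was successfully written.
  { target_ids: target, dumped_pagination_key: target.last }
end

# The relaxed check: at least one full batch was copied, and the highest id
# in the target agrees with the pagination key in the dumped state.
def resume_invariants_hold?(result)
  count_ok = result[:target_ids].size >= BATCH_SIZE
  key_ok = result[:target_ids].max == result[:dumped_pagination_key]
  count_ok && key_ok
end

prompt_interrupt = simulate_interrupted_copy((1..1000).to_a, delay_batches: 0)
late_interrupt = simulate_interrupted_copy((1..1000).to_a, delay_batches: 1)

puts prompt_interrupt[:target_ids].size
puts late_interrupt[:target_ids].size
puts resume_invariants_hold?(prompt_interrupt)
puts resume_invariants_hold?(late_interrupt)
```

In this model the row count differs between the two runs, while the agreement between the last copied id and the dumped pagination key holds in both, which is the property a resume depends on.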

@driv3r driv3r requested a review from a team April 15, 2026 11:00
@driv3r driv3r added the Bug (Something isn't working) label Apr 15, 2026

@austenLacy austenLacy left a comment


It might be good to explain in the comment why the subsequent ID assertions make a count above 200 acceptable.

@driv3r driv3r force-pushed the get-rid-of-vendored-packages branch from 3cd3fd2 to fb0ca17 Compare April 15, 2026 12:58
@driv3r driv3r force-pushed the fix-flaky-interrupt-resume-test branch from ef4568c to f97faf3 Compare April 15, 2026 12:58
@driv3r driv3r force-pushed the fix-flaky-interrupt-resume-test branch from f97faf3 to e404408 Compare April 15, 2026 13:04
@driv3r driv3r merged commit 8e957e3 into get-rid-of-vendored-packages Apr 15, 2026
13 checks passed
@driv3r driv3r deleted the fix-flaky-interrupt-resume-test branch April 15, 2026 13:07

2 participants