Commit a476a2c

tests: tighten memory_ipc outer timeout to CHILD_TIMEOUT_SEC + 30
The previous 3 * CHILD_TIMEOUT_SEC scaling assumed worst-case wall-clock where every sequential join/wait hits its full timeout. In practice the children run concurrently, so expected wall-clock is ~CHILD_TIMEOUT_SEC regardless of how many joins the test chains -- once a child is done its join returns immediately.

Exceeding CHILD_TIMEOUT_SEC + slack already means something is genuinely stuck, in which case the outer guard firing is the right outcome; the autouse track_child_processes() context manager still cleans up survivors, and the per-test diagnostic message would not be more informative than "test exceeded its budget".

New budgets:
- Without compute-sanitizer: 60 s (was 90 s).
- Under compute-sanitizer: 150 s (was 360 s).

The meta-test computes its expected value from the same formula.
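The concurrency argument in the commit message can be illustrated with a small sketch. It is not taken from the test suite: it uses threads and short sleeps as hypothetical stand-ins for the IPC child processes, and `join_all_sequentially`, `CHILD_RUNTIMES_SEC`, and the 2-second `CHILD_TIMEOUT_SEC` are names and values invented for the illustration. It shows that chaining several joins over workers that were started together costs roughly the longest runtime, not the sum of the runtimes.

```python
import threading
import time

# Hypothetical stand-ins: the real tests use child processes and a much
# larger CHILD_TIMEOUT_SEC; short sleeps keep this sketch fast.
CHILD_TIMEOUT_SEC = 2.0
CHILD_RUNTIMES_SEC = [0.2, 0.2, 0.2]


def join_all_sequentially(runtimes, timeout):
    """Start all workers at once, then join them one after another."""
    workers = [threading.Thread(target=time.sleep, args=(r,)) for r in runtimes]
    start = time.monotonic()
    for worker in workers:
        worker.start()
    for worker in workers:
        # Each join has its own timeout, but a worker that has already
        # finished makes its join return immediately.
        worker.join(timeout=timeout)
    return time.monotonic() - start


elapsed = join_all_sequentially(CHILD_RUNTIMES_SEC, CHILD_TIMEOUT_SEC)
print(f"elapsed: {elapsed:.2f} s, sum of runtimes: {sum(CHILD_RUNTIMES_SEC):.2f} s")
```

Only when a worker is actually stuck does a join consume its full timeout, which is the case the outer guard is meant to catch.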
1 parent fe5b914 commit a476a2c

2 files changed

Lines changed: 10 additions & 8 deletions

cuda_core/tests/memory_ipc/conftest.py

Lines changed: 9 additions & 7 deletions
@@ -19,13 +19,15 @@
 
 
 def _outer_timeout_sec() -> int:
-    # The worst-case IPC test has three sequential CHILD_TIMEOUT_SEC waits in
-    # the failure path (e.g. TestIpcReexport: event_c.wait, proc_b.join,
-    # proc_c.join). Scaling by 3 lets such a test reach its own asserts before
-    # the outer guard fires, while still cutting the budget by half on
-    # non-sanitizer runs (90 s vs the previous 300 s) and scaling up under
-    # compute-sanitizer (360 s).
-    return 3 * child_timeout_sec()
+    # IPC tests spawn children that run concurrently, so expected wall-clock
+    # is ~CHILD_TIMEOUT_SEC regardless of how many subsequent join/wait
+    # timeouts the test chains together (each subsequent join returns
+    # immediately once its child is already done). Exceeding that already
+    # means something is genuinely stuck, at which point the outer guard
+    # firing is the right outcome -- the per-test asserts wouldn't add
+    # useful diagnostic value over "test exceeded its budget", and the
+    # autouse track_child_processes() context manager still cleans up.
+    return child_timeout_sec() + 30
 
 
 def pytest_collection_modifyitems(config, items):

cuda_core/tests/memory_ipc/test_errors.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ def test_outer_timeout_marker_is_applied(request):
     the in-test cleanup -- which we want to keep as defense in depth, not as
     the sole guard.
     """
-    expected = 3 * child_timeout_sec()
+    expected = child_timeout_sec() + 30
     marker = request.node.get_closest_marker("timeout")
     assert marker is not None, "memory_ipc/conftest.py did not apply a timeout marker"
     assert marker.args == (expected,), f"unexpected timeout value: {marker.args!r}"
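
The budgets quoted in the commit message can be cross-checked against the old and new formulas. The sketch below is illustrative only: the child timeouts of 30 s and 120 s are not shown anywhere in this diff and are inferred by working backwards from the stated budgets, and `old_outer_timeout`, `new_outer_timeout`, and `INFERRED_CHILD_TIMEOUT_SEC` are hypothetical names rather than anything in the repository.

```python
def old_outer_timeout(child_timeout_sec: int) -> int:
    """Previous formula: three full child timeouts in sequence."""
    return 3 * child_timeout_sec


def new_outer_timeout(child_timeout_sec: int, slack_sec: int = 30) -> int:
    """New formula: one child timeout plus a fixed slack."""
    return child_timeout_sec + slack_sec


# Assumption: child timeouts inferred from the budgets in the commit message
# (60 s = 30 + 30 and 150 s = 120 + 30); they are not read from the source.
INFERRED_CHILD_TIMEOUT_SEC = {
    "without compute-sanitizer": 30,
    "under compute-sanitizer": 120,
}

for label, child in INFERRED_CHILD_TIMEOUT_SEC.items():
    print(f"{label}: {new_outer_timeout(child)} s (was {old_outer_timeout(child)} s)")
```

Under these inferred values both formulas reproduce the "was" and "new" figures from the commit message, which suggests the stated budgets are internally consistent.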
