Commit df61e49
tests: kill zombie IPC child processes after join timeout
When Python 3.12 CI runs, env-vars enables compute-sanitizer with
--target-processes=all, which attaches to every mp.Process child the
tests spawn. On CUDA 12.9.1 the sanitizer analysis of IPC buffer teardown
gets stuck, so child processes never exit. The existing
join(timeout=CHILD_TIMEOUT_SEC) returns but leaves the child alive.
That zombie keeps its IPC handle open. When pytest teardown runs
ipc_memory_resource's mr.close(), it blocks waiting for the handle to
be released — tying up the runner for hours until GitHub Actions
force-cancels the job. This is the exact pattern in issue #2004
(always Python 3.12 + CUDA 12.9.1 local).
Fix: after join(timeout=...), kill any process still alive so the IPC
handle is released before fixture teardown. Tests still fail (exit code
is non-zero or completed is False), just in seconds rather than hours.
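The join-then-kill pattern described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `join_or_kill` is hypothetical, and the choice of `Process.kill()` over `Process.terminate()` is an assumption.

```python
import multiprocessing as mp


def join_or_kill(process: mp.Process, timeout: float) -> bool:
    """Wait up to `timeout` seconds for `process`; kill it if it is still alive.

    Returns True if the child exited on its own, False if it had to be killed.
    (Hypothetical helper; the name and return convention are assumptions.)
    """
    # join(timeout=...) returns once the timeout elapses even if the child
    # is still running, so it cannot by itself guarantee the child is gone.
    process.join(timeout=timeout)
    if process.is_alive():
        # Force the hung child to exit so the resources it holds
        # (here, the IPC handle) are released before fixture teardown.
        process.kill()
        # Reap the killed child so no defunct process entry is left behind.
        process.join()
        return False
    return True
```

A test could call `join_or_kill(process, CHILD_TIMEOUT_SEC)` in place of the bare `join` and then assert on `process.exitcode` as before. A killed child reports a non-zero exit code, so the test still fails, only promptly.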
Fixes #2004
1 parent 326d522 commit df61e49
1 file changed
Lines changed: 18 additions & 1 deletion
| Hunk | Original file lines | Diff lines | Change |
|---|---|---|---|
| 1 | (none) | 42-48 | 7 lines added |
| 2 | 99 | 106 | 1 line replaced |
| 2 | (none) | 110-119 | 10 lines added |

(Diff line content not preserved; only line ranges are recoverable.)