Commit 72ccbe7
tests: kill zombie IPC child processes after join timeout
If a spawned process gets stuck during CUDA context teardown (the
Python 3.12 + CUDA 12.9.1 hang pattern from issue #2004), the existing
join(timeout=CHILD_TIMEOUT_SEC) returns but leaves the child alive. That
zombie holds open IPC handles, causing the ipc_memory_resource fixture's
mr.close() to block indefinitely and tie up the runner for hours.
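A minimal sketch of the failure mode described above, assuming a POSIX host. It is not code from this commit, whose diff content is not preserved on this page. `join(timeout=...)` only waits: it returns after the timeout whether or not the child has exited, and it never stops the child. `_hang_forever` and `join_leaves_child_alive` are hypothetical names. `_hang_forever` stands in for a child stuck in CUDA context teardown. The sketch uses the `fork` start method so it stays self-contained in one file; a real CUDA test would normally need `spawn`, because CUDA is not fork-safe.

```python
import multiprocessing as mp
import time


def _hang_forever():
    # Hypothetical stand-in for a child stuck in CUDA context teardown.
    while True:
        time.sleep(1)


def join_leaves_child_alive(timeout: float = 0.5) -> bool:
    """Return True if the child is still alive after join(timeout) returns."""
    # "fork" keeps this sketch self-contained (POSIX only, assumed here).
    proc = mp.get_context("fork").Process(target=_hang_forever)
    proc.start()
    proc.join(timeout=timeout)  # returns after `timeout`; does NOT stop the child
    still_alive = proc.is_alive()
    # Clean up the demo child so the sketch itself does not leak a process.
    proc.kill()
    proc.join()
    return still_alive


if __name__ == "__main__":
    print(join_leaves_child_alive())
```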
Kill any process that is still alive after its join timeout so the IPC
its IPC handles are released before fixture teardown runs. The test still fails
(exit code != 0 or completed == False), but quickly instead of hanging.
Fixes #2004
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Parent commit: 326d522
1 file changed, 18 additions(+), 1 deletion(-)
(Diff content not preserved. Hunks: @@ -39,6 +39,13 @@ adds 7 lines after old line 41; @@ -96,10 +103,20 @@ replaces old line 99 and adds 10 lines after old line 102.)