Summary
cocoon vm status [VMID] defaults to a 5-second polling loop with no built-in deadline or pipe-close detection. When the caller's terminal or tmux session disconnects without SIGHUP reaching the cocoon process (common when the command runs under sudo or inside bash -c and the controlling session dies), the polling loop survives indefinitely.
On the cocoonset-gke-private node cocoon-pool-2 we found a 21-day-old orphan from this exact pattern:
1965315 1 root S 21-08:24:21 sudo cocoon vm status --format json
1965317 1965315 root Sl 21-08:24:21 cocoon vm status --format json
1965480 1 bytedang+ Ss 21-08:18:49 bash -c sudo cocoon vm status <vmid> --format json | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('pid', 0))"
The python3 wrapper calls json.load(sys.stdin), which blocks until EOF, so the pipe reader never goes away → cocoon never receives SIGPIPE → cocoon keeps polling. The user's intent was a one-shot inspect; they got a forever loop instead.
Root cause
cmd/vm/status.go:77-80:
interval, _ := cmd.Flags().GetInt("interval")
if interval <= 0 {
	interval = 5 //nolint:mnd
}
There is no one-shot path; vm status always enters the polling loop. The proper one-shot is vm inspect, but the affordance is unclear — many operators reach for vm status <vmid> expecting --format json to print once and exit.
Proposed fix
Pick one or more:
1. Default to one-shot, require --watch (or --event, currently a flag) to enter the polling loop. Breaking change but matches Unix convention (status ≠ watch).
2. --interval=0 means one-shot instead of "use default 5s". Easy, non-breaking; documented in the flag help.
3. Detect broken stdout: trap SIGPIPE / EPIPE on the emit path and exit cleanly. Catches the python3 + sudo orphan pattern even when neither (1) nor (2) is opted into.
4. Deadline: --timeout flag for the polling loop, defaulting to e.g. 1h so a forgotten vm status doesn't run for 21 days.
(1)+(3) together would have prevented the orphan entirely. (3) alone catches the python3-stdin pattern at the cost of not solving disconnected-tty cases when stdout is the original tty (already closed cleanly by sshd).
Out of scope
vm status being robust regardless of caller wrapping.