
Zombie coroutine in a long delay() keeps the event loop alive until the timer expires #132

@EdmondDantes

Surfaced during the #125 / #129 chaos-test work.

Context (intended behaviour)

A safe scope's dispose() / cancel() does not force-cancel an
already-started child. Instead it marks the child as a zombie
(coroutine.c, async_coroutine_cancel, the is_safely branch) and drops
it from the active coroutine count. This was investigated and is by design:
a safe scope lets a running child finish gracefully rather than tearing it
down mid-flight. asNotSafely() opts into forced cancellation.

The problem

A zombie parked in a long delay() (or any timer await) keeps its libuv
timer armed. An armed timer is an active, referenced libuv handle, so it
keeps the event loop alive until it naturally expires, even when nothing
else is left to run.

Consequently Scope::disposeAfterTimeout(), whose whole point is a
bounded cleanup, can still hang the loop for the full remaining sleep
duration of a zombie child. The "timeout" is effectively ignored: the
process cannot exit until the arbitrary delay() elapses on its own.

Reproduction sketch

  • Open a safe scope and spawn a child that calls Async\delay(<long>).
  • Call disposeAfterTimeout(<short>) on the scope.
  • The child becomes a zombie once the short timeout fires, but the process
    keeps running until the long delay() expires.

Proposed idea

When only zombies remain (active coroutine count is 0 and the sole
remaining work is zombie coroutines), deliver a cancellation to them so the
process can exit. A cancellation is still graceful — the zombie's
finally / catch blocks unwind normally — so it does not violate the
"let it finish gracefully" contract; it only skips the arbitrary sleep that
nothing is waiting on anymore.
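The proposed step could look roughly like this, again as a hypothetical toy model rather than a patch against coroutine.c or scheduler.c. It assumes that cancelling a zombie means disarming its timer and resuming it with a cancellation, so its finally blocks still run (tracked here by a made-up finally_ran flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not extension code: every name here is hypothetical. */
typedef enum { CO_ACTIVE, CO_ZOMBIE, CO_FINISHED } co_state;

typedef struct {
    co_state state;
    bool timer_armed; /* parked in delay(): its timer is armed */
    bool finally_ran; /* set once the coroutine has unwound its finally blocks */
} toy_coroutine;

/* True when no active coroutine is left and at least one zombie is. */
static bool only_zombies_remain(const toy_coroutine *cs, size_t n)
{
    bool any_zombie = false;
    for (size_t i = 0; i < n; i++) {
        if (cs[i].state == CO_ACTIVE)
            return false;
        if (cs[i].state == CO_ZOMBIE)
            any_zombie = true;
    }
    return any_zombie;
}

/* Proposed step: once only zombies remain, cancel them instead of waiting
 * for their timers. Cancelling disarms the timer and resumes the coroutine
 * with a cancellation, so it still unwinds gracefully (finally_ran).
 * Returns how many zombies were cancelled. */
static size_t cancel_zombies_if_only_zombies(toy_coroutine *cs, size_t n)
{
    if (!only_zombies_remain(cs, n))
        return 0;

    size_t cancelled = 0;
    for (size_t i = 0; i < n; i++) {
        if (cs[i].state != CO_ZOMBIE)
            continue;
        cs[i].timer_armed = false; /* stands in for stopping the timer */
        cs[i].finally_ran = true;  /* finally / catch blocks unwind normally */
        cs[i].state = CO_FINISHED;
        cancelled++;
    }
    assert(!only_zombies_remain(cs, n));
    return cancelled;
}
```

While any active coroutine is still around, the zombies are left untouched, so the "let it finish gracefully" behaviour is unchanged; in this model the check would have to be repeated each time the active count drops to zero.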

Where to investigate

scheduler.c shutdown path:

  • the loop stop condition / the active_coroutines > real_coroutines check
  • start_graceful_shutdown

Determine whether zombie timers are already drained anywhere, and whether a
"cancel zombies when only zombies remain" step is the right fix or whether
the stop condition itself should treat zombie-only state as quiescent.
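The two options differ in what the loop's keep-alive test looks at. A hypothetical sketch of both predicates over the same kind of toy state (none of these names come from scheduler.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not extension code: every name here is hypothetical. */
typedef enum { CO_ACTIVE, CO_ZOMBIE, CO_FINISHED } co_state;

typedef struct {
    co_state state;
    bool timer_armed; /* parked in a timer await */
} toy_coroutine;

/* Behaviour described in this issue: active coroutines keep the loop
 * running, and so does any armed timer, including one owned by a zombie. */
static bool keep_running_current(const toy_coroutine *cs, size_t n)
{
    assert(cs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cs[i].state == CO_ACTIVE)
            return true;
        if (cs[i].state == CO_ZOMBIE && cs[i].timer_armed)
            return true;
    }
    return false;
}

/* Alternative: treat a zombie-only state as quiescent, so timers owned by
 * zombies no longer count. */
static bool keep_running_zombie_quiescent(const toy_coroutine *cs, size_t n)
{
    assert(cs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (cs[i].state == CO_ACTIVE)
            return true;
    return false;
}
```

In this toy model the quiescent variant stops the loop with the zombie still suspended inside delay(), so whether its finally blocks run depends on what the shutdown path does with it afterwards; the cancellation step makes that unwind explicit.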

Labels: enhancement (New feature or request)
