feat(etl-uvicorn): cooperative SIGTERM shutdown via GracefulServer + CancellationToken #70

Open: aballman wants to merge 9 commits into main from aballman/infra-820-plugin-stop-in-flight-foundation

@aballman (Contributor) commented Jun 5, 2026

Summary

Plugin uvicorn webservers now shut down cooperatively on SIGTERM instead of draining the in-flight request for an unbounded time. A cancelled job's plugin container can stop work and exit promptly rather than running until the pod's termination grace period expires.

Changes

  • GracefulServer (serve.py): subclasses uvicorn.Server and overrides handle_exit to set a process-global cancellation event before delegating to uvicorn's own shutdown. The etl-uvicorn CLI launches through it.
  • CancellationToken (shutdown.py): a read-only token (no .set()) backed by that event. wrap_in_fastapi injects it into any run-function that declares a cancellation_token parameter — the same opt-in mechanism as usage / message_channels / filedata_meta. Sync run-loops poll raise_if_cancelled() between units of work; uvicorn cannot cancel a threadpool thread, so this is the only way a sync plugin stops promptly. A cancellation_dependency() FastAPI dependency is provided for plugins that build their own app.
  • PluginShutdown → HTTP 503: raised by a bailing run-loop, caught before the generic exception handler in both the non-streaming and streaming paths, and returned as a distinct shutdown-abort response rather than a plugin failure.
  • Finite timeout_graceful_shutdown: default 30s, overridable via UVICORN_TIMEOUT_GRACEFUL_SHUTDOWN. uvicorn's default is unbounded, so an in-flight async request previously had no drain ceiling.
  • Schema contract: cancellation_token is omitted from the /invoke request model, the /schema output, and the plugin-id hash, so it never appears as a caller-facing input.
  • Removed the inert SIGTERM-ignore patch from 0.0.44: it set uvicorn.Server.install_signal_handlers, which the pinned uvicorn (0.37) removed in favor of capture_signals(), so it never took effect. Prompt shutdown now comes from the finite timeout plus the GracefulServer hook.
  • check_precheck_func: validates parameter names against the injected set (and accepts cancellation_token), replacing logic that indexed a non-subscriptable dict_values.

Testing

  • 92 unit tests pass; ruff check clean. New coverage: token injection and schema non-leak (sync and async), PluginShutdown → 503 (non-streaming and streaming), cooperative stop of a threadpool-bound sync run-loop, GracefulServer.handle_exit, serve(), and check_precheck_func.
  • Manual: sending SIGTERM to a running etl-uvicorn server triggers the cancellation hook and the process exits within ~1s, well under any pod termination grace period.

Compatibility

Behavior is unchanged for plugins that do not declare cancellation_token. Version bumped to 0.0.45.
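
To illustrate why plugins that do not declare the parameter are unaffected, here is a rough sketch of opt-in injection by signature inspection. It is an assumption about the general shape of the mechanism, not the actual wrap_in_fastapi code; `call_with_injection`, `legacy_run`, and `cooperative_run` are hypothetical names introduced for this example.

```python
import inspect


def call_with_injection(func, payload: dict, token):
    """Call func with payload, adding the token only if func declares it."""
    params = inspect.signature(func).parameters
    kwargs = dict(payload)
    if "cancellation_token" in params:
        # Opt-in: only run-functions that name the parameter receive it.
        kwargs["cancellation_token"] = token
    return func(**kwargs)


def legacy_run(text):
    """Plugin written before this change; never sees the token."""
    return text.upper()


def cooperative_run(text, cancellation_token):
    """Plugin that opts in by declaring the parameter."""
    return (text.upper(), cancellation_token)
```

With this shape, `legacy_run` is called exactly as before, and only `cooperative_run` receives the extra argument.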

Follow-ups (not in this PR)

  • Per-plugin adoption across the fleet: declaring cancellation_token and threading it through provider-call loops, plus a finite graceful timeout at every launch site.
  • Controller-side mapping of the 503 shutdown-abort to a retryable record outcome.

@socket-security

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | pypi/pytest-asyncio@1.4.0 | 100 | 100 | 100 | 100 | 100 |
