
fix(backend): don't let a client disconnect cancel the model load#10653

Open
localai-bot wants to merge 1 commit into master from worktree-investigate-diffusers-load-cancel-10636



@localai-bot


Description

Fixes #10636. On a system with a slow, shared-memory AMD iGPU, loading any Stable Diffusion / Diffusers model fails with "context canceled" and "Backend process exited unexpectedly", and the Studio UI shows "NetworkError when attempting to fetch resource".

Root cause

The image-generation helper and the tts/transcript/embeddings/vad/rerank/llm helpers pass the request context to loader.Load(...) via model.WithContext(ctx), so that distributed-routing decisions reach the request's X-LocalAI-Node holder. But that same context also governs cancellation of the load:

```
LoadModel(o.context, options)  // pkg/model/initializers.go:175
  -> ctx canceled -> RPC aborts -> stopLoadProcess() SIGTERMs the backend
```

A heavy diffusers model (Chroma, FLUX, SDXL) on a shared-memory iGPU takes longer to load than the client stays connected. When the request context is canceled mid-load, the LoadModel RPC is aborted, the backend process is torn down, and every retry restarts from scratch, so the model never finishes loading. This matches the reporter's backend trace exactly: "Received termination signal. Shutting down..." about 22 s into the load, followed by "context canceled".
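The failure mode above can be reproduced outside LocalAI with a few lines of standard-library Go. This is a minimal sketch, not LocalAI code: `slowLoad` is a hypothetical stand-in for the LoadModel RPC, and the 20 ms / 200 ms timings are illustrative assumptions for "the client disconnects before the load finishes".

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slowLoad is a hypothetical stand-in for a long LoadModel RPC: it takes
// loadTime to finish unless ctx is canceled first.
func slowLoad(ctx context.Context, loadTime time.Duration) error {
	select {
	case <-time.After(loadTime):
		return nil // the model finished loading
	case <-ctx.Done():
		return ctx.Err() // the load was aborted together with the request
	}
}

// loadWithRequestContext reproduces the pre-fix pattern: the load is bound to
// the request context, and the client disconnects before the load completes.
func loadWithRequestContext() error {
	reqCtx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// Simulate the client going away 20ms into a 200ms load.
	time.AfterFunc(20*time.Millisecond, cancel)
	return slowLoad(reqCtx, 200*time.Millisecond)
}

func main() {
	fmt.Println(loadWithRequestContext()) // context canceled
}
```

Because the load shares the request's cancellation, every attempt ends with the same "context canceled" error the reporter saw, no matter how many times it is retried.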

Fix

Wrap the load context with context.WithoutCancel(ctx): the routing holder value still propagates (the X-LocalAI-Node contract is preserved), but the request's cancellation no longer aborts the load, so it runs to completion and caches for the next request. Inference keeps the cancellable request context, so a client disconnect still stops generation.
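The semantics of `context.WithoutCancel` (standard library, Go 1.21+) can be checked in isolation. In this sketch, `holderKey` is a hypothetical context key standing in for however the X-LocalAI-Node routing holder is actually stored; the point is only that the detached context keeps the request's values after the request is canceled, while reporting no error and exposing a nil Done channel.

```go
package main

import (
	"context"
	"fmt"
)

// holderKey is a hypothetical context key standing in for the routing holder.
type holderKey struct{}

// detachLoadContext mirrors the fix: keep the request's values, drop its cancellation.
func detachLoadContext(reqCtx context.Context) context.Context {
	return context.WithoutCancel(reqCtx)
}

// checkDetached cancels a request context carrying a holder value and reports
// what the detached load context still sees afterwards.
func checkDetached() (holder string, doneIsNil bool, loadErr error) {
	reqCtx, cancel := context.WithCancel(
		context.WithValue(context.Background(), holderKey{}, "node-a"))
	loadCtx := detachLoadContext(reqCtx)
	cancel() // the client disconnects

	holder, _ = loadCtx.Value(holderKey{}).(string)
	return holder, loadCtx.Done() == nil, loadCtx.Err()
}

func main() {
	holder, doneIsNil, err := checkDetached()
	fmt.Println(holder, doneIsNil, err) // node-a true <nil>
}
```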

Applied uniformly across the 8 backend helper load sites (image/tts×2/transcript/embeddings/vad/rerank/llm) since they share the exact pattern and the same tested X-LocalAI-Node contract.
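The load/inference split that each helper now follows can be sketched as below. Everything here is hypothetical (`loader`, `handle`, the model name, and the timings); it is a single-goroutine toy with no locking, meant only to show that a disconnect mid-load no longer prevents the model from being cached, while inference on the same request is still stopped.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// wait blocks for d unless ctx is canceled first.
func wait(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// loader is a hypothetical, single-goroutine stand-in for the model loader
// and its cache (no locking, unlike a real loader).
type loader struct{ loaded map[string]bool }

// handle loads the model on a detached context, then runs inference on the
// cancellable request context.
func (l *loader) handle(reqCtx context.Context, model string) (loadErr, inferErr error) {
	if !l.loaded[model] {
		// Load: request values survive, request cancellation does not.
		if loadErr = wait(context.WithoutCancel(reqCtx), 50*time.Millisecond); loadErr != nil {
			return loadErr, nil
		}
		l.loaded[model] = true
	}
	// Inference: still bound to the request, so a disconnect stops it.
	return nil, wait(reqCtx, 50*time.Millisecond)
}

// outcome records what happened across two requests for the same model.
type outcome struct {
	firstLoad, firstInfer, secondInfer error
	cachedAfterFirst                    bool
}

// scenario sends one request whose client disconnects mid-load, then a
// second request for the same model.
func scenario() outcome {
	l := &loader{loaded: map[string]bool{}}
	var o outcome

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	time.AfterFunc(20*time.Millisecond, cancel) // disconnect during the 50ms load
	o.firstLoad, o.firstInfer = l.handle(ctx, "sdxl")
	o.cachedAfterFirst = l.loaded["sdxl"]

	_, o.secondInfer = l.handle(context.Background(), "sdxl")
	return o
}

func main() {
	o := scenario()
	fmt.Println("first request:  load =", o.firstLoad, "| inference =", o.firstInfer)
	fmt.Println("model cached after first request:", o.cachedAfterFirst)
	fmt.Println("second request: inference =", o.secondInfer)
}
```

The first request still fails at the inference step, but the load it triggered completes and is cached, so the next request goes straight to inference.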

Tests

  • New regression spec in core/backend/ctx_propagation_test.go: a canceled request context must not cancel the model load (the router-side/load context is not Done), while the routing holder still reaches the router. Verified failing (red) before the fix and passing (green) after it.
  • Full core/backend suite: 87/87 green; golangci-lint run ./core/backend/... → 0 issues.
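The shape of that regression check can be illustrated in plain Go. This is not the actual spec in core/backend/ctx_propagation_test.go; `routerFunc`, `loadModelBuggy`, `loadModelFixed`, and `holderKey` are hypothetical names. The check cancels the request before the load starts and records, on the "router side", whether the holder arrived and whether the context was already Done, so the pre-fix shape goes red and the fixed shape goes green.

```go
package main

import (
	"context"
	"fmt"
)

// holderKey is a hypothetical context key standing in for the routing holder.
type holderKey struct{}

// routerFunc is a hypothetical router hook that receives the load context.
type routerFunc func(ctx context.Context)

// loadModelBuggy is the pre-fix shape: the request context flows straight through.
func loadModelBuggy(reqCtx context.Context, route routerFunc) { route(reqCtx) }

// loadModelFixed detaches cancellation before handing the context to the router.
func loadModelFixed(reqCtx context.Context, route routerFunc) {
	route(context.WithoutCancel(reqCtx))
}

// isDone reports, without blocking, whether ctx has been canceled.
// A nil Done channel (as returned by context.WithoutCancel) is never ready.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// regressionCheck cancels the request before the load starts and reports what
// the router observed: the holder value and whether its context was Done.
func regressionCheck(load func(context.Context, routerFunc)) (holder string, routerSawDone bool) {
	reqCtx, cancel := context.WithCancel(
		context.WithValue(context.Background(), holderKey{}, "node-a"))
	cancel() // the client is already gone

	load(reqCtx, func(ctx context.Context) {
		holder, _ = ctx.Value(holderKey{}).(string)
		routerSawDone = isDone(ctx)
	})
	return holder, routerSawDone
}

func main() {
	fmt.Println(regressionCheck(loadModelBuggy)) // node-a true  (red)
	fmt.Println(regressionCheck(loadModelFixed)) // node-a false (green)
}
```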

Note

There is no server- or UI-side timeout in the code that would cancel the request at ~22s; the trigger in the reporter's setup is most likely their browser or an intermediate proxy in the webui/flowise docker network. This fix is robust to any such client-side cancellation regardless of source.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

The image-generation helper and the tts/transcript/embeddings/vad/rerank/llm
helpers pass the request context to loader.Load so distributed routing decisions
reach the request's X-LocalAI-Node holder. That context also governs
cancellation of the load, so when a client disconnects mid-load the
LoadModel RPC is aborted, stopLoadProcess tears down the backend process,
and every retry restarts from scratch. Heavy diffusers/LLM models on a slow
host (e.g. a shared-memory iGPU) take long enough to load that the request
routinely ends first, so the model never finishes loading and the UI shows
"NetworkError when attempting to fetch resource".

Wrap the load context with context.WithoutCancel: the routing holder value
still propagates, but the request's cancellation no longer aborts the load,
so it runs to completion and caches for the next request. Inference keeps the
cancellable request context, so a disconnect still stops generation.

Adds a regression spec asserting a canceled request context does not cancel
the model load while the routing holder still reaches the router.

Fixes #10636

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]


Development

Successfully merging this pull request may close these issues.

Dockerised LocalAI on a system with integrated AMD GPU and shared memory cannot load any Stable diffusion or Diffusers model
