fix(backend): don't let a client disconnect cancel the model load by localai-bot · Pull Request #10653 · mudler/LocalAI

localai-bot · 2026-07-02T20:55:38Z

Description

Fixes #10636: on a system with a slow, shared-memory AMD iGPU, loading any Stable Diffusion / Diffusers model fails with "context canceled" and "Backend process exited unexpectedly", and the Studio UI shows "NetworkError when attempting to fetch resource".

Root cause

The image-generation helper and the tts/transcript/embeddings/vad/rerank/llm helpers pass the request context to loader.Load(...) via model.WithContext(ctx) so that distributed-routing decisions reach the request's X-LocalAI-Node holder. But that same context also governs cancellation of the load:

LoadModel(o.context, options)  // pkg/model/initializers.go:175
  -> ctx canceled -> RPC aborts -> stopLoadProcess() SIGTERMs the backend

A heavy diffusers model (Chroma, FLUX, SDXL) on a shared-memory iGPU takes longer to load than the client stays connected. When the request context cancels mid-load, the LoadModel RPC is aborted, the backend process is torn down, and every retry restarts from scratch — so the model never finishes loading. This matches the reporter's backend trace exactly (Received termination signal. Shutting down... ~22s into the load, then context canceled).

Fix

Wrap the load context with context.WithoutCancel(ctx): the routing holder value still propagates (the X-LocalAI-Node contract is preserved), but the request's cancellation no longer aborts the load, so it runs to completion and caches for the next request. Inference keeps the cancellable request context, so a client disconnect still stops generation.

Applied uniformly across the 8 backend helper load sites (image/tts×2/transcript/embeddings/vad/rerank/llm) since they share the exact pattern and the same tested X-LocalAI-Node contract.

Tests

New regression spec in core/backend/ctx_propagation_test.go: a canceled request context must not cancel the model load (the router-side/load context is not Done) while the routing holder still reaches the router. Verified red before the fix and green after.
Full core/backend suite: 87/87 green; golangci-lint run ./core/backend/... → 0 issues.

Note

There is no server- or UI-side timeout in the code that would cancel the request at ~22s; the trigger in the reporter's setup is most likely their browser or an intermediate proxy in the webui/flowise docker network. This fix is robust to any such client-side cancellation regardless of source.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Image generation (and the tts/transcript/embeddings/vad/rerank/llm helpers) pass the request context to loader.Load so distributed routing decisions reach the request's X-LocalAI-Node holder. That context also governs cancellation of the load, so when a client disconnects mid-load the LoadModel RPC is aborted, stopLoadProcess tears down the backend process, and every retry restarts from scratch.

Heavy diffusers/LLM models on a slow host (e.g. a shared-memory iGPU) take long enough to load that the request routinely ends first, so the model never finishes loading and the UI shows "NetworkError when attempting to fetch resource".

Wrap the load context with context.WithoutCancel: the routing holder value still propagates, but the request's cancellation no longer aborts the load, so it runs to completion and caches for the next request. Inference keeps the cancellable request context, so a disconnect still stops generation.

Adds a regression spec asserting a canceled request context does not cancel the model load while the routing holder still reaches the router.

Fixes #10636

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

localai-bot wants to merge 1 commit into master from worktree-investigate-diffusers-load-cancel-10636
