fix(backend): don't let a client disconnect cancel the model load #10653
Open
localai-bot wants to merge 1 commit into
Image generation (and the tts/transcript/embeddings/vad/rerank/llm helpers) pass the request context to loader.Load so distributed routing decisions reach the request's X-LocalAI-Node holder. That context also governs cancellation of the load, so when a client disconnects mid-load the LoadModel RPC is aborted, stopLoadProcess tears down the backend process, and every retry restarts from scratch.

Heavy diffusers/LLM models on a slow host (e.g. a shared-memory iGPU) take long enough to load that the request routinely ends first, so the model never finishes loading and the UI shows "NetworkError when attempting to fetch resource".

Wrap the load context with context.WithoutCancel: the routing holder value still propagates, but the request's cancellation no longer aborts the load, so it runs to completion and caches for the next request. Inference keeps the cancellable request context, so a disconnect still stops generation.

Adds a regression spec asserting a canceled request context does not cancel the model load while the routing holder still reaches the router.

Fixes #10636

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Description
Fixes #10636: on a system with a slow, shared-memory AMD iGPU, loading any Stable Diffusion / Diffusers model fails with "context canceled" + "Backend process exited unexpectedly", and the Studio UI shows "NetworkError when attempting to fetch resource".

Root cause
The image-generation helper (and tts/transcript/embeddings/vad/rerank/llm) passes the request context to loader.Load(...) via model.WithContext(ctx) so distributed-routing decisions reach the request's X-LocalAI-Node holder. But that same context also governs cancellation of the load.

A heavy diffusers model (Chroma, FLUX, SDXL) on a shared-memory iGPU takes longer to load than the client stays connected. When the request context cancels mid-load, the LoadModel RPC is aborted, the backend process is torn down, and every retry restarts from scratch, so the model never finishes loading. This matches the reporter's backend trace exactly ("Received termination signal. Shutting down..." ~22s into the load, then "context canceled").

Fix
Wrap the load context with context.WithoutCancel(ctx): the routing holder value still propagates (the X-LocalAI-Node contract is preserved), but the request's cancellation no longer aborts the load, so it runs to completion and caches for the next request. Inference keeps the cancellable request context, so a client disconnect still stops generation.

Applied uniformly across the 8 backend helper load sites (image/tts×2/transcript/embeddings/vad/rerank/llm) since they share the exact pattern and the same tested X-LocalAI-Node contract.

Tests
core/backend/ctx_propagation_test.go: a canceled request context must not cancel the model load (the router-side/load context is not Done) while the routing holder still reaches the router. Verified red before the fix, green after.
core/backend suite: 87/87 green; golangci-lint run ./core/backend/... → 0 issues.

Note
There is no server- or UI-side timeout in the code that would cancel the request at ~22s; the trigger in the reporter's setup is most likely their browser or an intermediate proxy in the webui/flowise docker network. This fix is robust to any such client-side cancellation regardless of source.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]