## Problem

LiteLLM is a single point of failure in the stack. Every configuration change (`obol model setup`, provider addition) requires a full pod restart, causing complete inference downtime. During `obol stack up`, LiteLLM is restarted 2-3 times.
## Current issues

- Single replica — 1 pod, no PDB. Every restart = full downtime (30s-5min depending on image pull)
- No hot-reload — LiteLLM does not watch `config.yaml` for changes. Config is patched via ConfigMap, then `kubectl rollout restart` is required
- Non-fatal rollout timeout — `RestartLiteLLM()` returns success even when the 90s rollout times out, silently leaving LiteLLM in a broken state
- `drop_params: true` — silently drops request parameters that don't match the downstream provider schema, making debugging difficult
- No Reloader annotation — Secret changes (API key rotation) don't trigger a restart automatically
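For illustration, the non-fatal timeout issue could be fixed by letting `kubectl rollout status --timeout` fail loudly instead of being swallowed. This is a hypothetical sketch, not the actual `RestartLiteLLM()` code; the deployment name and namespace are placeholders:

```python
import subprocess

def rollout_cmds(namespace: str, timeout_s: int) -> list[list[str]]:
    """Commands for a restart that treats a stalled rollout as fatal."""
    deploy = "deployment/litellm"  # placeholder name
    return [
        ["kubectl", "-n", namespace, "rollout", "restart", deploy],
        # `rollout status --timeout` exits non-zero if the rollout stalls,
        # so check=True below propagates the failure instead of hiding it.
        ["kubectl", "-n", namespace, "rollout", "status", deploy,
         f"--timeout={timeout_s}s"],
    ]

def restart_litellm(namespace: str = "default", timeout_s: int = 90) -> None:
    for cmd in rollout_cmds(namespace, timeout_s):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```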
## Impact

- Agent chat is unavailable during every `obol model setup` or provider configuration change
- Initial `obol stack up` has 270s+ of intermittent LiteLLM downtime
- Silent parameter loss makes cross-provider routing unreliable
## Solution

Implemented in #320:

- Hot-add via `/model/new` API — model-only changes are applied immediately via LiteLLM's in-memory router API. The ConfigMap is still patched for persistence. A restart is only needed for API key changes (Secret mount).
- 2 replicas + RollingUpdate — `maxUnavailable: 0`, `maxSurge: 1` ensures a new pod is ready before any old pod terminates
- PodDisruptionBudget — `minAvailable: 1` prevents both replicas from being down simultaneously
- preStop hook — a 10s sleep before SIGTERM gives the EndpointSlice time to deregister the pod
- Reloader annotation — `secret.reloader.stakater.com/reload: litellm-secrets` triggers a rolling restart on Secret changes (API key rotation)
- `terminationGracePeriodSeconds: 60` — gives long inference requests time to complete
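The rollout and disruption settings above could be expressed roughly as the following manifests. This is a sketch, not the actual manifests from #320; the resource names, labels, and image tag are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  annotations:
    secret.reloader.stakater.com/reload: litellm-secrets  # restart on Secret change
spec:
  replicas: 2
  selector:
    matchLabels:
      app: litellm
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # new pod must be Ready before an old one terminates
      maxSurge: 1
  template:
    metadata:
      labels:
        app: litellm
    spec:
      terminationGracePeriodSeconds: 60   # let long inference requests finish
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest   # placeholder tag
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "10"]   # let EndpointSlice deregister the pod
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: litellm
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: litellm
```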
## Not yet addressed

- `drop_params: true` behavior (needs per-model investigation)
- ConfigMap size validation
- Horizontal pod autoscaling for high concurrency
## References
- `/model/new` API: adds models to the in-memory router without a restart
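For illustration, hot-adding a model through that endpoint might look like this. The sketch assumes LiteLLM's documented `/model/new` request shape; the proxy URL, master key, and model names are placeholders:

```python
import json
import urllib.request

def build_model_payload(model_name: str, litellm_model: str, api_key_env: str) -> dict:
    """Build the /model/new request body: a display name plus router params."""
    return {
        "model_name": model_name,
        "litellm_params": {
            "model": litellm_model,
            # "os.environ/VAR" tells LiteLLM to read the key from its own env
            "api_key": f"os.environ/{api_key_env}",
        },
    }

def hot_add_model(base_url: str, master_key: str, payload: dict) -> int:
    """POST the payload to /model/new on a running proxy; return HTTP status."""
    req = urllib.request.Request(
        f"{base_url}/model/new",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

payload = build_model_payload("gpt-4o", "openai/gpt-4o", "OPENAI_API_KEY")
# hot_add_model("http://litellm.default.svc:4000", "sk-...", payload)
```

Because the change only touches the in-memory router, it takes effect without a pod restart; the ConfigMap patch remains the source of truth across restarts.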