LiteLLM reliability: hot-reload, zero-downtime restarts, and single-replica fragility #321

@bussyjd


Problem

LiteLLM is a single point of failure in the stack. Every configuration change (`obol model setup`, provider addition) requires a full pod restart, causing complete inference downtime. During `obol stack up`, LiteLLM is restarted 2-3 times.

Current issues

  1. Single replica — 1 pod, no PodDisruptionBudget. Every restart = full downtime (30s-5min depending on image pull)
  2. No hot-reload — LiteLLM does not watch `config.yaml` for changes. The config is patched via ConfigMap, then a `kubectl rollout restart` is required
  3. Non-fatal rollout timeout — `RestartLiteLLM()` returns success even when the 90s rollout times out, silently leaving LiteLLM in a broken state
  4. `drop_params: true` — silently drops request parameters that don't match the downstream provider schema, making debugging difficult
  5. No Reloader annotation — Secret changes (API key rotation) don't trigger a restart automatically
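For reference, the manual flow that issues 1-2 force today looks roughly like this (resource names and the 90s timeout follow the issue text; the exact commands are a sketch, not the stack's actual tooling):

```
# Patch the rendered config into the ConfigMap...
kubectl create configmap litellm-config --from-file=config.yaml \
  --dry-run=client -o yaml | kubectl apply -f -

# ...then bounce the single replica, taking inference down until the
# new pod is Ready (or the rollout times out)
kubectl rollout restart deployment/litellm
kubectl rollout status deployment/litellm --timeout=90s
```

With one replica and no PDB, the window between SIGTERM and the new pod passing readiness is complete downtime, which is what issue 1 describes.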

Impact

  • Agent chat is unavailable during every `obol model setup` or provider configuration change
  • Initial `obol stack up` incurs 270s+ of intermittent LiteLLM downtime
  • Silent parameter loss makes cross-provider routing unreliable

Solution

Implemented in #320

  1. Hot-add via `/model/new` API — model-only changes are applied immediately via LiteLLM's in-memory router API. The ConfigMap is still patched for persistence. A restart is only needed for API key changes (Secret mount).
  2. 2 replicas + RollingUpdate — `maxUnavailable: 0, maxSurge: 1` ensures a new pod is ready before any old pod terminates
  3. PodDisruptionBudget — `minAvailable: 1` prevents both replicas from being down simultaneously
  4. preStop hook — a 10s sleep before SIGTERM gives the EndpointSlice time to deregister the pod
  5. Reloader annotation — `secret.reloader.stakater.com/reload: litellm-secrets` triggers a rolling restart on Secret changes (API key rotation)
  6. `terminationGracePeriodSeconds: 60` — gives long inference requests time to complete
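The hot-add path in item 1 can be exercised directly against the running proxy. A sketch, assuming the proxy is reachable at `localhost:4000` and the request is authorized with the proxy master key; the model names and env var are placeholders, not values from this stack:

```
# Register a model with the in-memory router — no pod restart needed.
# $LITELLM_MASTER_KEY and the model/provider values are assumptions.
curl -sS http://localhost:4000/model/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model_name": "gpt-4o",
        "litellm_params": {
          "model": "openai/gpt-4o",
          "api_key": "os.environ/OPENAI_API_KEY"
        }
      }'
```

Since `/model/new` only mutates the in-memory router, the ConfigMap patch is still required so the model survives the next restart, as item 1 notes.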
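Items 2-6 map onto a handful of manifest fields. A condensed sketch of the relevant parts (the `litellm` and `litellm-secrets` names follow the issue; labels and the rest of the manifest are elided):

```
# Deployment (abridged): zero-downtime rollout settings
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  annotations:
    # Reloader: rolling restart when the mounted Secret changes
    secret.reloader.stakater.com/reload: "litellm-secrets"
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take an old pod down before a new one is Ready
      maxSurge: 1
  template:
    spec:
      terminationGracePeriodSeconds: 60   # let long inference requests drain
      containers:
        - name: litellm
          lifecycle:
            preStop:
              exec:
                # give the EndpointSlice time to deregister before SIGTERM
                command: ["sleep", "10"]
---
# PDB: voluntary disruptions may never empty the deployment
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: litellm
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: litellm
```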

Not yet addressed

  • `drop_params: true` behavior (needs per-model investigation)
  • ConfigMap size validation
  • Horizontal pod autoscaling for high concurrency
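If the per-model investigation concludes that only some providers tolerate unknown parameters, `drop_params` can be narrowed from a global setting to individual model entries in the LiteLLM config. A sketch; the model entries are placeholders, not this stack's actual config:

```
litellm_settings:
  drop_params: false        # surface schema mismatches as errors by default
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      drop_params: true     # opt in only where silent dropping is acceptable
```

This keeps cross-provider routing debuggable by default while preserving the lenient behavior where it is known to be safe.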
