
feat: LiteLLM API model management + buyer sidecar reload#333

Closed
bussyjd wants to merge 26 commits into main from feat/litellm-api-management

@bussyjd bussyjd commented Apr 9, 2026

Summary

Replace fragile ConfigMap YAML read-modify-write cycles with HTTP API calls to our LiteLLM fork (ObolNetwork/litellm) for model management. Eliminates pod restarts for model add/remove operations.

  • model.go: litellmAPIViaExec() fans out API calls to all pods, hotDeleteModel() for live removal, RemoveModel() and AddCustomEndpoint() no longer restart
  • Controller: removeLiteLLMModelEntry() implemented (was no-op stub), wired into PurchaseRequest deletion cleanup
  • Buyer sidecar: POST /admin/reload endpoint for immediate config pickup (vs 5s ticker wait)
  • LiteLLM image: switched to Obol fork ghcr.io/obolnetwork/litellm:sha-fe892e3

Architecture

Before vs After

  BEFORE (ConfigMap YAML + Restart)          AFTER (HTTP API + Hot-Reload)
  ─────────────────────────────────          ────────────────────────────────
  CLI: patchLiteLLMConfig()                  CLI: litellmAPIViaExec()
    ↓ read YAML → parse → merge               ↓ POST /model/new to ALL pods
    ↓ marshal → patch ConfigMap                ↓ ConfigMap patch (persistence)
    ↓ kubectl rollout restart                  ↓ NO RESTART
    ↓ wait 90s for rollout                     ↓ instant effect
    ✗ race-prone with replicas:2               ✓ fan-out to every pod

  Controller: ConfigMap read-modify-write    Controller: direct HTTP
    ↓ GET ConfigMap → parse YAML               ↓ POST /model/new (one HTTP call)
    ↓ merge model_list                         ↓ POST /model/delete (implemented!)
    ↓ marshal → UPDATE ConfigMap               ↓ POST /admin/reload on sidecar
    ↓ restart deployment                       ↓ NO RESTART
    ✗ fragile YAML parsing                     ✓ atomic API calls

Data Flow: Model Lifecycle

                    ┌──────────────────────────────────────────────────────┐
                    │                MODEL MANAGEMENT                      │
                    └──────────────────────────────────────────────────────┘

  ┌─────────────────────┐         ┌─────────────────────┐
  │   CLI (Host)        │         │  Controller (In-Cluster)
  │   obol model setup  │         │  serviceoffer-controller
  │   obol model remove │         │  PurchaseRequest reconciler
  └────────┬────────────┘         └────────┬────────────┘
           │                               │
     ┌─────▼──────┐                  ┌─────▼──────┐
     │ ConfigMap   │                 │ Direct HTTP │
     │ Patch (RMW) │                 │ to LiteLLM  │
     │ persistence │                 │ svc:4000    │
     └─────┬───────┘                 └─────┬───────┘
           │                               │
     ┌─────▼──────────────┐          ┌─────▼──────┐
     │ litellmAPIViaExec  │          │ /model/new  │
     │ kubectl exec → pod │          │ /model/del  │
     │ fans out to ALL    │          │ /model/info │
     │ running pods       │          └─────┬───────┘
     └─────┬──────────────┘                │
           │                               │
           ▼                               ▼
  ┌────────────────────────────────────────────────┐
  │               LiteLLM Pods (replicas: 2)       │
  │                                                │
  │  ┌──────────────┐      ┌──────────────┐        │
  │  │   Pod 1      │      │   Pod 2      │        │
  │  │ litellm:4000 │      │ litellm:4000 │        │
  │  │ buyer:8402   │      │ buyer:8402   │        │
  │  └──────────────┘      └──────────────┘        │
  │                                                │
  │  ConfigMap volume mount = persistence layer    │
  │  API calls = live hot-reload layer             │
  └────────────────────────────────────────────────┘

Two Persistence Layers

  ┌────────────────────────────────────────────────────────────────────┐
  │  1. ConfigMap (litellm-config)  — SOURCE OF TRUTH                 │
  │     Survives pod restarts, visible to all replicas                │
  │     Written by: CLI patchLiteLLMConfig(), controller N/A (paid/*) │
  │     Read by: LiteLLM on startup (volume mount)                    │
  ├────────────────────────────────────────────────────────────────────┤
  │  2. LiteLLM In-Memory Router — LIVE STATE                        │
  │     Immediate effect, per-pod, lost on restart                    │
  │     Written by: /model/new, /model/delete API                    │
  │     Read by: every inference request                              │
  └────────────────────────────────────────────────────────────────────┘

  Write pattern:  ConfigMap first (persistence) → API second (live)
  Read pattern:   Router serves from memory → ConfigMap on restart
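
The write pattern above can be sketched in Go as "persist, then fan out". This is a minimal illustration, not the code in model.go: `ConfigStore`, `Pod`, `memStore`, `memPod`, and `AddModel` are hypothetical names, and the in-memory fakes stand in for the ConfigMap and for each replica's router.

```go
package main

import (
	"errors"
	"fmt"
)

// ConfigStore is a hypothetical stand-in for the litellm-config ConfigMap.
type ConfigStore interface {
	SaveModel(name string) error
}

// Pod is a hypothetical stand-in for one LiteLLM replica's in-memory router.
type Pod interface {
	Name() string
	AddModel(name string) error
}

// AddModel persists the model first, then hot-adds it on every pod.
// A failure on one pod does not stop the fan-out; all pod errors are joined.
func AddModel(store ConfigStore, pods []Pod, model string) error {
	if err := store.SaveModel(model); err != nil {
		return fmt.Errorf("persist %q: %w", model, err)
	}
	var errs []error
	for _, p := range pods {
		if err := p.AddModel(model); err != nil {
			errs = append(errs, fmt.Errorf("pod %s: %w", p.Name(), err))
		}
	}
	return errors.Join(errs...) // nil when every pod accepted the model
}

// memStore and memPod are in-memory fakes used only for illustration.
type memStore struct{ models map[string]bool }

func (s *memStore) SaveModel(name string) error {
	s.models[name] = true
	return nil
}

type memPod struct {
	name   string
	models map[string]bool
	fail   bool // simulate an unreachable pod
}

func (p *memPod) Name() string { return p.name }

func (p *memPod) AddModel(name string) error {
	if p.fail {
		return errors.New("pod unreachable")
	}
	p.models[name] = true
	return nil
}
```

Because the ConfigMap write happens first, a pod that misses the live call in this sketch would still pick the model up from the mounted config on its next restart.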

Buy-Side Payment Flow (PurchaseRequest)

  buy.py                    Controller                  LiteLLM + Sidecar
  ──────                    ──────────                  ──────────────────

  1. Probe endpoint ───────────────────────────────────→ 402 pricing
     ◄────────────────────────────────────────────────── accepts[0]

  2. Pre-sign ERC-3009 ──→ 3. Create PurchaseRequest CR
     auths in spec            (spec.preSignedAuths)

                           4. Reconcile stages:
                              Probed → AuthsSigned → Configured → Ready

                           5. mergeBuyerConfig()  ────→ x402-buyer-config CM
                              mergeBuyerAuths()   ────→ x402-buyer-auths CM
                              triggerBuyerReload() ───→ POST /admin/reload  ←── NEW
                              addLiteLLMModelEntry()─→ POST /model/new

                           6. checkBuyerStatus()  ────→ GET :8402/status
                              remaining + spent        ◄─── JSON response

  ── DELETION ──────────────────────────────────────────────────────────

                           7. reconcileDeletingPurchase():
                              removeLiteLLMModelEntry()→ GET /model/info   ←── NEW
                                                       → POST /model/delete ←── NEW
                              removeBuyerUpstream()   → patch both CMs
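
The deletion step resolves a model ID from `/model/info` before calling `/model/delete`. A minimal Go sketch of that lookup follows. The JSON shape (a `data` array whose entries carry `model_name` and `model_info.id`) and the `{"id": ...}` delete body are assumptions about what the LiteLLM fork returns and accepts; `findModelID` and `deleteRequestBody` are hypothetical helpers, not the controller's functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelInfoResponse mirrors the assumed shape of a /model/info response.
type modelInfoResponse struct {
	Data []struct {
		ModelName string `json:"model_name"`
		ModelInfo struct {
			ID string `json:"id"`
		} `json:"model_info"`
	} `json:"data"`
}

// findModelID returns the ID of the first entry whose model_name matches.
// The bool is false when the model is absent, in which case no delete is needed.
func findModelID(body []byte, modelName string) (string, bool, error) {
	var resp modelInfoResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", false, fmt.Errorf("decode /model/info: %w", err)
	}
	for _, m := range resp.Data {
		if m.ModelName == modelName && m.ModelInfo.ID != "" {
			return m.ModelInfo.ID, true, nil
		}
	}
	return "", false, nil
}

// deleteRequestBody builds the assumed JSON payload for POST /model/delete.
func deleteRequestBody(id string) ([]byte, error) {
	return json.Marshal(map[string]string{"id": id})
}
```

Returning "not found" as a plain false rather than an error keeps the finalizer idempotent: deleting a PurchaseRequest whose model is already gone is a no-op.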

Buyer Sidecar API Surface

  x402-buyer (:8402)
  ┌──────────────────────────────────────────────────────────────┐
  │                                                              │
  │  GET  /healthz           → "ok"                              │
  │  GET  /status            → per-upstream remaining/spent JSON │
  │  GET  /metrics           → Prometheus metrics                │
  │  POST /admin/reload      → trigger immediate config re-read  │  ←── NEW
  │                                                              │
  │  POST /v1/chat/completions  → OpenAI-compatible proxy        │
  │  POST /chat/completions     → alias                          │
  │  POST /v1/responses         → OpenAI responses API           │
  │  POST /upstream/<name>/*    → legacy direct routing          │
  │                                                              │
  │  Config reload: ticker (5s) OR /admin/reload channel         │
  │  Persistence: reads from ConfigMap volume mounts             │
  └──────────────────────────────────────────────────────────────┘
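
The "ticker OR /admin/reload channel" behaviour can be sketched with a buffered channel of capacity one and a non-blocking send. This is an illustration under assumptions, not the code in proxy.go: the `reloader` type, its method names, and the "reload queued" response text are hypothetical ("already pending" comes from the test plan).

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// reloader coalesces reload requests: at most one can be pending at a time.
type reloader struct{ ch chan struct{} }

func newReloader() *reloader { return &reloader{ch: make(chan struct{}, 1)} }

// request queues a reload without blocking.
// It returns false when a reload is already pending.
func (r *reloader) request() bool {
	select {
	case r.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// handleAdminReload answers 200 in both cases so repeated calls are harmless.
func (r *reloader) handleAdminReload(w http.ResponseWriter, req *http.Request) {
	if req.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	if r.request() {
		w.Write([]byte("reload queued\n"))
		return
	}
	w.Write([]byte("already pending\n"))
}

// run re-reads config on every tick and whenever a reload was requested.
func (r *reloader) run(ctx context.Context, interval time.Duration, reload func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			reload()
		case <-r.ch:
			reload()
		}
	}
}
```

The capacity-one buffer means a burst of reload calls collapses into a single re-read, and the HTTP handler never blocks on a busy reload loop.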

CLI Operations Matrix

  Operation              │ ConfigMap  │ API Call           │ Restart │ Notes
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  obol model setup       │ Patch      │ POST /model/new    │ Only if │ API key
   (provider + models)   │ (persist)  │ (hot-add all pods) │ API key │ needs envFrom
                         │            │                    │ changed │ reload
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  obol model setup       │ Patch      │ POST /model/new    │ Fallback│ Validates
   custom                │ (persist)  │ (hot-add all pods) │ only    │ endpoint first
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  obol model remove      │ Patch      │ GET /model/info    │ NEVER   │ Was: always
                         │ (persist)  │ POST /model/delete │         │ restart. Now:
                         │            │ (hot-delete)       │         │ instant.
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  obol stack up          │ Patch      │ N/A                │ Yes     │ Pods are fresh,
   (autoConfigureLLM)    │ (persist)  │                    │ (safe)  │ restart is free
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  PurchaseRequest create │ N/A (paid/*│ POST /model/new    │ NEVER   │ Wildcard route
   (controller)          │  wildcard) │ POST /admin/reload │         │ already exists
  ───────────────────────┼────────────┼────────────────────┼─────────┼──────────────
  PurchaseRequest delete │ N/A        │ GET /model/info    │ NEVER   │ NEW: was no-op
   (controller)          │            │ POST /model/delete │         │
                         │            │ POST /admin/reload │         │

Code Map

  internal/model/model.go                    internal/serviceoffercontroller/
  ─────────────────────────                  ─────────────────────────────────
  litellmAPIViaExec()       ◄── NEW          purchase_helpers.go:
  hotAddModels()            ◄── REFACTORED     getLiteLLMMasterKey()
  hotDeleteModel()          ◄── NEW            litellmBaseURL()
  RemoveModel()             ◄── NO RESTART     addLiteLLMModelEntry()
  AddCustomEndpoint()       ◄── NO RESTART     removeLiteLLMModelEntry()  ◄── NEW
  ConfigureLiteLLM()        (unchanged)        deleteLiteLLMModel()       ◄── NEW
  PatchLiteLLMProvider()    (unchanged)        triggerBuyerReload()       ◄── NEW
  patchLiteLLMConfig()      (unchanged)        mergeBuyerConfig/Auths()   (unchanged)
  RestartLiteLLM()          (API keys only)    checkBuyerStatus()         (unchanged)
  GetMasterKey()            (unchanged)
                                             purchase.go:
  internal/x402/buyer/proxy.go                 reconcileDeletingPurchase() ◄── WIRED
  ──────────────────────────                   reconcilePurchaseConfigure()◄── +reload
  POST /admin/reload        ◄── NEW
  ReloadCh()                ◄── NEW          cmd/x402-buyer/main.go:
  handleAdminReload()       ◄── NEW            ticker select + ReloadCh() ◄── WIRED

  internal/embed/.../llm.yaml
  ────────────────────────────
  image: ghcr.io/obolnetwork/litellm:sha-fe892e3  ◄── FORK

Test plan

  • TestRemoveLiteLLMModelEntry — mock /model/info → /model/delete with correct ID
  • TestRemoveLiteLLMModelEntryNoMatch — no delete when model absent
  • TestRemoveLiteLLMModelEntryServerError — graceful on 500
  • TestTriggerBuyerReload — no panic with no pods
  • TestProxy_AdminReload — 200 + channel signal
  • TestProxy_AdminReloadIdempotent — "already pending" on double-fire
  • All 17 existing model tests pass
  • All 14 existing buyer proxy tests pass
  • Full go test ./... green (29 packages)

bussyjd added 23 commits April 9, 2026 00:48
Two fixes validated with real Base Sepolia x402 payments between
two DGX Spark nodes running Nemotron 120B inference.

1. **CA certificate bundle**: The x402-verifier runs in a distroless
   container with no CA store. TLS verification of the public
   facilitator (facilitator.x402.rs) fails with "x509: certificate
   signed by unknown authority". Fix: `obol sell pricing` now reads
   the host CA bundle and patches it into the `ca-certificates`
   ConfigMap mounted by the verifier.

2. **Missing Description field**: The facilitator rejects verify
   requests that lack a `description` field in PaymentRequirement
   with "invalid_format". Fix: populate Description from the route
   pattern when building the payment requirement.

## Validated testnet flow

### Alice (seller)

```
obolup.sh                    # bootstrap dependencies
obol stack init && obol stack up
obol model setup custom --name nemotron-120b \
  --endpoint http://host.k3d.internal:8000/v1 \
  --model "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
obol sell pricing --wallet 0xC0De...97E --chain base-sepolia
obol sell http nemotron \
  --wallet 0xC0De...97E --chain base-sepolia \
  --per-request 0.001 --namespace llm \
  --upstream litellm --port 4000 \
  --health-path /health/readiness \
  --register --register-name "Nemotron 120B on DGX Spark"
obol tunnel restart
```

### Bob (buyer)

```
# 1. Discover
curl $TUNNEL/.well-known/agent-registration.json
# → name: "Nemotron 120B on DGX Spark", x402Support: true

# 2. Probe
curl -X POST $TUNNEL/services/nemotron/v1/chat/completions
# → 402: payTo=0xC0De...97E, amount=1000, network=base-sepolia

# 3. Sign EIP-712 TransferWithAuthorization + pay
python3 bob_buy.py
# → 200: "The meaning of life is to discover and pursue purpose"
```
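
The probe step returns an amount of 1000 for a price of 0.001 USDC because USDC uses 6 decimals. The Go sketch below parses a 402 body and converts the atomic amount. The JSON field names (`accepts`, `network`, `payTo`, `maxAmountRequired`) are assumptions about the x402 payment-requirements format, and `firstRequirement` and `formatUnits` are hypothetical helpers, not part of buy.py or the CLI.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math/big"
	"strings"
)

// paymentRequirement holds the fields used here; names are assumed.
type paymentRequirement struct {
	Network           string `json:"network"`
	PayTo             string `json:"payTo"`
	MaxAmountRequired string `json:"maxAmountRequired"`
}

type paymentRequired struct {
	Accepts []paymentRequirement `json:"accepts"`
}

// firstRequirement returns accepts[0] from a 402 response body.
func firstRequirement(body []byte) (paymentRequirement, error) {
	var pr paymentRequired
	if err := json.Unmarshal(body, &pr); err != nil {
		return paymentRequirement{}, fmt.Errorf("decode 402 body: %w", err)
	}
	if len(pr.Accepts) == 0 {
		return paymentRequirement{}, errors.New("no payment requirements in 402 body")
	}
	return pr.Accepts[0], nil
}

// formatUnits renders a non-negative atomic amount as a decimal string,
// e.g. 1000 units at 6 decimals becomes 0.001.
func formatUnits(atomic string, decimals int) (string, error) {
	n, ok := new(big.Int).SetString(atomic, 10)
	if !ok || n.Sign() < 0 {
		return "", fmt.Errorf("invalid amount %q", atomic)
	}
	s := n.String()
	if len(s) <= decimals {
		// Left-pad so there is at least one digit before the decimal point.
		s = strings.Repeat("0", decimals-len(s)+1) + s
	}
	whole := s[:len(s)-decimals]
	frac := strings.TrimRight(s[len(s)-decimals:], "0")
	if frac == "" {
		return whole, nil
	}
	return whole + "." + frac, nil
}
```

Working on the decimal string instead of a float avoids rounding when comparing the probed price against `--per-request`.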

### On-chain receipts (Base Sepolia)

| Tx | Description |
|----|-------------|
| 0xd769953b...c231ec0 | x402 settlement: Bob→Alice 0.001 USDC via ERC-3009 |

Balance change: Alice +0.001 USDC, Bob -0.001 USDC.
Facilitator: https://facilitator.x402.rs (real public settlement).

Replace the third-party facilitator.x402.rs with the Obol-operated
facilitator at x402.gcp.obol.tech. This gives us control over
uptime, chain support, and monitoring (Grafana dashboards already
deployed in obol-infrastructure).

Introduces DefaultFacilitatorURL constant in internal/x402 and
updates all references: CLI flag default, config loader, standalone
inference gateway, and deployment store.

Companion PR in obol-infrastructure adds Base Sepolia (84532) to
the facilitator's chain config alongside Base Mainnet (8453).

Address #321 — LiteLLM reliability improvements:

1. Hot-add models via /model/new API instead of restarting the
   deployment. ConfigMap still patched for persistence. Restart
   only triggered when API keys change (Secret mount requires it).

2. Scale to 2 replicas with RollingUpdate (maxUnavailable: 0,
   maxSurge: 1) so a new pod is ready before any old pod terminates.

3. PodDisruptionBudget (minAvailable: 1) prevents both replicas
   from being down simultaneously during voluntary disruptions.

4. preStop hook (sleep 10) gives EndpointSlice time to deregister
   the terminating pod before SIGTERM — prevents in-flight request
   drops during rolling updates.

5. Reloader annotation on litellm-secrets — Stakater Reloader
   triggers rolling restart on API key rotation, no manual restart.

6. terminationGracePeriodSeconds: 60 — long inference requests
   (e.g. Nemotron 120B at 30s+) have time to complete.

…issing

The prerequisite check blocked installation entirely when Node.js
was not available, even though Docker could extract the openclaw
binary from the published image. This prevented bootstrap on
minimal servers (e.g. DGX Spark nodes with only Docker + Python).

Changes:
- Prerequisites: only fail if BOTH npm AND docker are missing
- install_openclaw(): try npm first, fall back to Docker image
  extraction (docker create + docker cp) when npm unavailable

Introduces PurchaseRequest CRD and extends the serviceoffer-controller
to reconcile buy-side purchases. This replaces direct ConfigMap writes
from buy.py with a controller-based pattern matching the sell-side.

## New resources

- **PurchaseRequest CRD** (`obol.org/v1alpha1`): declarative intent to
  buy inference from a remote x402-gated endpoint. Lives in the agent's
  namespace.

## Controller reconciliation (4 stages)

1. **Probed** — probe endpoint → 402, validate pricing matches spec
2. **AuthsSigned** — call remote-signer via cluster DNS to sign
   ERC-3009 TransferWithAuthorization vouchers
3. **Configured** — write buyer ConfigMaps in llm namespace with
   optimistic concurrency, restart LiteLLM
4. **Ready** — verify sidecar loaded auths via pod /status endpoint
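
The four stages behave like an ordered list of conditions: each reconcile pass works on the first one that is not yet true. A minimal Go sketch of that selection is below; `stages` and `nextStage` are hypothetical names, and a plain map stands in for the CR's status conditions.

```go
package main

// stages lists the reconciliation stages in the order they must complete.
var stages = []string{"Probed", "AuthsSigned", "Configured", "Ready"}

// nextStage returns the first stage whose condition is not yet true.
// The bool is false when every stage has completed.
func nextStage(conditions map[string]bool) (string, bool) {
	for _, s := range stages {
		if !conditions[s] {
			return s, true
		}
	}
	return "", false
}
```

Deriving the next step from recorded conditions, instead of keeping a separate cursor, lets the controller resume correctly after a restart or a requeue.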

## Security

- Agent only creates PurchaseRequest CRs (own namespace, no cross-NS)
- Controller has elevated RBAC for ConfigMaps in llm, pods/list
- Remote-signer accessed via cluster DNS (no port-forward)
- Finalizer handles cleanup on delete (remove upstream from config)

## RBAC

- Added PurchaseRequest read/write to serviceoffer-controller ClusterRole
- Added pods/get/list for sidecar status checks

Addresses #329. Companion to the dual-stack integration test.

…rites

Modifies buy.py cmd_buy to create a PurchaseRequest CR in the agent's
own namespace instead of writing ConfigMaps cross-namespace. The
serviceoffer-controller (PR #330) reconciles the CR: probes the
endpoint, signs auths via remote-signer, writes buyer ConfigMaps in
llm namespace, and verifies sidecar readiness.

Changes:
- buy.py: replace steps 5-6 (sign + write ConfigMaps) with
  _create_purchase_request() + _wait_for_purchase_ready()
- Agent RBAC: add PurchaseRequest CRUD to openclaw-monetize-write
  ClusterRole (agent's own namespace only, no cross-NS access)
- Keep steps 1-4 (probe, wallet, balance, count) for user feedback

The agent SA can now create PurchaseRequests but never writes to
ConfigMaps in the llm namespace. All ConfigMap operations are
serialized through the controller with optimistic concurrency.
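
Optimistic concurrency here means: read the object with its version, modify a copy, and write back only if the version is unchanged, retrying on conflict. The Go sketch below shows that loop against an in-memory store. `versionedStore`, `errConflict`, and `updateWithRetry` are hypothetical; in the cluster the version would be the ConfigMap's resourceVersion and the conflict an HTTP 409.

```go
package main

import (
	"errors"
	"sync"
)

var errConflict = errors.New("conflict: resourceVersion is stale")

// versionedStore is an in-memory stand-in for a ConfigMap with a resourceVersion.
type versionedStore struct {
	mu      sync.Mutex
	version int
	data    map[string]string
}

// get returns the current version and a copy of the data.
func (s *versionedStore) get() (int, map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]string, len(s.data))
	for k, v := range s.data {
		out[k] = v
	}
	return s.version, out
}

// update writes data only if the caller read the current version.
func (s *versionedStore) update(version int, data map[string]string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if version != s.version {
		return errConflict
	}
	s.data = data
	s.version++
	return nil
}

// updateWithRetry re-reads and re-applies mutate whenever the write conflicts.
func updateWithRetry(s *versionedStore, attempts int, mutate func(map[string]string)) error {
	for i := 0; i < attempts; i++ {
		version, data := s.get()
		mutate(data)
		err := s.update(version, data)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}
```

Since `mutate` is re-applied to freshly read data on each attempt, a concurrent writer's changes are preserved rather than overwritten.
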
Three fixes discovered during dual-stack testnet validation:

1. **eRPC URL**: `obol sell register` used `http://localhost/rpc` which
   gets 404 from Traefik (wrong Host header). Changed to
   `http://obol.stack/rpc` which matches the HTTPRoute hostname.

2. **--private-key-file ignored**: When OpenClaw agent is deployed, sell
   register always preferred the remote-signer path and silently ignored
   --private-key-file. Now honours user intent: explicit key file flag
   takes priority over remote-signer auto-detection.

3. **Flow script**: add --allow-writes for Base Sepolia eRPC (needed for
   on-chain tx submission), restart eRPC after config change.

Validated: `obol sell register --chain base-sepolia --private-key-file`
mints ERC-8004 NFT (Agent ID 3826) on Base Sepolia via eRPC.

Update dual-stack test to verify PurchaseRequest CR exists after
the agent runs buy.py. The agent prompt stays the same — buy.py's
interface is unchanged, only the backend (CR instead of ConfigMap).

- Fix getSignerAddress to handle string array format from remote-signer
- Fix flow-11: polling for pod readiness, LISTEN port check, anchored
  sed patterns, auto-fund remote-signer wallet
- Auto-fund Bob's remote-signer with USDC from .env key (shortcut for #331)
- resourceVersion handling for PurchaseRequest 409 Conflict

Known issue: controller's signAuths sends typed-data in a format the
remote-signer doesn't accept (empty signature). Needs investigation
of the remote-signer's /api/v1/sign/<addr>/typed-data API format.
Workaround: buy.py signs locally, controller only needs to copy
auths to buyer ConfigMaps (architectural simplification planned).

…rets)

Architectural simplification: instead of the controller reading a Secret
cross-namespace (security risk), buy.py embeds the pre-signed auths
directly in the PurchaseRequest spec.preSignedAuths field.

Flow:
1. buy.py signs auths locally (remote-signer in same namespace)
2. buy.py creates PurchaseRequest CR with auths in spec
3. Controller reads auths from CR spec (same PurchaseRequest RBAC)
4. Controller writes to buyer ConfigMaps in llm namespace

No cross-namespace Secret read. No general secrets RBAC.
Controller only needs PurchaseRequest read + ConfigMap write in llm.

Validated: test PurchaseRequest with embedded auth →
  Probed=True, AuthsSigned=True (loaded from spec),
  Configured=True (wrote to buyer ConfigMaps).
  Ready pending sidecar reload (ConfigMap propagation delay).

The macOS CA bundle (~290KB) exceeds the 262KB limit on annotations,
which kubectl apply hits because it stores the full object in the
last-applied-configuration annotation. The previous implementation used
kubectl patch --type=merge, which hits the same limit.

Switch to "kubectl create --dry-run=client -o yaml | kubectl replace"
which bypasses the annotation entirely. Add PipeCommands helper to
the kubectl package for this pattern.
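
A helper that pipes one command into another can be sketched in Go with `os.Pipe`. This is not the PipeCommands helper from the kubectl package, whose signature is not shown here; `pipeCommands` below is a hypothetical equivalent that runs `first | second` and returns the second command's stdout.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// pipeCommands runs first | second and returns the stdout of second.
// Each command is an argv slice, so no shell is involved.
func pipeCommands(first, second []string) ([]byte, error) {
	if len(first) == 0 || len(second) == 0 {
		return nil, fmt.Errorf("both commands need at least a program name")
	}
	r, w, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	c1 := exec.Command(first[0], first[1:]...)
	c2 := exec.Command(second[0], second[1:]...)
	c1.Stdout = w
	c2.Stdin = r
	var out, stderr bytes.Buffer
	c2.Stdout = &out
	c2.Stderr = &stderr

	if err := c1.Start(); err != nil {
		r.Close()
		w.Close()
		return nil, fmt.Errorf("start %s: %w", first[0], err)
	}
	if err := c2.Start(); err != nil {
		r.Close()
		w.Close()
		c1.Wait() // reap the first process; its writes now fail
		return nil, fmt.Errorf("start %s: %w", second[0], err)
	}
	// The children hold their own copies of the pipe ends. Closing ours
	// lets the second command see EOF once the first one exits.
	w.Close()
	r.Close()

	err1 := c1.Wait()
	err2 := c2.Wait()
	if err1 != nil {
		return nil, fmt.Errorf("%s: %w", first[0], err1)
	}
	if err2 != nil {
		return nil, fmt.Errorf("%s: %w: %s", second[0], err2, stderr.String())
	}
	return out.Bytes(), nil
}
```

With this sketch, the pattern above would be two argv slices: one for `kubectl create configmap ... --dry-run=client -o yaml` and one for `kubectl replace -f -`.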

Tested: obol sell pricing now populates the ca-certificates ConfigMap
automatically on both macOS (290KB /etc/ssl/cert.pem) and Linux
(220KB /etc/ssl/certs/ca-certificates.crt).

The CA ConfigMap is mounted as a volume. Kubernetes may take 60-120s
to propagate changes to running pods. The verifier needs TLS to work
immediately for the facilitator connection, so trigger a rollout
restart right after populating the CA bundle.

Validated: fresh stack → obol sell pricing → CA auto-populated
(339KB on macOS) → verifier restarted → zero TLS errors.

Replace fragile ConfigMap YAML read-modify-write cycles with HTTP API
calls to our LiteLLM fork (ObolNetwork/litellm) for model management.

Model management (internal/model/):
- Add litellmAPIViaExec() — clean kubectl-exec wrapper that fans out
  API calls to all running litellm pods (replicas:2 consistency)
- Add hotDeleteModel() — live model removal via /model/delete API
- Refactor hotAddModels() — use per-pod fan-out instead of single
  deployment exec with inline wget command construction
- Refactor RemoveModel() — hot-delete via API + ConfigMap patch for
  persistence. No more pod restart for model removal.
- Refactor AddCustomEndpoint() — hot-add via API, falls back to
  restart only on failure

Controller (internal/serviceoffercontroller/):
- Implement removeLiteLLMModelEntry() — was no-op stub, now queries
  /model/info to resolve model_id then calls /model/delete
- Wire into reconcileDeletingPurchase() for PurchaseRequest cleanup
- Add triggerBuyerReload() — POST /admin/reload on sidecar pods
  for immediate config pickup (vs 5-second ticker wait)

Buyer sidecar (internal/x402/buyer/):
- Add POST /admin/reload endpoint — triggers immediate config/auth
  file re-read via buffered channel signal
- Wire ReloadCh() into main ticker goroutine for dual select

Infrastructure:
- Switch LiteLLM image to Obol fork: ghcr.io/obolnetwork/litellm:sha-fe892e3
  (config-only /model/new, /model/delete, /model/update without Postgres)
path = f"/api/v1/namespaces/{ns}/secrets"
try:
    # Create the Secret on first use.
    _kube_json("POST", path, token, ssl_ctx, secret)
    print(f" Stored {len(auths)} auths in Secret {ns}/{secret_name}")
except Exception:  # assumed: the original except clause is missing from this excerpt
    # The Secret already exists: replace it at its current resourceVersion.
    existing = _kube_json("GET", f"{path}/{secret_name}", token, ssl_ctx)
    secret["metadata"]["resourceVersion"] = existing["metadata"]["resourceVersion"]
    _kube_json("PUT", f"{path}/{secret_name}", token, ssl_ctx, secret)
    print(f" Updated Secret {ns}/{secret_name} with {len(auths)} auths")
bussyjd added 3 commits April 10, 2026 01:11
Includes fixes from ObolNetwork/litellm#2:
- P1: stale in-memory config after save_config (sequential write data loss)
- P2: inline ModelInfo imports moved to module-level
- P3: PROXY_ADMIN role check in config-only code paths

Replace `sh -c` + fmt.Sprintf shell command construction with direct
argument passing in litellmAPIViaExec() and hotDeleteModel(). JSON body
or auth tokens containing single quotes would break the shell wrapper.

Now each argument goes as a separate argv element to wget via kubectl
exec, bypassing shell interpretation entirely.

Also document this pattern in the obol-stack-dev skill gotchas section.

Addresses CodeQL finding: "Potentially unsafe quoting" on model.go:292.
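
The argv-based approach can be sketched in Go as follows. Every value is its own slice element, so quotes in the JSON body or token never reach a shell. `execWgetArgs` is a hypothetical helper, and the wget flags and the Bearer header are assumptions about the wget build in the image and the auth scheme, not a copy of model.go.

```go
package main

// execWgetArgs builds the arguments for `kubectl exec ... -- wget ...`.
// No element is passed through `sh -c`, so no shell quoting is needed.
func execWgetArgs(namespace, pod, container, url, token, body string) []string {
	return []string{
		"exec", "-n", namespace, pod, "-c", container, "--",
		"wget", "-qO-",
		"--header=Content-Type: application/json",
		"--header=Authorization: Bearer " + token, // assumed auth header
		"--post-data=" + body, // arrives at wget as one argument, verbatim
		url,
	}
}
```

A caller would pass the slice to `exec.Command("kubectl", args...)` from os/exec, which hands each element to the process unchanged.
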
First multiplatform build: linux/amd64 + linux/arm64.
Includes all previous fixes (P1 stale config, P2 imports, P3 admin auth).

bussyjd commented Apr 9, 2026

Superseded by the validated integration branch \ and the \ prerelease cut from it. The release-candidate branch now carries the tested sell → discover → buy → settle path, updated docs/skills, and the final x402/buy-side fixes.

@bussyjd bussyjd closed this Apr 9, 2026