release: merge v0.8.0-rc3 integration branch to main#335
Phase 1 of #326: migrate from mark3labs/x402-go v0.13.0.

Add coinbase/x402/go (pseudo-version, v2 SDK with v1 compat) and create local shims for functionality the v2 SDK doesn't provide:
- chains.go: ChainInfo type, ResolveChainInfo(), BuildV1Requirement() with all existing chains + new Arbitrum One/Sepolia support
- forwardauth.go: ForwardAuth middleware with VerifyOnly support and settlementInterceptor (settle only on downstream success)
- buyer/types.go: Signer interface, PaymentSelector, PaymentEvent, error types (internal to buyer transport, not on the wire)
- buyer/encoding.go: EncodePayment/DecodeSettlement (base64+JSON)

New tests cover:
- All chain name resolutions and USDC address validation
- ForwardAuth: no payment → 402, valid payment + VerifyOnly, invalid payment → 402, settle on success, no settle on handler error, UpstreamAuth propagation, no-UpstreamAuth case
- Encoding round-trips and error cases

All existing tests continue to pass.
Phase 2 of #326: replace every mark3labs/x402-go import.

Production files:
- config.go: delete ResolveChain() + EthereumMainnet (now in chains.go)
- verifier.go: use BuildV1Requirement() + NewForwardAuthMiddleware()
- buyer/signer.go: use coinbase/x402/go/types.PaymentPayloadV1 with map[string]interface{} payload instead of typed EVMPayload
- buyer/proxy.go: use local Signer/PaymentSelector/PaymentEvent types + local EncodePayment/DecodeSettlement, handle Extra as *json.RawMessage
- inference/gateway.go: use ChainInfo + ForwardAuthMiddleware
- cmd/obol/sell.go: delete resolveX402Chain(), delegate to ResolveChainInfo()

Test files updated for new types (map payload assertions, local interfaces). Zero mark3labs/x402-go imports remain in Go source (one comment reference). All tests pass.
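EncodePayment/DecodeSettlement are described as base64+JSON. A minimal sketch of that round-trip, assuming the standard base64 alphabet and a hypothetical `settlement` struct (the real wire types come from coinbase/x402/go and may differ):

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// settlement is a hypothetical subset of a facilitator settlement response;
// the field names are illustrative, not the coinbase/x402/go wire type.
type settlement struct {
	Success     bool   `json:"success"`
	Transaction string `json:"transaction"`
	Network     string `json:"network"`
}

// encodeHeader marshals v to JSON and base64-encodes it for an HTTP header.
func encodeHeader(v any) (string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return "", fmt.Errorf("marshal: %w", err)
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeHeader reverses encodeHeader into out, distinguishing base64 errors
// from JSON errors so callers can report which layer failed.
func decodeHeader(s string, out any) error {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return fmt.Errorf("base64: %w", err)
	}
	if err := json.Unmarshal(raw, out); err != nil {
		return fmt.Errorf("json: %w", err)
	}
	return nil
}
```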
Phase 3 of #326: go mod tidy removes the legacy v1 SDK. mark3labs/x402-go v0.13.0 is no longer imported anywhere. The stack now uses coinbase/x402/go (v2 SDK with v1 types) for wire types and a thin local ForwardAuth shim for the seller middleware. Update CLAUDE.md deps reference.
Two fixes validated with real Base Sepolia x402 payments between two DGX Spark nodes running Nemotron 120B inference.

1. **CA certificate bundle**: The x402-verifier runs in a distroless container with no CA store. TLS verification of the public facilitator (facilitator.x402.rs) fails with "x509: certificate signed by unknown authority". Fix: `obol sell pricing` now reads the host CA bundle and patches it into the `ca-certificates` ConfigMap mounted by the verifier.
2. **Missing Description field**: The facilitator rejects verify requests that lack a `description` field in PaymentRequirement with "invalid_format". Fix: populate Description from the route pattern when building the payment requirement.

## Validated testnet flow

### Alice (seller)

```
obolup.sh  # bootstrap dependencies
obol stack init && obol stack up
obol model setup custom --name nemotron-120b \
  --endpoint http://host.k3d.internal:8000/v1 \
  --model "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
obol sell pricing --wallet 0xC0De...97E --chain base-sepolia
obol sell http nemotron \
  --wallet 0xC0De...97E --chain base-sepolia \
  --per-request 0.001 --namespace llm \
  --upstream litellm --port 4000 \
  --health-path /health/readiness \
  --register --register-name "Nemotron 120B on DGX Spark"
obol tunnel restart
```

### Bob (buyer)

```
# 1. Discover
curl $TUNNEL/.well-known/agent-registration.json
# → name: "Nemotron 120B on DGX Spark", x402Support: true

# 2. Probe
curl -X POST $TUNNEL/services/nemotron/v1/chat/completions
# → 402: payTo=0xC0De...97E, amount=1000, network=base-sepolia

# 3. Sign EIP-712 TransferWithAuthorization + pay
python3 bob_buy.py
# → 200: "The meaning of life is to discover and pursue purpose"
```

### On-chain receipts (Base Sepolia)

| Tx | Description |
|----|-------------|
| 0xd769953b...c231ec0 | x402 settlement: Bob→Alice 0.001 USDC via ERC-3009 |

Balance change: Alice +0.001 USDC, Bob -0.001 USDC.
Facilitator: https://facilitator.x402.rs (real public settlement).
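In Bob's probe, `--per-request 0.001` shows up as `amount=1000` because USDC has 6 decimals, so the price travels in atomic units (0.001 * 10^6 = 1000). A sketch of that conversion using `math/big.Rat` for exact decimal arithmetic; the function name is hypothetical, and the stack's own code may use a different numeric type (the follow-ups in this thread mention `big.Float` in the verifier):

```go
package main

import (
	"fmt"
	"math/big"
)

// toAtomicUnits converts a decimal price such as "0.001" into the integer
// token amount for a token with the given number of decimals (6 for USDC).
// big.Rat keeps the conversion exact and lets us reject prices that are
// finer than one atomic unit.
func toAtomicUnits(price string, decimals int) (*big.Int, error) {
	r, ok := new(big.Rat).SetString(price)
	if !ok {
		return nil, fmt.Errorf("invalid price %q", price)
	}
	if r.Sign() < 0 {
		return nil, fmt.Errorf("negative price %q", price)
	}
	scale := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(decimals)), nil)
	r.Mul(r, new(big.Rat).SetInt(scale))
	if !r.IsInt() {
		return nil, fmt.Errorf("price %q has more than %d decimal places", price, decimals)
	}
	return new(big.Int).Set(r.Num()), nil
}
```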
Replace the third-party facilitator.x402.rs with the Obol-operated facilitator at x402.gcp.obol.tech. This gives us control over uptime, chain support, and monitoring (Grafana dashboards already deployed in obol-infrastructure). Introduces DefaultFacilitatorURL constant in internal/x402 and updates all references: CLI flag default, config loader, standalone inference gateway, and deployment store. Companion PR in obol-infrastructure adds Base Sepolia (84532) to the facilitator's chain config alongside Base Mainnet (8453).
Address #321: LiteLLM reliability improvements:

1. Hot-add models via the /model/new API instead of restarting the deployment. The ConfigMap is still patched for persistence. A restart is triggered only when API keys change (the Secret mount requires it).
2. Scale to 2 replicas with RollingUpdate (maxUnavailable: 0, maxSurge: 1) so a new pod is ready before any old pod terminates.
3. PodDisruptionBudget (minAvailable: 1) prevents both replicas from being down simultaneously during voluntary disruptions.
4. preStop hook (sleep 10) gives the EndpointSlice time to deregister the terminating pod before SIGTERM, preventing in-flight request drops during rolling updates.
5. Reloader annotation on litellm-secrets: Stakater Reloader triggers a rolling restart on API key rotation, so no manual restart is needed.
6. terminationGracePeriodSeconds: 60, so long inference requests (e.g. Nemotron 120B at 30s+) have time to complete.
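The rollout and shutdown settings above interact through simple arithmetic. With 2 replicas, `maxSurge: 1` and `maxUnavailable: 0`, a rollout never drops below 2 available pods and briefly runs 3. Because Kubernetes starts the termination grace countdown before the preStop hook runs, the `sleep 10` consumes part of the 60s budget, leaving about 50s between SIGTERM and SIGKILL. A small sketch of both calculations (hypothetical helpers, integer surge/unavailable values only, not percentages):

```go
package main

// rolloutBounds returns the minimum number of available pods and the maximum
// total pods a Deployment RollingUpdate allows, for integer (not percentage)
// maxSurge and maxUnavailable values.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (minAvailable, maxTotal int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

// sigtermBudget returns the seconds between SIGTERM and SIGKILL. The grace
// period countdown starts before the preStop hook runs, so the preStop sleep
// is subtracted. Kubernetes grants a short one-off extension if preStop
// overruns the grace period; this sketch simply returns 0 in that case.
func sigtermBudget(gracePeriodSec, preStopSleepSec int) int {
	if preStopSleepSec >= gracePeriodSec {
		return 0
	}
	return gracePeriodSec - preStopSleepSec
}
```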
…issing

The prerequisite check blocked installation entirely when Node.js was not available, even though Docker could extract the openclaw binary from the published image. This prevented bootstrap on minimal servers (e.g. DGX Spark nodes with only Docker + Python).

Changes:
- Prerequisites: only fail if BOTH npm AND docker are missing
- install_openclaw(): try npm first, fall back to Docker image extraction (docker create + docker cp) when npm is unavailable
Introduces the PurchaseRequest CRD and extends the serviceoffer-controller to reconcile buy-side purchases. This replaces direct ConfigMap writes from buy.py with a controller-based pattern matching the sell side.

## New resources

- **PurchaseRequest CRD** (`obol.org/v1alpha1`): declarative intent to buy inference from a remote x402-gated endpoint. Lives in the agent's namespace.

## Controller reconciliation (4 stages)

1. **Probed**: probe endpoint → 402, validate pricing matches spec
2. **AuthsSigned**: call remote-signer via cluster DNS to sign ERC-3009 TransferWithAuthorization vouchers
3. **Configured**: write buyer ConfigMaps in llm namespace with optimistic concurrency, restart LiteLLM
4. **Ready**: verify sidecar loaded auths via pod /status endpoint

## Security

- Agent only creates PurchaseRequest CRs (own namespace, no cross-NS)
- Controller has elevated RBAC for ConfigMaps in llm, pods/list
- Remote-signer accessed via cluster DNS (no port-forward)
- Finalizer handles cleanup on delete (remove upstream from config)

## RBAC

- Added PurchaseRequest read/write to serviceoffer-controller ClusterRole
- Added pods/get/list for sidecar status checks

Addresses #329. Companion to the dual-stack integration test.
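The four stages are ordered, so each reconcile pass only needs to find the first condition that is not yet True. A minimal sketch of that ordering, assuming conditions are reduced to a name-to-bool map (the real controller works with status conditions and may requeue differently):

```go
package main

// stages lists the PurchaseRequest conditions in the order they must complete.
var stages = []string{"Probed", "AuthsSigned", "Configured", "Ready"}

// nextStage returns the first stage whose condition is not yet True, or ""
// when the PurchaseRequest is fully reconciled. Running only that stage per
// pass means the stages complete strictly in order.
func nextStage(conditions map[string]bool) string {
	for _, s := range stages {
		if !conditions[s] {
			return s
		}
	}
	return ""
}
```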
…rites

Modifies buy.py cmd_buy to create a PurchaseRequest CR in the agent's own namespace instead of writing ConfigMaps cross-namespace. The serviceoffer-controller (PR #330) reconciles the CR: probes the endpoint, signs auths via remote-signer, writes buyer ConfigMaps in llm namespace, and verifies sidecar readiness.

Changes:
- buy.py: replace steps 5-6 (sign + write ConfigMaps) with _create_purchase_request() + _wait_for_purchase_ready()
- Agent RBAC: add PurchaseRequest CRUD to openclaw-monetize-write ClusterRole (agent's own namespace only, no cross-NS access)
- Keep steps 1-4 (probe, wallet, balance, count) for user feedback

The agent SA can now create PurchaseRequests but never writes to ConfigMaps in the llm namespace. All ConfigMap operations are serialized through the controller with optimistic concurrency.
Three fixes discovered during dual-stack testnet validation:

1. **eRPC URL**: `obol sell register` used `http://localhost/rpc`, which gets a 404 from Traefik (wrong Host header). Changed to `http://obol.stack/rpc`, which matches the HTTPRoute hostname.
2. **--private-key-file ignored**: When the OpenClaw agent is deployed, sell register always preferred the remote-signer path and silently ignored --private-key-file. It now honours user intent: an explicit key file flag takes priority over remote-signer auto-detection.
3. **Flow script**: add --allow-writes for Base Sepolia eRPC (needed for on-chain tx submission), and restart eRPC after the config change.

Validated: `obol sell register --chain base-sepolia --private-key-file` mints an ERC-8004 NFT (Agent ID 3826) on Base Sepolia via eRPC.
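The precedence rule from fix 2 is small enough to state as code. A sketch with a hypothetical `signerSource` helper; the CLI's real function names and return types are not shown in this PR:

```go
package main

// signerSource decides which signing path `obol sell register` should take.
// An explicit key file always wins; the remote-signer is used only when no
// key file was given and an OpenClaw agent (with its remote-signer) is present.
func signerSource(privateKeyFile string, remoteSignerAvailable bool) string {
	switch {
	case privateKeyFile != "":
		return "private-key-file"
	case remoteSignerAvailable:
		return "remote-signer"
	default:
		return "none"
	}
}
```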
Update dual-stack test to verify PurchaseRequest CR exists after the agent runs buy.py. The agent prompt stays the same — buy.py's interface is unchanged, only the backend (CR instead of ConfigMap).
- Fix getSignerAddress to handle string array format from remote-signer
- Fix flow-11: polling for pod readiness, LISTEN port check, anchored sed patterns, auto-fund remote-signer wallet
- Auto-fund Bob's remote-signer with USDC from .env key (shortcut for #331)
- resourceVersion handling for PurchaseRequest 409 Conflict

Known issue: the controller's signAuths sends typed data in a format the remote-signer doesn't accept (empty signature). Needs investigation of the remote-signer's /api/v1/sign/<addr>/typed-data API format.

Workaround: buy.py signs locally, and the controller only needs to copy auths to buyer ConfigMaps (architectural simplification planned).
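Handling the remote-signer's "string array format" typically means accepting either a bare JSON string or an array of strings. A sketch under that assumption; the exact payload shapes the remote-signer returns are not documented here, and `firstAddress` is a hypothetical name:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// firstAddress accepts a signer-address payload that is either a JSON string
// ("0xabc...") or a JSON array of strings (["0xabc...", ...]) and returns the
// first address. Both shapes are assumptions about the remote-signer response.
func firstAddress(raw []byte) (string, error) {
	var single string
	if err := json.Unmarshal(raw, &single); err == nil {
		if single == "" {
			return "", errors.New("empty address")
		}
		return single, nil
	}
	var list []string
	if err := json.Unmarshal(raw, &list); err != nil {
		return "", fmt.Errorf("unexpected signer address payload: %w", err)
	}
	if len(list) == 0 || list[0] == "" {
		return "", errors.New("remote-signer returned no addresses")
	}
	return list[0], nil
}
```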
…rets)

Architectural simplification: instead of the controller reading a Secret cross-namespace (security risk), buy.py embeds the pre-signed auths directly in the PurchaseRequest spec.preSignedAuths field.

Flow:
1. buy.py signs auths locally (remote-signer in same namespace)
2. buy.py creates PurchaseRequest CR with auths in spec
3. Controller reads auths from CR spec (same PurchaseRequest RBAC)
4. Controller writes to buyer ConfigMaps in llm namespace

No cross-namespace Secret read. No general secrets RBAC. Controller only needs PurchaseRequest read + ConfigMap write in llm.

Validated: test PurchaseRequest with embedded auth → Probed=True, AuthsSigned=True (loaded from spec), Configured=True (wrote to buyer ConfigMaps). Ready pending sidecar reload (ConfigMap propagation delay).
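With auths embedded in the spec, the controller only has to decode the CR and copy the vouchers. A sketch of that decode step; apart from `preSignedAuths`, the field names below are hypothetical (loosely following the ERC-3009 TransferWithAuthorization parameters), not the CRD's actual schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// preSignedAuth is a hypothetical shape for one ERC-3009
// TransferWithAuthorization voucher; field names are illustrative.
type preSignedAuth struct {
	From        string `json:"from"`
	To          string `json:"to"`
	Value       string `json:"value"`
	ValidAfter  string `json:"validAfter"`
	ValidBefore string `json:"validBefore"`
	Nonce       string `json:"nonce"`
	Signature   string `json:"signature"`
}

// purchaseRequestSpec keeps only the field relevant here: the auths travel
// inside the CR spec, so no Secret access is needed to read them.
type purchaseRequestSpec struct {
	PreSignedAuths []preSignedAuth `json:"preSignedAuths"`
}

// authsFromSpec extracts the vouchers the controller would copy into the
// buyer ConfigMaps, rejecting a spec that carries none.
func authsFromSpec(specJSON []byte) ([]preSignedAuth, error) {
	var spec purchaseRequestSpec
	if err := json.Unmarshal(specJSON, &spec); err != nil {
		return nil, fmt.Errorf("decode spec: %w", err)
	}
	if len(spec.PreSignedAuths) == 0 {
		return nil, fmt.Errorf("spec.preSignedAuths is empty")
	}
	return spec.PreSignedAuths, nil
}
```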
…implify agent response validation
The macOS CA bundle (~290KB) exceeds the 262KB annotation limit that kubectl apply requires. The previous implementation used kubectl patch --type=merge which hits the same limit. Switch to "kubectl create --dry-run=client -o yaml | kubectl replace" which bypasses the annotation entirely. Add PipeCommands helper to the kubectl package for this pattern. Tested: obol sell pricing now populates the ca-certificates ConfigMap automatically on both macOS (290KB /etc/ssl/cert.pem) and Linux (220KB /etc/ssl/certs/ca-certificates.crt).
The CA ConfigMap is mounted as a volume. Kubernetes may take 60-120s to propagate changes to running pods. The verifier needs TLS to work immediately for the facilitator connection, so trigger a rollout restart right after populating the CA bundle. Validated: fresh stack → obol sell pricing → CA auto-populated (339KB on macOS) → verifier restarted → zero TLS errors.
Replace fragile ConfigMap YAML read-modify-write cycles with HTTP API calls to our LiteLLM fork (ObolNetwork/litellm) for model management.

Model management (internal/model/):
- Add litellmAPIViaExec(): a clean kubectl-exec wrapper that fans out API calls to all running litellm pods (replicas: 2 consistency)
- Add hotDeleteModel(): live model removal via the /model/delete API
- Refactor hotAddModels(): use per-pod fan-out instead of single deployment exec with inline wget command construction
- Refactor RemoveModel(): hot-delete via API + ConfigMap patch for persistence. No more pod restart for model removal.
- Refactor AddCustomEndpoint(): hot-add via API, falls back to restart only on failure

Controller (internal/serviceoffercontroller/):
- Implement removeLiteLLMModelEntry(): was a no-op stub, now queries /model/info to resolve model_id then calls /model/delete
- Wire into reconcileDeletingPurchase() for PurchaseRequest cleanup
- Add triggerBuyerReload(): POST /admin/reload on sidecar pods for immediate config pickup (vs 5-second ticker wait)

Buyer sidecar (internal/x402/buyer/):
- Add POST /admin/reload endpoint: triggers immediate config/auth file re-read via buffered channel signal
- Wire ReloadCh() into main ticker goroutine for dual select

Infrastructure:
- Switch LiteLLM image to Obol fork: ghcr.io/obolnetwork/litellm:sha-fe892e3 (config-only /model/new, /model/delete, /model/update without Postgres)
Includes fixes from ObolNetwork/litellm#2:
- P1: stale in-memory config after save_config (sequential write data loss)
- P2: inline ModelInfo imports moved to module-level
- P3: PROXY_ADMIN role check in config-only code paths
Replace `sh -c` + fmt.Sprintf shell command construction with direct argument passing in litellmAPIViaExec() and hotDeleteModel(). JSON body or auth tokens containing single quotes would break the shell wrapper. Now each argument goes as a separate argv element to wget via kubectl exec, bypassing shell interpretation entirely. Also document this pattern in the obol-stack-dev skill gotchas section. Addresses CodeQL finding: "Potentially unsafe quoting" on model.go:292.
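A sketch of the argv-based construction: every flag, header, and body is its own slice element, so quotes inside the JSON body or token reach wget unchanged. The namespace `llm`, container name `litellm`, and port 4000 are assumptions for illustration, and `wgetExecArgs` is a hypothetical helper; the slice would be passed as `exec.CommandContext(ctx, "kubectl", args...)`.

```go
package main

// wgetExecArgs builds kubectl arguments that run wget inside a LiteLLM pod.
// No element is ever interpreted by a shell, so a body such as
// {"id":"it's"} or a token containing ' is passed through verbatim.
func wgetExecArgs(pod, masterKey, path, body string) []string {
	return []string{
		"exec", "-n", "llm", pod, "-c", "litellm", "--",
		"wget", "-q", "-O", "-",
		"--header", "Authorization: Bearer " + masterKey,
		"--header", "Content-Type: application/json",
		"--post-data", body,
		"http://localhost:4000" + path,
	}
}
```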
First multiplatform build: linux/amd64 + linux/arm64. Includes all previous fixes (P1 stale config, P2 imports, P3 admin auth).
… restart

Two release-blocking bugs in f57498e "harden buy-side controller boundaries":

1. purchaserequest-crd.yaml was malformed YAML: status.properties was siblinged to status (instead of nested under it) and conditions.items was over-indented. kubectl apply fails with "yaml: line 117: mapping values are not allowed", so every fresh `obol stack up` would fail to install the CRD and the buy-side controller would refuse to start. embed_crd_test.go covered ServiceOffer and RegistrationRequest but not PurchaseRequest, so CI stayed green.
2. addLiteLLMModelEntry / removeLiteLLMModelEntry triggered a rolling restart of the litellm Deployment on every PurchaseRequest. With replicas: 1 (intentional, because x402-buyer consumed-auth state lives on a pod-local emptyDir) this is a correctness bug, not an availability bug: the restart wipes /state/consumed.json, the sidecar then re-offers already-spent ERC-3009 auths from the still-populated buyer ConfigMaps, and every facilitator settle call rejects them as double-spends.

Fix:
- Re-indent the status subtree in purchaserequest-crd.yaml so every field defined there (observedGeneration, conditions, remaining, spent, …) lands under status.properties where the printer columns expect them.
- Add TestPurchaseRequestCRD_Parses, which re-reads the file through yaml.Unmarshal, walks every required spec + status field, and resolves every additionalPrinterColumns jsonPath against the schema. Confirmed to fail on the f57498e shape and pass on the fix.
- Replace restartLiteLLM with two HTTP helpers that talk to the LiteLLM admin API directly: hotAddLiteLLMModel POSTs /model/new, and hotDeleteLiteLLMModel queries /model/info then POSTs /model/delete per matching id. Both use ctx and c.httpClient, close response bodies, and surface API errors (no silent fallback to restart, since that would reintroduce the double-spend).
- Delete the now-unused restartLiteLLM function. Nothing else in the codebase calls it.
- Add a narrow grant for secrets:get scoped to resourceNames: [litellm-secrets] on the serviceoffer-controller ClusterRole so the controller can read LITELLM_MASTER_KEY. No broader secret access.
- Rewrite the purchase_helpers_test suite around an httptest LiteLLM fake that records /model/new, /model/info, /model/delete calls and the Authorization header. The new tests assert no Deployment is ever created during add, and that idempotent adds do not re-hit the API.

Verified with `go build ./...` and `go test ./...`. The guardrail was sanity-checked by temporarily reverting the indent fix: the CRD parse test fails immediately with the exact f57498e error, then passes again once restored.
RC3 validation update for Validated locally on the human/CLI paths:
On-chain receipt from the Obol facilitator dual-stack run:
Notes:
Branch updated from 3dcc2a2 to 0f7118b.
## Revalidation, 2026-04-10: 41/41 passed

Re-ran Result:
| Gate | Step | Status | Evidence |
|---|---|---|---|
| Alice stack up | 12 | ✅ | helmfile base release applies cleanly (namespace pre-creation from 0f7118b) |
| Alice ServiceOffer Ready | 17 | ✅ | attempt 2 |
| Alice 402 gate | 19 | ✅ | attempt 1 |
| ERC-8004 registration | 21 | ✅ | Agent ID 4202 |
| Bob stack up | 24 | ✅ | same helmfile path, no drift |
| Bob signer funded | 28 | ✅ | 0xfbbDE4514b75c5Daa958253f300E13A8B268bCF0 + 0.05 USDC |
| Agent discovery via ERC-8004 | 31 | ✅ | attempt 1 |
| Agent buy-inference | 32 | ✅ | PurchaseRequest CR created |
| PurchaseRequest Ready | 33 | ✅ | attempt 1, 66s |
| LiteLLM rollout settled | 34 | ✅ | |
| Buyer sidecar has auths | 35 | ✅ | remaining=5 spent=0 model=paid/qwen3.5:9b |
| Paid inference | 36 | ✅ | HTTP 200 in 13.1s |
| On-chain settlement | 37-38 | ✅ | see below |
| Cleanup | 39-41 | ✅ | both stacks down |
### Fresh on-chain receipts (Base Sepolia)

- ERC-8004 Agent ID: 4202
- Settlement tx: `0x629b7cfacde23b99b4e8122c58a5d0f46478c21413887164bda3ca717e7cef0c`
- Amount: 1000 micro-USDC (one paid request at 0.001 USDC)
- Facilitator: https://x402.gcp.obol.tech (new default, now validated end-to-end)
- Alice wallet: `0xC0De030F6C37f490594F93fB99e2756703c4297E`
- Bob signer: `0xfbbDE4514b75c5Daa958253f300E13A8B268bCF0`
- Tunnel: https://edge-fingers-antivirus-reuters.trycloudflare.com
- Total runtime: ~6 min, `stack init` to cleanup
### Comparison with the parallel stale run

A parallel run against a pre-f57498e binary on the same machine failed step 33 (Bob: PurchaseRequest Ready): `PurchaseRequest CR: ... False 5m12s`, timed out after 120s. That run recorded 32/33 passed. Same flow, same facilitator, same machine, same wallets; the only difference was the LiteLLM management path (ConfigMap + rollout restart vs. hot-add via the /model/new API). The hardened branch at 0f7118b cleared step 33 on attempt 1 in 66s.
### Why the old validation in the PR body was stale

The 41/41 passed in the PR description was recorded at 7fb8fe0 (2026-04-10 13:39:54), before the hardening commit f57498e (14:01:55). The hardening commit introduced two regressions that invalidated the prior validation:

1. The `purchaserequest-crd.yaml` status subtree was mis-indented (`status.properties` siblinged instead of nested; `conditions.items` over-indented), so `kubectl apply` fails with `yaml: line 117: mapping values are not allowed`. CI stayed green because `embed_crd_test.go` had no PurchaseRequest parse test.
2. `addLiteLLMModelEntry` and `removeLiteLLMModelEntry` still called `restartLiteLLM`, which wipes the x402-buyer sidecar's pod-local `/state/consumed.json` emptyDir. On every purchase the sidecar then re-offered already-spent ERC-3009 auths from the still-populated `buyer-auths` ConfigMap, and the facilitator rejected each one as a double-spend.
Both are closed in 16a2d83 (CRD reindent + TestPurchaseRequestCRD_Parses + hot-add/hot-delete via /model/new and /model/info→/model/delete through a narrow secrets:get/litellm-secrets grant on the controller ClusterRole). 0f7118b added the missing openclaw-obol-agent Namespace that the RBAC demotion in f57498e implicitly required.
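The double-spend in the second regression follows from how consumed state is tracked. A sketch, assuming the sidecar picks the first voucher whose nonce is not in its consumed set (the real selection order may differ; `nextAuth` is a hypothetical name):

```go
package main

// nextAuth returns the first voucher nonce not yet recorded as consumed.
// consumed models the sidecar's /state/consumed.json. Because that file lives
// on a pod-local emptyDir, a pod restart resets it to empty while the buyer
// ConfigMap still lists every voucher, so already-spent nonces become
// eligible again.
func nextAuth(nonces []string, consumed map[string]bool) (string, bool) {
	for _, n := range nonces {
		if !consumed[n] {
			return n, true
		}
	}
	return "", false
}
```

Since the consumed set is the only thing keeping spent vouchers from being offered again, removing the restart (rather than adding a retry) is what closes the double-spend.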
### Remaining non-blocker follow-ups (post-rc3)

From the round-2 reviews, still open but not gating rc3:

- `autoRefill` dead schema: declared in the CRD + `monetizeapi/types.go`, zero readers in `internal/serviceoffercontroller/`
- `mergeBuyerConfig`/`mergeBuyerAuths` still Get+Update, not SSA (the comment in `llm.yaml` quietly dropped the SSA claim)
- `pendingAuths sync.Map` survives: the spec is already the source of truth, and the map is a micro-cache with no clear purpose
- No `AddAfter` requeue for PurchaseRequest stages (ServiceOffer has one)
- Codex hot-path findings: the biggest is `PaymentRequirements` rebuilt per request in `verifier.go`/`chains.go`/`forwardauth.go` with a fresh `big.Float` + `Extra` map + `http.Client` allocation
- LiteLLM fork `sha-c16b156` has no Renovate / upstream-watcher config
- Add `TestReconcilePurchase*` for the 4-stage state machine: the round-2 critic's #2 blocker is still unchanged at the unit-test level
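For the hot-path finding, one direction is to build each route's requirement and the HTTP client once at startup and reuse them on every request. A sketch only: `requirement` is a hypothetical trimmed type, not the coinbase/x402/go struct, and the 30s client timeout is an assumed value.

```go
package main

import (
	"net/http"
	"time"
)

// requirement is a hypothetical, trimmed stand-in for a per-route payment requirement.
type requirement struct {
	Network     string
	PayTo       string
	Amount      string // atomic units, converted once at startup
	Description string
}

// verifier precomputes one requirement per route and shares a single
// http.Client instead of rebuilding both on every request.
type verifier struct {
	reqs   map[string]*requirement // route pattern -> prebuilt requirement
	client *http.Client
}

func newVerifier(routes map[string]requirement) *verifier {
	v := &verifier{
		reqs:   make(map[string]*requirement, len(routes)),
		client: &http.Client{Timeout: 30 * time.Second}, // assumed timeout
	}
	for pattern, r := range routes {
		r := r // copy before taking the address
		v.reqs[pattern] = &r
	}
	return v
}

// requirementFor is read-only after construction, so concurrent requests can
// share the map without locking.
func (v *verifier) requirementFor(pattern string) (*requirement, bool) {
	r, ok := v.reqs[pattern]
	return r, ok
}
```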
Log: .worktrees/flow11-rc3/.build/logs/flow-11-rc3-20260410-155423.log in the validation worktree.
```diff
 if cfg.Chain.NetworkID == "" {
-	cfg.Chain = x402.BaseSepolia
+	cfg.Chain = x402pkg.ChainBaseSepolia
 }
```
We need to get this working with real money pretty quickly. I'm not sure we should roll out a default of fake money if we want this to be a real product.
v0.8.0-rc3
This pre-release is validated against the full seller → discover → buy → settle path on Base Sepolia.
Included

- `PurchaseRequest`
- `ServiceOffer` reconciliation
- `--config-dir`/`--auths-dir` mode

Validation Summary
Validated branch: `feat/x402-buy-side-integration`

Validated checks:
- `go test ./...`
- `bash -n obolup.sh`
- `bash -n flows/flow-11-dual-stack.sh`
- 41/41 passed

Validated facilitator for Base Sepolia settlement: https://facilitator.x402.rs

On-Chain Receipts

- ERC-8004 registration: `4114`, `0x8e26362266612fcb6be3bfa05c0cfccca751d4585d92856570370899b1980ae0`, `39994900`
- Buyer signer funding: `0x37f9921847f0e46c8313a805e16aa65d800da62bcb7b95074b7a6fbb504f02ff`, `39994970`, 50000 micro-USDC
- Settlement: `0x73a79fee6499cd1ddbca09b4d0217b98cd18712fad80375ebff1744c646cc8e0`, `39995006`, 1000 micro-USDC

Notes