Summary
Replace the cron-driven monetize.py reconciliation loop with a proper Kubernetes controller using controller-runtime. Introduce PaymentRoute and RegistrationRequest child CRDs to eliminate the shared ConfigMap mutation race and isolate on-chain side effects. Keep the x402-verifier as a separate Deployment but switch its config source from file polling to a PaymentRoute informer.
Problem
The current reconciliation model has fundamental coupling and correctness issues:
1. Coupled to obol-agent runtime
The reconciler runs as a Python skill script inside the OpenClaw agent pod (monetize.py). If the agent crashes, restarts, or is undeployed, all ServiceOffer reconciliation stops. Payment routes, HTTPRoutes, and registrations stop converging. This was a deliberate design choice documented in monetisation-architecture-proposal.md#L54 — the cron-based approach was chosen over a Go operator for simplicity — but it has become the wrong tradeoff as the system grows.
2. ConfigMap mutation race
_add_pricing_route() (monetize.py#L699) reads the x402-pricing ConfigMap, appends a route entry via string manipulation, and writes the whole ConfigMap back. Two ServiceOffers reconciling simultaneously can overwrite each other's entries. _remove_pricing_route() (monetize.py#L1705) has the same problem in reverse.
3. Polling latency
The reconciler polls every 10-60 seconds. A new ServiceOffer CR sits idle until the next poll cycle. The ConfigMap watcher in the verifier (watcher.go#L16) adds another 60-120s kubelet sync delay. Total worst-case: ~180 seconds from CR creation to traffic flowing.
4. Imperative stage chain
The reconcile function (monetize.py#L1504) runs 6 stages sequentially. Each stage blocks on the previous. If Stage 3 fails, Stages 4-6 never run, even if they're independent. There's no self-healing — if an HTTPRoute is deleted externally, the reconciler won't recreate it until the ServiceOffer is modified.
5. No finalizer-based cleanup
Deletion cleanup is in the CLI path (monetize.py#L1690), not in the controller. If the CR is deleted directly via kubectl delete, external side effects (pricing routes, ERC-8004 registration) are orphaned.
6. Mixed concerns in verifier
The .well-known/agent-registration.json endpoint is served by the x402-verifier (verifier.go#L192). This is discovery metadata, not payment gating — it doesn't belong in the ForwardAuth service.
Proposed Architecture
Guiding Principles
- Derive and observe, don't pipeline. The reconciler computes desired child resources from spec and applies them with server-side apply. No stage ordering.
- Consolidate code, not runtime. One repo, one internal package set, optionally one image, but separate Deployments for controller and verifier.
- Separate control plane from data plane. The controller writes desired state. The verifier reads PaymentRoute and serves traffic. Different scaling axes, different RBAC, different failure domains.
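One way to read "no stage ordering" is that every child is attempted on each pass and failures are collected rather than aborting the pass. The sketch below is an assumption about how that could look, not the proposal's implementation. `child` and `applyAll` are hypothetical names, and each `apply` function stands in for one server-side apply call.

```go
package main

import (
	"errors"
	"fmt"
)

// child is a hypothetical wrapper around one desired resource.
type child struct {
	name  string
	apply func() error // stand-in for a server-side apply call
}

// applyAll attempts every child and joins all failures, so one failing
// child does not prevent the independent ones from converging.
func applyAll(children []child) ([]string, error) {
	var applied []string
	var errs []error
	for _, c := range children {
		if err := c.apply(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.name, err))
			continue
		}
		applied = append(applied, c.name)
	}
	// errors.Join returns nil when errs is empty (Go 1.20+).
	return applied, errors.Join(errs...)
}

// demo applies three children where the middle one fails.
func demo() ([]string, error) {
	ok := func() error { return nil }
	fail := func() error { return errors.New("admission denied") }
	return applyAll([]child{
		{name: "Middleware", apply: ok},
		{name: "PaymentRoute", apply: fail},
		{name: "HTTPRoute", apply: ok},
	})
}

func main() {
	applied, err := demo()
	fmt.Println(applied)
	fmt.Println(err)
}
```

Returning the joined error still triggers a requeue, while the children that did apply are already in place.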
Component Layout
┌─────────────────────────────────────────────────────────────────────────┐
│ obol-system namespace │
│ │
│ ┌──────────────────────────────┐ ┌───────────────────────────────┐ │
│ │ serviceoffer-controller │ │ x402-verifier (unchanged ns) │ │
│ │ Deployment (1 replica) │ │ Deployment (N replicas) │ │
│ │ │ │ │ │
│ │ - Leader election │ │ - No leader needed │ │
│ │ - Broad RBAC (write CRDs, │ │ - Read-only RBAC │ │
│ │ HTTPRoutes, Middlewares) │ │ (watch PaymentRoute) │ │
│ │ - Reconciles ServiceOffer │ │ - ForwardAuth on :8443 │ │
│ │ - Creates child resources: │ │ - Builds local route table │ │
│ │ - PaymentRoute │ │ from PaymentRoute informer │ │
│ │ - HTTPRoute │ │ - Calls facilitator │ │
│ │ - Middleware │ │ - Exposes /metrics │ │
│ │ - RegistrationRequest │ │ │ │
│ │ - Manages finalizers │ │ Scales on: request QPS │ │
│ │ │ │ Failure: drops ForwardAuth │ │
│ │ Scales on: CR count │ │ (user-visible) │ │
│ │ Failure: stops convergence │ │ │ │
│ │ (not user-visible) │ └───────────────────────────────┘ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Why Two Deployments
| Concern | Controller | Verifier |
|---|---|---|
| Scaling axis | Reconcile work (CR count) | Request QPS |
| Replication | Single leader | All replicas active |
| RBAC | Broad write (CRDs, HTTPRoutes, Middlewares) | Read-only (PaymentRoute watch) |
| Failure impact | Stops convergence of new offers | Stops all paid requests |
| Restart cost | Re-list + reconcile (seconds) | Drops in-flight ForwardAuth (user-visible) |
New CRDs
PaymentRoute (owned by ServiceOffer)
Replaces the shared x402-pricing ConfigMap. One CR per monetized route. The verifier watches these via informer instead of polling a file.
apiVersion: obol.org/v1alpha1
kind: PaymentRoute
metadata:
  name: myapi-payment
  namespace: x402
  ownerReferences:
    - apiVersion: obol.org/v1alpha1
      kind: ServiceOffer
      name: myapi
      uid: <service-offer-uid>
spec:
  pattern: "/services/myapi/*"
  price: "10000"                 # atomic USDC units
  payTo: "0x..."                 # seller wallet
  network: "eip155:84532"        # CAIP-2
  facilitatorURL: "https://..."
  priceModel: "per-request"      # per-request | per-mtok
  perMTok: "10000000"            # original per-mtok if applicable
  approxTokensPerRequest: 1000
  description: "My API service"
status:
  admitted: false                # set by verifier when route is loaded
  lastAdmittedGeneration: 0
Why a CRD instead of ConfigMap:
- Eliminates read-modify-write race (each ServiceOffer owns its own PaymentRoute)
- Event-driven propagation (informer watch, sub-second vs 60-120s ConfigMap sync)
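- OwnerReferences enable automatic GC on ServiceOffer deletion (this requires the PaymentRoute to live in the same namespace as its ServiceOffer, because Kubernetes does not allow cross-namespace owner references)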
- Status field lets the controller observe whether the verifier has loaded the route
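For reference, the manifest could map onto Go types roughly as follows. This is a sketch under assumptions: the field names and JSON tags are read off the example YAML, and `perRequestPrice` is a hypothetical helper. The proposal does not state how `price` relates to `perMTok`; the sample values happen to be consistent with price = perMTok × approxTokensPerRequest / 1,000,000, and that relation is what the helper encodes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// PaymentRouteSpec mirrors the spec fields of the example manifest.
// Field names and tags are assumptions drawn from that YAML.
type PaymentRouteSpec struct {
	Pattern                string `json:"pattern"`
	Price                  string `json:"price"` // atomic USDC units
	PayTo                  string `json:"payTo"`
	Network                string `json:"network"` // CAIP-2
	FacilitatorURL         string `json:"facilitatorURL"`
	PriceModel             string `json:"priceModel"` // per-request | per-mtok
	PerMTok                string `json:"perMTok,omitempty"`
	ApproxTokensPerRequest int64  `json:"approxTokensPerRequest,omitempty"`
	Description            string `json:"description,omitempty"`
}

// PaymentRouteStatus is written by the verifier, read by the controller.
type PaymentRouteStatus struct {
	Admitted               bool  `json:"admitted"`
	LastAdmittedGeneration int64 `json:"lastAdmittedGeneration"`
}

// perRequestPrice converts a per-million-token price into a per-request
// price. ASSUMPTION: the formula is inferred from the sample values only.
func perRequestPrice(perMTok string, approxTokens int64) (string, error) {
	p, err := strconv.ParseInt(perMTok, 10, 64)
	if err != nil {
		return "", fmt.Errorf("parse perMTok: %w", err)
	}
	return strconv.FormatInt(p*approxTokens/1_000_000, 10), nil
}

func main() {
	price, err := perRequestPrice("10000000", 1000)
	if err != nil {
		panic(err)
	}
	spec := PaymentRouteSpec{
		Pattern:                "/services/myapi/*",
		Price:                  price,
		Network:                "eip155:84532",
		PriceModel:             "per-mtok",
		PerMTok:                "10000000",
		ApproxTokensPerRequest: 1000,
	}
	out, _ := json.Marshal(spec)
	fmt.Println(string(out))
}
```

Prices stay as strings, as in the manifest, so atomic-unit amounts are never routed through floating point.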
RegistrationRequest (owned by ServiceOffer)
Isolates the ERC-8004 on-chain transaction from the main reconcile loop. The controller creates the request; a registrar Job or controller executes it.
apiVersion: obol.org/v1alpha1
kind: RegistrationRequest
metadata:
  name: myapi-registration
  namespace: openclaw-obol-agent
  ownerReferences:
    - apiVersion: obol.org/v1alpha1
      kind: ServiceOffer
      name: myapi
      uid: <service-offer-uid>
spec:
  name: "myapi"
  description: "My API service"
  endpoint: "https://tunnel.example.com/services/myapi"
  privateKeySecret:
    name: remote-signer-key
    key: keystore.json
  chain: "base-sepolia"
  registry: "0xEA0fE4FCF9E3017a24d9Db6e0e39B552c8648B9D"
status:
  phase: Pending | Submitted | Confirmed | Failed | OffChainOnly
  agentId: "42"
  txHash: "0x..."
  errorMessage: ""
Why a separate resource:
- On-chain transactions are slow (seconds to minutes), expensive, and can fail for external reasons (no gas, RPC down)
- The main reconcile loop should never block on a transaction
- Retries and gas estimation are registration-specific concerns
- OffChainOnly is a valid terminal state (not a failure), cleanly modeled in status
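The phase values can be modeled as a small enum with helpers that the controller uses when deriving the Registered condition. The following is a sketch, assuming that Failed is retryable and therefore not terminal. The reason strings other than OnChainConfirmed are hypothetical placeholders.

```go
package main

import "fmt"

// Phase mirrors status.phase on RegistrationRequest.
type Phase string

const (
	PhasePending      Phase = "Pending"
	PhaseSubmitted    Phase = "Submitted"
	PhaseConfirmed    Phase = "Confirmed"
	PhaseFailed       Phase = "Failed"
	PhaseOffChainOnly Phase = "OffChainOnly"
)

// IsTerminal reports whether the registrar has nothing further to do.
// ASSUMPTION: Failed is treated as retryable, so it is not terminal here.
func (p Phase) IsTerminal() bool {
	return p == PhaseConfirmed || p == PhaseOffChainOnly
}

// registeredCondition maps a phase onto the status and reason of the
// ServiceOffer's Registered condition. Reasons other than
// OnChainConfirmed are illustrative placeholders.
func registeredCondition(p Phase) (status, reason string) {
	switch p {
	case PhaseConfirmed:
		return "True", "OnChainConfirmed"
	case PhaseOffChainOnly:
		return "True", "OffChainOnly"
	case PhaseFailed:
		return "False", "RegistrationFailed"
	default: // Pending, Submitted, or not yet set
		return "Unknown", "RegistrationPending"
	}
}

func main() {
	for _, p := range []Phase{PhasePending, PhaseSubmitted, PhaseConfirmed, PhaseFailed, PhaseOffChainOnly} {
		status, reason := registeredCondition(p)
		fmt.Printf("%-12s terminal=%-5t Registered=%s (%s)\n", p, p.IsTerminal(), status, reason)
	}
}
```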
Reconciliation Flow
sequenceDiagram
participant Op as Operator
participant CLI as obol sell http
participant K8s as Kubernetes API
participant Ctrl as ServiceOffer Controller
participant Verifier as x402-verifier
participant Traefik
participant Chain as Base L2
Op->>CLI: obol sell http myapi --wallet 0x... --price 0.001
CLI->>CLI: Validate upstream reachable, model ready (precondition)
CLI->>K8s: Create ServiceOffer CR
K8s-->>Ctrl: Informer event (Added)
Ctrl->>Ctrl: Add finalizer, set conditions to Unknown
Ctrl->>K8s: SSA Middleware (traefik.io ForwardAuth)
Ctrl->>K8s: SSA PaymentRoute CR
Ctrl->>K8s: SSA HTTPRoute (/services/myapi/*)
K8s-->>Verifier: Informer event (PaymentRoute Added)
Verifier->>Verifier: Build route table entry
Verifier->>K8s: Patch PaymentRoute status.admitted=true
K8s-->>Ctrl: Informer event (PaymentRoute updated)
Ctrl->>Ctrl: Observe: PaymentRoute admitted, HTTPRoute accepted by Gateway
Ctrl->>K8s: Create RegistrationRequest CR
Note over Chain: Registrar Job/controller executes
Chain-->>K8s: RegistrationRequest status: Confirmed, agentId=42
K8s-->>Ctrl: Informer event (RegistrationRequest updated)
Ctrl->>K8s: Set ServiceOffer status: Ready=True, observedGeneration=N
Note over Traefik: /services/myapi/* → ForwardAuth → upstream
Op->>K8s: Delete ServiceOffer CR
K8s-->>Ctrl: Informer event (deletionTimestamp set)
Ctrl->>Chain: Deactivate/tombstone ERC-8004 registration
Ctrl->>K8s: Remove finalizer → GC cascades to PaymentRoute, HTTPRoute, Middleware
Controller Design
Generation-driven, not stage-driven. The reconciler always recomputes desired child resources from spec and applies them with server-side apply. No ordered stages.
// Assumed imports: "context", ctrl "sigs.k8s.io/controller-runtime",
// "sigs.k8s.io/controller-runtime/pkg/client",
// "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil",
// "k8s.io/apimachinery/pkg/api/meta", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
func (r *ServiceOfferReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var offer obolv1alpha1.ServiceOffer
	if err := r.Get(ctx, req.NamespacedName, &offer); err != nil {
		// A NotFound error means the object is gone; nothing to reconcile.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Deletion: run external cleanup, then release the finalizer.
	if !offer.DeletionTimestamp.IsZero() {
		return r.handleDeletion(ctx, &offer)
	}
	// Finalizer: add it first and let the resulting update event re-trigger reconcile.
	if !controllerutil.ContainsFinalizer(&offer, finalizerName) {
		controllerutil.AddFinalizer(&offer, finalizerName)
		return ctrl.Result{}, r.Update(ctx, &offer)
	}

	// Derive and apply desired child resources (all idempotent via SSA).
	middleware := r.desiredMiddleware(&offer)
	paymentRoute := r.desiredPaymentRoute(&offer)
	httpRoute := r.desiredHTTPRoute(&offer)
	for _, obj := range []client.Object{middleware, paymentRoute, httpRoute} {
		if err := r.Patch(ctx, obj, client.Apply, fieldOwner); err != nil {
			return ctrl.Result{}, err
		}
	}

	// Observe child resource status.
	conditions := []metav1.Condition{
		r.computeUpstreamHealthy(ctx, &offer),
		r.computePaymentGateReady(ctx, &offer, paymentRoute),
		r.computeRoutePublished(ctx, &offer, httpRoute),
		r.computeRegistered(ctx, &offer),
	}

	// RegistrationRequest: only create when the first three conditions are met.
	if allTrue(conditions[:3]) && offer.Spec.Registration.Enabled {
		regReq := r.desiredRegistrationRequest(&offer)
		if err := r.Patch(ctx, regReq, client.Apply, fieldOwner); err != nil {
			return ctrl.Result{}, err
		}
	}

	// Update status. meta.SetStatusCondition keeps lastTransitionTime stable
	// when a condition's status has not changed, unlike overwriting the slice.
	offer.Status.ObservedGeneration = offer.Generation
	for _, c := range conditions {
		c.ObservedGeneration = offer.Generation
		meta.SetStatusCondition(&offer.Status.Conditions, c)
	}
	offer.Status.Phase = computePhase(conditions)
	return ctrl.Result{}, r.Status().Update(ctx, &offer)
}
Key properties:
- observedGeneration distinguishes "hasn't seen this spec" from "tried and failed"
- Ready = observedGeneration == generation AND all required conditions true
- Child resources are owned → deletion cascades automatically (plus finalizer for external side effects)
- No blocking work in reconcile — registration is a child resource observed asynchronously
- HTTPRoute readiness comes from Gateway status conditions, not "I created the object"
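The Ready rule in the properties above can be written as a pure function, which also makes it easy to unit test. This is a minimal sketch using a local `Condition` type so it runs without Kubernetes dependencies. Which conditions count as "required" is an assumption: Registered is left out here so that a registration failure does not hold back Ready, and `requiredForReady` and `isReady` are hypothetical names.

```go
package main

import "fmt"

// Condition is a local stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// requiredForReady lists the conditions that gate Ready.
// ASSUMPTION: Registered is deliberately excluded.
var requiredForReady = []string{"UpstreamHealthy", "PaymentGateReady", "RoutePublished"}

// isReady implements: observedGeneration == generation AND all required
// conditions are "True". A missing condition counts as not ready.
func isReady(generation, observedGeneration int64, conds []Condition) bool {
	if observedGeneration != generation {
		return false // the controller has not processed this spec yet
	}
	status := make(map[string]string, len(conds))
	for _, c := range conds {
		status[c.Type] = c.Status
	}
	for _, t := range requiredForReady {
		if status[t] != "True" {
			return false
		}
	}
	return true
}

func main() {
	conds := []Condition{
		{Type: "UpstreamHealthy", Status: "True"},
		{Type: "PaymentGateReady", Status: "True"},
		{Type: "RoutePublished", Status: "True"},
		{Type: "Registered", Status: "False"},
	}
	fmt.Println(isReady(3, 3, conds)) // current spec, required conditions met
	fmt.Println(isReady(4, 3, conds)) // spec changed since last reconcile
}
```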
Verifier Changes
The verifier (cmd/x402-verifier) stays as a separate binary and Deployment. Changes:
- Replace ConfigMap file watcher (watcher.go) with a PaymentRoute informer
- Build in-memory route table from PaymentRoute CRs instead of parsing YAML
- Write status.admitted on each PaymentRoute when loaded (feedback to controller)
- Remove .well-known handler; move it to the controller or a dedicated httpd
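// internal/x402/source/kube/informer.go

// NewPaymentRouteSource watches PaymentRoute CRs and builds the local route table.
// A sync.RWMutex protects reads (ForwardAuth) from writes (informer events).
func NewPaymentRouteSource(c client.Client) *PaymentRouteSource {
	// Informer wiring is elided; the "client" field name is illustrative.
	return &PaymentRouteSource{client: c}
}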
The ForwardAuth handler itself is unchanged — it still matches request paths against routes and calls the facilitator. Only the config source changes.
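The route table behind the informer could look roughly like the sketch below. It is self-contained and leaves out the informer itself: `Upsert` and `Delete` are what the event handlers would call, and `Match` is what the ForwardAuth handler would call per request. `RouteTable` and its methods are hypothetical names, and the matching rules (trailing `*` as a prefix wildcard, longest pattern wins) are assumptions, since the proposal keeps the existing matcher unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Route holds the fields the ForwardAuth handler needs per request.
type Route struct {
	Pattern string
	Price   string
	PayTo   string
}

// RouteTable is a hypothetical in-memory table keyed by "namespace/name".
type RouteTable struct {
	mu     sync.RWMutex
	routes map[string]Route
}

func NewRouteTable() *RouteTable {
	return &RouteTable{routes: make(map[string]Route)}
}

// Upsert is what an informer Add/Update handler would call.
func (t *RouteTable) Upsert(key string, r Route) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.routes[key] = r
}

// Delete is what an informer Delete handler would call.
func (t *RouteTable) Delete(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.routes, key)
}

// matches treats a trailing "*" as a prefix wildcard (assumed semantics).
func matches(pattern, path string) bool {
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == path
}

// Match returns the route with the longest matching pattern, taking only
// a read lock so concurrent requests do not block each other.
func (t *RouteTable) Match(path string) (Route, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	var best Route
	found := false
	for _, r := range t.routes {
		if matches(r.Pattern, path) && (!found || len(r.Pattern) > len(best.Pattern)) {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	tbl := NewRouteTable()
	tbl.Upsert("x402/myapi-payment", Route{Pattern: "/services/myapi/*", Price: "10000", PayTo: "0x..."})
	r, ok := tbl.Match("/services/myapi/v1/chat")
	fmt.Println(r.Price, ok)
	tbl.Delete("x402/myapi-payment")
	_, ok = tbl.Match("/services/myapi/v1/chat")
	fmt.Println(ok)
}
```

Keying by namespace/name means an update to one PaymentRoute replaces only its own entry, which is the property the shared ConfigMap lacked.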
Shared Packages
One binary family, two commands:
cmd/
  serviceoffer-controller/main.go    # controller-runtime manager
  x402-verifier/main.go              # ForwardAuth HTTP server (existing, modified)
internal/
  paymentroute/                      # PaymentRoute CRD types + deepcopy
    api/v1alpha1/types.go
    api/v1alpha1/zz_generated.deepcopy.go
  registrationrequest/               # RegistrationRequest CRD types
    api/v1alpha1/types.go
  controller/                        # Reconciler implementation
    serviceoffer_controller.go
    serviceoffer_controller_test.go
  x402/
    source/kube/                     # PaymentRoute informer (used by verifier)
      informer.go
    runtime/                         # ForwardAuth handler (existing, refactored)
      handler.go
    translate/                       # Route matching logic (existing)
      matcher.go
Migration Path
Phase 1: Controller + finalizers (no new CRDs)
- Implement ServiceOfferController in Go with controller-runtime
- Keep writing to x402-pricing ConfigMap (same as monetize.py does today)
- Keep verifier unchanged (still reads ConfigMap)
- Deploy controller as own Deployment
- Keep monetize.py as fallback, gated behind a feature flag
- Value: Deterministic reconciliation, independent of agent, idempotent, finalizer cleanup
Phase 2: PaymentRoute CRD
- Define PaymentRoute CRD
- Controller creates PaymentRoute CRs instead of mutating ConfigMap
- Verifier switches from file watcher to PaymentRoute informer
- Remove x402-pricing ConfigMap from the data path
- Value: Eliminates ConfigMap race, sub-second propagation, correct deletion
Phase 3: RegistrationRequest CRD
- Define RegistrationRequest CRD
- Controller creates RegistrationRequest instead of calling ERC-8004 directly
- Registrar Job/controller handles on-chain transaction
- Value: Non-blocking registration, clean retry semantics, OffChainOnly as valid state
Phase 4: Cleanup
- Remove monetize.py entirely
- Remove .well-known handler from verifier
- Remove x402-pricing ConfigMap template from infrastructure
- Update CLAUDE.md and docs/specs/
What Gets Deleted
| File | Lines | Reason |
|---|---|---|
| internal/embed/skills/sell/scripts/monetize.py | ~1700 | Replaced by Go controller |
| internal/x402/watcher.go | 58 | Replaced by PaymentRoute informer |
| x402-pricing ConfigMap template | ~30 | Replaced by PaymentRoute CRD |
| .well-known handler in verifier.go | ~20 | Moved to controller/httpd |
ServiceOffer CRD Status Changes
Current:
status:
  phase: "Ready"    # single string
Proposed:
status:
  phase: "Ready"
  observedGeneration: 3
  conditions:
    - type: UpstreamHealthy
      status: "True"
      lastTransitionTime: "2026-03-29T10:00:00Z"
      reason: HealthCheckPassed
      message: "GET /health returned 200"
    - type: PaymentGateReady
      status: "True"
      lastTransitionTime: "2026-03-29T10:00:01Z"
      reason: PaymentRouteAdmitted
      message: "PaymentRoute myapi-payment admitted by verifier"
    - type: RoutePublished
      status: "True"
      lastTransitionTime: "2026-03-29T10:00:01Z"
      reason: HTTPRouteAccepted
      message: "HTTPRoute accepted by Gateway traefik-gateway"
    - type: Registered
      status: "True"
      lastTransitionTime: "2026-03-29T10:00:15Z"
      reason: OnChainConfirmed
      message: "ERC-8004 agentId=42, tx=0xabc..."
Acceptance Criteria
- obol sell http creates a ServiceOffer CR and the controller converges it to Ready without the obol-agent pod running
- Deleting a ServiceOffer via kubectl delete cleans up all child resources including pricing routes (finalizer)
- Two concurrent ServiceOffers never corrupt each other's pricing (no shared ConfigMap mutation)
- Route propagation from CR creation to ForwardAuth active is under 5 seconds (not 60-180s)
- Controller restart does not interrupt in-flight ForwardAuth requests on the verifier
- obol sell status shows per-condition status (not just a phase string)
- ERC-8004 registration failure does not block the service from being Ready for traffic (OffChainOnly is valid)
- All reconcile state transitions are testable in Go without a full cluster (fake client for unit tests, envtest for a real API server without kubelet)
Test Plan
- Unit tests: Reconcile function with fake client — test each condition computation, finalizer logic, SSA patch generation
- envtest integration: Real API server, no kubelet — test full reconcile cycle, deletion cascade, concurrent ServiceOffers
- E2E: obol sell http → verify PaymentRoute admitted → verify 402 response → verify deletion cleanup
- Chaos: Kill controller pod during reconciliation → verify convergence on restart
- Migration: Run monetize.py and controller side-by-side, verify identical outcomes