
[docs] Sandbox runtime metering — scoped-resource design #4783

Draft
mmabrouk wants to merge 2 commits into main from claude/git-butler-agent-prs-b227dz

mmabrouk commented Jun 20, 2026

Member

Context

We pay Daytona by the minute for the ephemeral VM that runs an agent, but that runtime never reaches our pricing surface — no meter, no per-plan limit, nothing reported to Stripe. This design adds sandbox wall-time as a first-class billable dimension, modeled as a configurable, scoped resource so it also lays the first rail for "limit usage per project / agent / user."

Documentation only. Adds a design under docs/designs/sandbox-runtime-metering/ (proposal, research, tasks). No code or schema changes.

The model

A resource is a named, scoped entitlement with a quota — exactly what the EE check_entitlements / Quota / Scope / meters layer already provides. A request declares the resource it's about to consume; the system checks entitlement at the same cached point it already checks auth, and only then runs; the consumed minutes are booked after. Shipped project-scoped by default (Scope.PROJECT), with USER available today and a new AGENT scope as an explicit phase 2.
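The declare, check, run, book sequence can be sketched in a few lines. This is a minimal illustration only: Scope and Quota mirror names used above, but the fields, the Ledger class, and the is_entitled / book helpers are hypothetical stand-ins, not the real EE check_entitlements API.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PROJECT = "project"
    USER = "user"


@dataclass(frozen=True)
class Quota:
    limit: int              # minutes allowed per period (hypothetical field)
    scope: Scope
    strict: bool = True


@dataclass
class Ledger:
    """Hypothetical in-memory stand-in for the meters table."""
    booked: dict            # (resource, scope, scope_id) -> minutes booked so far

    def used(self, resource, scope, scope_id):
        return self.booked.get((resource, scope, scope_id), 0)


def is_entitled(ledger, quota, resource, scope_id):
    """Gate: allow the run only while booked usage is below the quota."""
    return ledger.used(resource, quota.scope, scope_id) < quota.limit


def book(ledger, quota, resource, scope_id, minutes):
    """Account: add the consumed minutes after the run has finished."""
    key = (resource, quota.scope, scope_id)
    ledger.booked[key] = ledger.booked.get(key, 0) + minutes


ledger = Ledger(booked={})
quota = Quota(limit=60, scope=Scope.PROJECT)
if is_entitled(ledger, quota, "sandbox_minutes", "proj-1"):
    # ... the run would happen here ...
    book(ledger, quota, "sandbox_minutes", "proj-1", 5)
```

Because the scope lives on the quota, switching a plan from project-scoped to user-scoped changes which key is counted without changing the gate or the booking call.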

What changed since the first draft (and why)

The first draft pulled runtime out of the OTel trace pipeline. We then explored "tag sandboxes, let a cron pull usage from Daytona, never touch the run path." Verifying Daytona's API killed that approach for our workload:

  • No per-sandbox usage/cost API — CPU-seconds / GB-seconds / price are dashboard-only.
  • The one /organizations/:id/usage endpoint is live quota snapshots, not cost, org/region-scoped, and currently JWT-only — not API-key callable (open issue daytonaio/daytona#4643).
  • Up to 48h billing lag, documented.
  • Our sandboxes are ephemeral (cold VM per prompt turn, destroyed in a finally), so a list() cron finds them already gone, and there's no startedAt/stoppedAt to reconstruct runtime from.

So measurement stays in the runner (the only component that observes a full lifetime), and Daytona labels are repurposed for audit / leak-detection / reconciliation, not billing.

Design (three insertion points + an audit cron)

  • (A) Gate: a soft check_entitlements(resource, cache=True) folded into the cached auth check, answering "authenticated and entitled to run?" Returns 429 once the project is over its monthly minute budget. Records run_id → resolved scope for attribution.
  • (B) Measure: services/agent/src/engines/rivet.ts::runRivet() already brackets SandboxAgent.start() → destroySandbox() (warmup included); capture runtimeMs there and tag the sandbox with labels.
  • (C) Account: the runner sends a trusted post-run report (run_id, sandbox_id, minutes) to an internal endpoint, authenticated with the agent service's existing credential (not the admin key). Attribution comes from the run record (never the payload), so the report can't bill arbitrary tenants and adds no new secret. It is idempotent on sandbox_id and charges via the same atomic, fail-open check_entitlements every meter uses.
  • Reconciliation cron (audit only): list() non-deleted sandboxes by label to flag orphaned/leaked VMs and sanity-check the 48h-lagged dashboard. Not a billing source.
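To make (B) and (C) concrete, here is a rough sketch in Python (the real runner is TypeScript, so this is illustration rather than the implementation). RunTimer, billable_minutes, and ReportStore are hypothetical names; rounding up to whole minutes is an assumption, since the design does not fix a rounding rule.

```python
import math
import time
from dataclasses import dataclass, field


class RunTimer:
    """Brackets sandbox start..destroy and records wall time, even on failure."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.runtime_ms = None

    def __enter__(self):
        self._start = self._clock()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.runtime_ms = int((self._clock() - self._start) * 1000)
        return False  # never swallow the run's own exception


def billable_minutes(runtime_ms):
    # Assumption: partial minutes round up.
    return math.ceil(runtime_ms / 60_000)


@dataclass
class ReportStore:
    run_scopes: dict                             # run_id -> project_id, recorded at the gate
    usage: dict = field(default_factory=dict)    # project_id -> booked minutes
    seen: set = field(default_factory=set)       # sandbox_ids already booked

    def report(self, run_id, sandbox_id, minutes):
        """Book a post-run report once; attribution comes from the run record."""
        if sandbox_id in self.seen:
            return False                         # duplicate report, nothing booked
        project_id = self.run_scopes[run_id]     # unknown run_id raises KeyError
        self.seen.add(sandbox_id)
        self.usage[project_id] = self.usage.get(project_id, 0) + minutes
        return True
```

The report carries no tenant identifier at all: the only way to reach a project's counter is through a run_id the gate already recorded, which is what keeps a compromised or buggy runner from billing arbitrary tenants.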

Everything else is the well-worn "add a counter" path (extend-meters): one enum member, a Quota(scope=PROJECT, period=MONTHLY, strict=True) per plan, one Alembic enum migration (template exists), add the slug to REPORTS so the existing meters→Stripe cron flushes it (project rows roll up per org via organization_id), and /billing/usage surfaces it.
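The per-org rollup mentioned above amounts to a group-by on organization_id. A minimal sketch, assuming meter rows are dicts with hypothetical project_id and minutes fields (only organization_id is named in the design):

```python
from collections import defaultdict


def roll_up_by_org(rows):
    """Sum project-scoped meter rows into one total per organization."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["organization_id"]] += row["minutes"]
    return dict(totals)


rows = [
    {"organization_id": "org-a", "project_id": "p1", "minutes": 12},
    {"organization_id": "org-a", "project_id": "p2", "minutes": 8},
    {"organization_id": "org-b", "project_id": "p3", "minutes": 5},
]
print(roll_up_by_org(rows))  # {'org-a': 20, 'org-b': 5}
```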

Semantics worth flagging

Post-paid: a run already in flight finishes and is billed. The gate reads the last-booked value, so this is a soft, slightly lagged budget guardrail, not a hard real-time cutoff (strict=True bounds the overshoot to one run).
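A toy simulation shows the overshoot bound. It assumes runs are gated and booked one at a time (concurrency is not modeled), and simulate is a hypothetical helper, not part of the design.

```python
def simulate(budget, run_minutes):
    """Gate each run on the last-booked total, then book it after it finishes."""
    booked = 0
    admitted = []
    for minutes in run_minutes:
        if booked >= budget:       # gate sees only what is already booked
            admitted.append(False)
            continue
        admitted.append(True)
        booked += minutes          # booked post-run, even if it crosses the budget
    return booked, admitted


total, admitted = simulate(budget=100, run_minutes=[40, 40, 40, 40])
print(total, admitted)  # 120 [True, True, True, False]
```

The third run is admitted at 80 booked minutes and finishes at 120, so the overshoot (20) is less than one run's length; the fourth run is rejected.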

Files

  • proposal.md: the resource model, the Daytona verdict, the gate/measure/account flow, registry/Stripe/DB steps, reconciliation cron, risks.
  • research.md: grounding in current metering/billing/sandbox code (file:line) and the cited Daytona API findings.
  • tasks.md: ordered checklist + open inputs (per-plan numbers, join-key/store, internal report auth, #4643 tracking, phase-2 Scope.AGENT).

Notes

  • Still a draft on the product side: per-plan minute allotments and overage price are open.
  • Open implementation decisions called out in tasks.md: the run_id → scope join store (durable vs Redis) and the exact existing internal credential for the report endpoint.

🤖 Generated with Claude Code

https://claude.ai/code/session_01MdaZVVA8e9LHk2ZrsEJEBj



Design for metering agent-runner (Daytona) sandbox wall-time as a billable
per-minute meter: capture runtime on the runner span, charge per-org in the
tracing worker via check_entitlements, soft-gate at the invocation edge, and
report to Stripe. Includes research grounding and an implementation checklist.

vercel Bot commented Jun 20, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project: agenta-documentation
Deployment: Building
Actions: Preview, Comment
Updated (UTC): Jun 20, 2026 6:53pm



coderabbitai Bot commented Jun 20, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 91fdab5c-6fb8-465d-a6b4-7f90cdc02540

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…ents

Pivot the sandbox runtime metering design after verifying Daytona's API:
there is no API-reachable per-sandbox usage/cost (dashboard-only, JWT-only
/usage endpoint, 48h lag), and our sandboxes are ephemeral, so a Daytona-pull
cron cannot be the billing source. Measurement stays in the runner; Daytona
labels are repurposed for audit/reconciliation only.

Reframe around a configurable, scoped resource (entitlement): gate folded into
the cached auth check, trusted post-run report joined to the scope recorded at
the gate (no new secret, attribution from the run record, idempotent on
sandbox_id), project-scoped by default with user/agent scopes as later rungs.
mmabrouk changed the title from "[docs] Sandbox runtime metering design proposal" to "[docs] Sandbox runtime metering — scoped-resource design" on Jun 20, 2026