[docs] Sandbox runtime metering — scoped-resource design by mmabrouk · Pull Request #4783 · Agenta-AI/agenta

mmabrouk · 2026-06-20T12:49:10Z

Context

We pay Daytona by the minute for the ephemeral VM that runs an agent, but that runtime never reaches our pricing surface — no meter, no per-plan limit, nothing reported to Stripe. This design adds sandbox wall-time as a first-class billable dimension, modeled as a configurable, scoped resource so it also lays the first rail for "limit usage per project / agent / user."

Documentation only. Adds a design under docs/designs/sandbox-runtime-metering/ (proposal, research, tasks). No code or schema changes.

The model

A resource is a named, scoped entitlement with a quota — exactly what the EE check_entitlements / Quota / Scope / meters layer already provides. A request declares the resource it's about to consume; the system checks entitlement at the same cached point it already checks auth, and only then runs; the consumed minutes are booked after. Shipped project-scoped by default (Scope.PROJECT), with USER available today and a new AGENT scope as an explicit phase 2.
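The declare, check, run, book sequence described above can be sketched as follows. This is a minimal illustration, not the EE implementation: `Ledger`, `run_with_resource`, the `limit_minutes` field, and the `"sandbox_minutes"` slug are hypothetical names, and the in-memory dictionary stands in for the real meters store.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Scope(Enum):
    PROJECT = "project"
    USER = "user"


@dataclass(frozen=True)
class Quota:
    scope: Scope
    limit_minutes: float  # hypothetical field name


class Ledger:
    """In-memory stand-in for the meters store (assumption for this sketch)."""

    def __init__(self) -> None:
        self._used: dict[tuple[str, Scope, str], float] = {}

    def used(self, resource: str, scope: Scope, scope_id: str) -> float:
        return self._used.get((resource, scope, scope_id), 0.0)

    def book(self, resource: str, scope: Scope, scope_id: str, minutes: float) -> None:
        key = (resource, scope, scope_id)
        self._used[key] = self._used.get(key, 0.0) + minutes


def run_with_resource(
    ledger: Ledger,
    resource: str,
    quota: Quota,
    scope_id: str,
    run: Callable[[], float],
) -> Optional[float]:
    """Check entitlement first, run only if entitled, book minutes afterwards."""
    if ledger.used(resource, quota.scope, scope_id) >= quota.limit_minutes:
        return None  # the caller would map this to a 429
    minutes = run()  # the run reports how many minutes it consumed
    ledger.book(resource, quota.scope, scope_id, minutes)
    return minutes
```

Because the check reads only what has already been booked, a run that starts under budget always completes and is charged, which matches the post-paid semantics described further down.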

What changed since the first draft (and why)

The first draft pulled runtime out of the OTel trace pipeline. We then explored "tag sandboxes, let a cron pull usage from Daytona, never touch the run path." Verifying Daytona's API killed that approach for our workload:

No per-sandbox usage/cost API — CPU-seconds / GB-seconds / price are dashboard-only.
The one /organizations/:id/usage endpoint is live quota snapshots, not cost, org/region-scoped, and currently JWT-only — not API-key callable (open issue daytonaio/daytona#4643).
Up to 48h billing lag, documented.
Our sandboxes are ephemeral (cold VM per prompt turn, destroyed in a finally), so a list() cron finds them already gone, and there's no startedAt/stoppedAt to reconstruct runtime from.

So measurement stays in the runner (the only component that observes a full lifetime), and Daytona labels are repurposed for audit / leak-detection / reconciliation, not billing.
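To illustrate why the runner is the natural place to measure, here is a small sketch of bracketing a sandbox lifetime with a monotonic clock. The real runner is TypeScript (`rivet.ts`); this Python version only shows the shape of the bracket, and `measure_lifetime` and its callback parameters are hypothetical.

```python
import time
from typing import Any, Callable, Optional, Tuple


def measure_lifetime(
    start: Callable[[], Any],
    work: Callable[[Any], None],
    destroy: Callable[[Any], None],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, Optional[Exception]]:
    """Return (runtime_ms, error) for one sandbox lifetime.

    Timing begins before start() so warmup is included, and the sandbox is
    destroyed in a finally block so a failed run is still measured.
    """
    started = clock()
    sandbox = None
    error: Optional[Exception] = None
    try:
        sandbox = start()
        work(sandbox)
    except Exception as exc:  # keep the error so the caller can still report runtime
        error = exc
    finally:
        if sandbox is not None:
            destroy(sandbox)
    runtime_ms = round((clock() - started) * 1000)
    return runtime_ms, error
```

Returning the error instead of re-raising is a choice made for this sketch, so that a failed run still yields a runtime to report.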

Design (three insertion points + an audit cron)

(A) Gate — a soft check_entitlements(resource, cache=True) folded into the cached auth check: "authenticated and entitled to run?" Returns 429 once the project is over its monthly minute budget. Records run_id → resolved scope for attribution.
(B) Measure — services/agent/src/engines/rivet.ts::runRivet() already brackets SandboxAgent.start()→destroySandbox() (warmup included); capture runtimeMs and tag the sandbox with labels.
(C) Account — the runner sends a trusted post-run report (run_id, sandbox_id, minutes) to an internal endpoint authed as the agent service's existing credential (not the admin key). Attribution comes from the run record (never the payload), so it can't bill arbitrary tenants and adds no new secret; idempotent on sandbox_id; charges via the same atomic, fail-open check_entitlements every meter uses.
Reconciliation cron (audit only) — list() non-deleted sandboxes by label to flag orphaned/leaked VMs and sanity-check the 48h-lagged dashboard. Not a billing source.
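The accounting rules in (C) can be sketched as below: attribution is looked up from what the gate recorded, never read from the payload, and a repeated `sandbox_id` is booked only once. `Accountant`, `record_gate`, `report`, and the status strings are hypothetical, and the in-memory containers stand in for whichever join store is eventually chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Accountant:
    # run_id -> project_id, written by the gate (A) before the run starts
    run_scopes: dict[str, str] = field(default_factory=dict)
    # sandbox_ids already charged, used as the idempotency key
    booked_sandboxes: set[str] = field(default_factory=set)
    minutes_by_project: dict[str, float] = field(default_factory=dict)

    def record_gate(self, run_id: str, project_id: str) -> None:
        self.run_scopes[run_id] = project_id

    def report(self, payload: dict) -> str:
        """Handle a post-run report of (run_id, sandbox_id, minutes)."""
        # Scope comes from the run record; any scope field in the payload is ignored.
        project_id = self.run_scopes.get(payload["run_id"])
        if project_id is None:
            return "unknown_run"
        sandbox_id = payload["sandbox_id"]
        if sandbox_id in self.booked_sandboxes:
            return "duplicate"  # retried report, nothing charged twice
        self.booked_sandboxes.add(sandbox_id)
        current = self.minutes_by_project.get(project_id, 0.0)
        self.minutes_by_project[project_id] = current + payload["minutes"]
        return "booked"
```

With this shape, a report that names a run the gate never saw is rejected, and a payload that claims a different project has no effect on who is charged.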

Everything else is the well-worn "add a counter" path (extend-meters): one enum member, a Quota(scope=PROJECT, period=MONTHLY, strict=True) per plan, one Alembic enum migration (template exists), add the slug to REPORTS so the existing meters→Stripe cron flushes it (project rows roll up per org via organization_id), and /billing/usage surfaces it.
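A rough sketch of what the registry side might look like. Everything here is illustrative: the `Counter` enum, the `sandbox_minutes` slug, the plan names, and the `limit` values are placeholders (per-plan allotments are still open), and `rollup_by_org` only demonstrates the per-org roll-up via `organization_id`, not the actual Stripe cron.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Scope(str, Enum):
    PROJECT = "project"
    USER = "user"


class Period(str, Enum):
    MONTHLY = "monthly"


class Counter(str, Enum):
    SANDBOX_MINUTES = "sandbox_minutes"  # hypothetical slug for the new enum member


@dataclass(frozen=True)
class Quota:
    limit: int  # placeholder field; real per-plan numbers are not decided
    scope: Scope
    period: Period
    strict: bool


# One Quota per plan. Plan names and limits below are placeholders only.
PLAN_QUOTAS = {
    "free": {Counter.SANDBOX_MINUTES: Quota(60, Scope.PROJECT, Period.MONTHLY, True)},
    "pro": {Counter.SANDBOX_MINUTES: Quota(600, Scope.PROJECT, Period.MONTHLY, True)},
}

# Slugs the existing meters-to-Stripe cron would flush.
REPORTS = [Counter.SANDBOX_MINUTES.value]


def rollup_by_org(rows: list[dict]) -> dict[str, float]:
    """Sum project-scoped meter rows into one total per organization_id."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["organization_id"]] += row["value"]
    return dict(totals)
```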

Semantics worth flagging

Post-paid: a run already in flight finishes and is billed. The gate reads the last-booked value, so this is a soft, slightly lagged budget guardrail, not a hard real-time cutoff (strict=True bounds the overshoot to one run).
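A small worked simulation of the post-paid behavior, assuming runs execute one after another and each is booked before the next gate check. `simulate` is a hypothetical helper, and the budget and run lengths are made-up numbers.

```python
def simulate(budget_minutes: float, run_minutes: list[float]) -> tuple[float, list[bool]]:
    """Replay sequential runs against a gate that reads the last-booked total."""
    used = 0.0
    admitted: list[bool] = []
    for minutes in run_minutes:
        if used >= budget_minutes:
            admitted.append(False)  # gate rejects: budget already exhausted
            continue
        admitted.append(True)
        used += minutes  # booked only after the run has finished
    return used, admitted


used, admitted = simulate(100.0, [30.0] * 6)
overshoot = used - 100.0
```

With a 100-minute budget and 30-minute runs, the fourth run is admitted at 90 minutes used and finishes at 120, so the overshoot is 20 minutes, less than one run. The fifth and sixth runs are rejected at the gate.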

Files

proposal.md — the resource model, the Daytona verdict, the gate/measure/account flow, registry/Stripe/DB steps, reconciliation cron, risks.
research.md — grounding in current metering/billing/sandbox code (file:line) and the cited Daytona API findings.
tasks.md — ordered checklist + open inputs (per-plan numbers, join-key/store, internal report auth, daytonaio/daytona#4643 tracking, phase-2 Scope.AGENT).

Notes

Still a draft on the product side: per-plan minute allotments and overage price are open.
Open implementation decisions called out in tasks.md: the run_id → scope join store (durable vs Redis) and the exact existing internal credential for the report endpoint.

🤖 Generated with Claude Code

https://claude.ai/code/session_01MdaZVVA8e9LHk2ZrsEJEBj


Design for metering agent-runner (Daytona) sandbox wall-time as a billable per-minute meter: capture runtime on the runner span, charge per-org in the tracing worker via check_entitlements, soft-gate at the invocation edge, and report to Stripe. Includes research grounding and an implementation checklist.

vercel · 2026-06-20T12:49:16Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Building	Preview, Comment	Jun 20, 2026 6:53pm

coderabbitai · 2026-06-20T12:49:18Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


…ents Pivot the sandbox runtime metering design after verifying Daytona's API: there is no API-reachable per-sandbox usage/cost (dashboard-only, JWT-only /usage endpoint, 48h lag), and our sandboxes are ephemeral, so a Daytona-pull cron cannot be the billing source. Measurement stays in the runner; Daytona labels are repurposed for audit/reconciliation only. Reframe around a configurable, scoped resource (entitlement): gate folded into the cached auth check, trusted post-run report joined to the scope recorded at the gate (no new secret, attribution from the run record, idempotent on sandbox_id), project-scoped by default with user/agent scopes as later rungs.

mmabrouk changed the title ~~[docs] Sandbox runtime metering design proposal~~ [docs] Sandbox runtime metering — scoped-resource design Jun 20, 2026

mmabrouk wants to merge 2 commits into main from claude/git-butler-agent-prs-b227dz
