feat(agent): deliver resolved tools through the runner#4765
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: interesting code
|
| ]; | ||
|
|
||
| /** Build the child env from a minimal allowlist (copied only when set) plus scoped secrets. */ | ||
| function buildChildEnv( |
There was a problem hiding this comment.
This is the boundary that keeps a code tool from seeing platform secrets: the child gets only this allowlist plus its own scoped env, not the sidecar's process.env (where the in-process Pi path writes provider keys). Confirm nothing secret-bearing creeps into BASE_ENV_ALLOWLIST.
| return runCodeTool(spec.runtime, spec.code ?? "", spec.env, params, opts.signal); | ||
| } | ||
| if (spec.kind === "client") { | ||
| throw new Error( |
There was a problem hiding this comment.
Single source of truth for branch-on-kind. A spec with no kind falls through to the callback path here, which preserves the old behavior exactly; please confirm every call site relies on that same default.
| // Agenta's /tools/call. A unique id per call so two parallel calls in the same | ||
| // millisecond don't collide (Date.now() would). | ||
| const text = await runResolvedTool(spec, params?.arguments, { | ||
| toolCallId: randomUUID(), |
There was a problem hiding this comment.
Real fix: the old id was tool-${Date.now()}, so two parallel calls in the same millisecond shared a relay filename / call id. randomUUID() removes the collision.
| export function policyFromRequest(permissionPolicy?: string): PermissionPolicy { | ||
| if (permissionPolicy === "deny" || process.env.AGENTA_RIVET_DENY_PERMISSIONS === "true") { | ||
| return "deny"; | ||
| } |
There was a problem hiding this comment.
policyFromRequest keeps the prior precedence: explicit per-run deny or AGENTA_RIVET_DENY_PERMISSIONS flips to deny, otherwise auto-allow. The default must stay auto so headless /invoke runs are unchanged.
|
Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set. |
This PR is part of a stack. Review bottom-up.
Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4764 (merge that first).
Context
The base branch
feat/sdk-local-tools-serviceresolves a tool in the SDK and composes the resolved spec into the run request. The runner still treated every tool as one shape: POST the call back to Agenta's/tools/call. This PR is slice #9 ofdocs/design/agent-workflows/pr-stack.md(tool runtime). It makes the TypeScript runner execute a resolved tool by its kind, so a tool can run locally instead of always routing back to the server.What this changes
A resolved tool now carries an executor
kind. The runner branches on it:callback(the default, and the only old behavior): POST back through Agenta's/tools/call, so the Composio key and connection auth stay server-side.code: run the tool's snippet in a subprocess with a scoped secret env. No round trip to the server.client: browser-fulfilled across a turn boundary, so the in-sandbox paths skip it.Before, each delivery path (in-process Pi, Pi-under-rivet, the MCP bridge) carried its own copy of the "POST the call back" logic, and the Daytona file relay lived inside
extensions/agenta.ts. After, onetools/dispatch.tsowns the branch-on-kind decision and the relay; each call site keeps only its own result wrapping. Theclient.tstransport is renamed tocallback.tsto name what it is now (one executor among three, not the whole tool client).protocol.tsgrows the spec from one axis to three orthogonal ones:kind(executor),needsApproval(human gate), andrender(generative-UI hint). It also addsinteraction_requestevents and anMcpServerConfigso the wire can carry those later.On the Python side, the 491-line
ui_messages.pyegress splits into anagents/adapters/vercel/package (messages, routing, sse, stream). The old module keeps thin re-exports for back-compat. The/messagesroute now selects its wire format by endpoint (vercel), not by theAcceptheader, because a Vercel UI message stream and a plain SSE stream share thetext/event-streammedia type.Key architectural decision to review
The most important file is
services/agent/src/tools/code.ts. Acodetool runs author-supplied code in the same sandbox where the harness runs, so its env is the security boundary. The child process does NOT inherit the sidecar'sprocess.env. It gets a minimal startup allowlist (PATH,HOME, locale, temp, Windows essentials) plus only the tool's own scoped secrets. This matters because the in-process Pi path writes provider keys likeOPENAI_API_KEYintoprocess.envbefore a run, andAGENTA_*/COMPOSIO_*/DAYTONA_*config lives there too. An allowlist that leaks any of those would hand a snippet the platform's keys. ScrutinizeBASE_ENV_ALLOWLISTandbuildChildEnvfor anything secret-bearing, and confirm the timeout/abort path alwaysSIGKILLs the child.The second decision is the
Responderseam inservices/agent/src/responder.ts. The rivet permission gate was a hardcoded auto-approve. This lifts it behind an interface so a cross-turn HITL responder can slot in later without touching the harness adapter.PolicyResponderreproduces the old behavior exactly, including theAGENTA_RIVET_DENY_PERMISSIONSprecedence. Check that the default stays auto-allow and thatdecisionToReplymaps onto the ACP replies the harness actually offers.How to review this PR
Read in this order:
services/agent/src/protocol.ts— the three-axisResolvedToolSpec,RenderHint, and the new event variants. This is the contract everything downstream branches on.services/agent/src/tools/dispatch.ts—runResolvedTool, the single branch-on-kind. Confirmclientthrows andcallbackchooses relay vs direct POST byrelayDir.services/agent/src/tools/code.ts— the sandbox env boundary (see above).engines/pi.tsbuildCustomTools,extensions/agenta.tsregisterTools,tools/mcp-server.ts. Check each still wraps results its own way and skipsclienttools.Skip the
docs/design/agent-workflows/sdk-local-tools/tree (design notes and a review log, not shipped behavior) and theservices/agent/README.mdone-word rename.Likely regression: a tool with no
kindset must still behave exactly like the old callback tool. Verify theundefined->callbackfallback inrunResolvedTooland in every call site's branch.Tests / notes
New TypeScript tests cover the code-tool executor, dispatch routing, the MCP server, the responder, and continuation. The MCP bridge tool id moved from
Date.now()torandomUUID()so two calls in the same millisecond no longer collide.