
feat(agent): deliver resolved tools through the runner#4765

Closed
mmabrouk wants to merge 1 commit into feat/sdk-local-tools-service from feat/sdk-local-tools-runner-docs

@mmabrouk mmabrouk commented Jun 19, 2026

Member

This PR is part of a stack. Review bottom-up.

Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4764 (merge that first).

Context

The base branch feat/sdk-local-tools-service resolves a tool in the SDK and composes the resolved spec into the run request. The runner still treated every tool as one shape: POST the call back to Agenta's /tools/call. This PR is slice #9 of docs/design/agent-workflows/pr-stack.md (tool runtime). It makes the TypeScript runner execute a resolved tool by its kind, so a tool can run locally instead of always routing back to the server.

What this changes

A resolved tool now carries an executor kind. The runner branches on it:

  • callback (the default, and the only behavior before this PR): POST back through Agenta's /tools/call, so the Composio key and connection auth stay server-side.
  • code: run the tool's snippet in a subprocess with a scoped secret env. No round trip to the server.
  • client: browser-fulfilled across a turn boundary, so the in-sandbox paths skip it.
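
The three branches above can be sketched as a single function. This is a minimal illustration, not the code in tools/dispatch.ts: the names ToolSpecSketch, Executors, and dispatchByKind are hypothetical, and the executors are injected so the sketch runs without a server or a subprocess.

```typescript
// Hypothetical sketch of branch-on-kind dispatch; all names are illustrative.
type ToolKind = "callback" | "code" | "client";

interface ToolSpecSketch {
  name: string;
  kind?: ToolKind; // undefined is treated as "callback"
}

// Executors are injected so each branch can be exercised in isolation.
interface Executors {
  callback: (spec: ToolSpecSketch, args: unknown) => Promise<string>;
  code: (spec: ToolSpecSketch, args: unknown) => Promise<string>;
}

async function dispatchByKind(
  spec: ToolSpecSketch,
  args: unknown,
  exec: Executors,
): Promise<string> {
  const kind: ToolKind = spec.kind ?? "callback";
  if (kind === "code") {
    // Runs locally; no round trip to the server.
    return exec.code(spec, args);
  }
  if (kind === "client") {
    // Browser-fulfilled across a turn boundary, so it cannot run in the sandbox.
    throw new Error(`tool "${spec.name}" is client-fulfilled and cannot run here`);
  }
  // "callback", including specs that never set a kind.
  return exec.callback(spec, args);
}
```

Keeping the fallback (`spec.kind ?? "callback"`) in one place is what lets a spec produced before this change keep its old behavior.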

Before, each delivery path (in-process Pi, Pi-under-rivet, the MCP bridge) carried its own copy of the "POST the call back" logic, and the Daytona file relay lived inside extensions/agenta.ts. After, one tools/dispatch.ts owns the branch-on-kind decision and the relay; each call site keeps only its own result wrapping. The client.ts transport is renamed to callback.ts to name what it is now (one executor among three, not the whole tool client).

protocol.ts grows the spec from one axis to three orthogonal ones: kind (executor), needsApproval (human gate), and render (generative-UI hint). It also adds interaction_request events and an McpServerConfig so the wire can carry those later.
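
As a rough picture of the three axes, the sketch below models them as independent optional fields. The field names kind, needsApproval, render, and callRef come from this PR; the RenderHintSketch shape, the callRef type, and the describeSpec helper are assumptions made for illustration, not the definitions in protocol.ts.

```typescript
type ExecutorKind = "callback" | "code" | "client";

// Assumed shape: this description names RenderHint but does not define it.
interface RenderHintSketch {
  component: string;
}

interface ResolvedToolSpecSketch {
  name: string;
  kind?: ExecutorKind;       // executor axis; undefined means callback
  needsApproval?: boolean;   // human-gate axis
  render?: RenderHintSketch; // generative-UI hint axis
  callRef?: string;          // optional per this PR; the string type is assumed
}

// Each axis is read on its own; none of them constrains the others.
function describeSpec(spec: ResolvedToolSpecSketch): string {
  const kind = spec.kind ?? "callback";
  const gate = spec.needsApproval ? "gated" : "ungated";
  const ui = spec.render ? `render:${spec.render.component}` : "no-render";
  return `${spec.name}: ${kind}, ${gate}, ${ui}`;
}
```

Because the axes are orthogonal, a code tool can require approval and a callback tool can carry a render hint without either combination needing a dedicated kind.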

On the Python side, the 491-line ui_messages.py egress splits into an agents/adapters/vercel/ package (messages, routing, sse, stream). The old module keeps thin re-exports for back-compat. The /messages route now selects its wire format by endpoint (vercel), not by the Accept header, because a Vercel UI message stream and a plain SSE stream share the text/event-stream media type.

Key architectural decision to review

The most important file is services/agent/src/tools/code.ts. A code tool runs author-supplied code in the same sandbox where the harness runs, so its env is the security boundary. The child process does NOT inherit the sidecar's process.env. It gets a minimal startup allowlist (PATH, HOME, locale, temp, Windows essentials) plus only the tool's own scoped secrets. This matters because the in-process Pi path writes provider keys like OPENAI_API_KEY into process.env before a run, and AGENTA_* / COMPOSIO_* / DAYTONA_* config lives there too. An allowlist that leaks any of those would hand a snippet the platform's keys. Scrutinize BASE_ENV_ALLOWLIST and buildChildEnv for anything secret-bearing, and confirm the timeout/abort path always SIGKILLs the child.
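
The last check in the paragraph above, that both the timeout and the abort path end in SIGKILL, could look roughly like the following. This is a sketch under assumptions, not the code in code.ts: runWithDeadline and ChildResult are hypothetical names, and the caller is expected to pass an env that was already built from the allowlist.

```typescript
import { spawn } from "node:child_process";

interface ChildResult {
  code: number | null;
  signal: NodeJS.Signals | null;
  stdout: string;
}

// Hypothetical helper: run a child with an explicit env and a hard deadline.
function runWithDeadline(
  command: string,
  args: string[],
  env: Record<string, string>,
  timeoutMs: number,
  signal?: AbortSignal,
): Promise<ChildResult> {
  return new Promise((resolve, reject) => {
    // Passing `env` replaces the parent's environment instead of extending it.
    const child = spawn(command, args, { env, stdio: ["ignore", "pipe", "ignore"] });
    let stdout = "";
    child.stdout?.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });

    // Timeout and abort share one kill path, so neither can leave the child alive.
    const kill = (): void => {
      child.kill("SIGKILL");
    };
    const timer = setTimeout(kill, timeoutMs);
    signal?.addEventListener("abort", kill, { once: true });
    if (signal?.aborted) kill();

    const cleanup = (): void => {
      clearTimeout(timer);
      signal?.removeEventListener("abort", kill);
    };
    child.on("error", (err) => {
      cleanup();
      reject(err);
    });
    child.on("close", (code, sig) => {
      cleanup();
      resolve({ code, signal: sig, stdout });
    });
  });
}
```

Routing both triggers through the same `kill` closure is the property worth looking for in review: a separate abort handler that sends a softer signal would let a snippet ignore it.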

The second decision is the Responder seam in services/agent/src/responder.ts. The rivet permission gate was a hardcoded auto-approve. This lifts it behind an interface so a cross-turn HITL responder can slot in later without touching the harness adapter. PolicyResponder reproduces the old behavior exactly, including the AGENTA_RIVET_DENY_PERMISSIONS precedence. Check that the default stays auto-allow and that decisionToReply maps onto the ACP replies the harness actually offers.
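
The seam can be pictured as follows. This is a sketch, not the contents of responder.ts: the Responder method name, the Decision and PermissionRequest types, and the "auto" policy value are assumptions, and the env is passed in as a parameter here only to keep the example testable. decisionToReply is left out because the ACP reply options are not described in this PR text.

```typescript
type PermissionPolicy = "auto" | "deny"; // "auto" is an assumed name for auto-allow
type Decision = "allow" | "deny";

interface PermissionRequest {
  toolName: string;
}

// The seam: the harness adapter only needs something that can answer a request.
interface Responder {
  respond(request: PermissionRequest): Promise<Decision>;
}

// Reproduces a fixed policy; a cross-turn HITL responder would implement
// the same interface and wait for a human instead.
class PolicyResponderSketch implements Responder {
  private readonly policy: PermissionPolicy;

  constructor(policy: PermissionPolicy) {
    this.policy = policy;
  }

  async respond(_request: PermissionRequest): Promise<Decision> {
    return this.policy === "deny" ? "deny" : "allow";
  }
}

// Explicit per-run deny or the env flag flips to deny; otherwise auto-allow.
function policyFromRequestSketch(
  permissionPolicy: string | undefined,
  env: Record<string, string | undefined>,
): PermissionPolicy {
  if (permissionPolicy === "deny" || env.AGENTA_RIVET_DENY_PERMISSIONS === "true") {
    return "deny";
  }
  return "auto";
}
```

A usage along the lines of `new PolicyResponderSketch(policyFromRequestSketch(undefined, process.env))` yields auto-allow when neither the request nor the environment asks for deny.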

How to review this PR

Read in this order:

  1. services/agent/src/protocol.ts — the three-axis ResolvedToolSpec, RenderHint, and the new event variants. This is the contract everything downstream branches on.
  2. services/agent/src/tools/dispatch.ts: runResolvedTool, the single branch-on-kind. Confirm client throws and callback chooses relay vs direct POST by relayDir.
  3. services/agent/src/tools/code.ts — the sandbox env boundary (see above).
  4. The three call sites that now delegate: engines/pi.ts buildCustomTools, extensions/agenta.ts registerTools, tools/mcp-server.ts. Check each still wraps results its own way and skips client tools.

Skip the docs/design/agent-workflows/sdk-local-tools/ tree (design notes and a review log, not shipped behavior) and the services/agent/README.md one-word rename.

Likely regression: a tool with no kind set must still behave exactly like the old callback tool. Verify the undefined -> callback fallback in runResolvedTool and in every call site's branch.

Tests / notes

New TypeScript tests cover the code-tool executor, dispatch routing, the MCP server, the responder, and continuation. The MCP bridge tool id moved from Date.now() to randomUUID() so two calls in the same millisecond no longer collide.
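
The collision is easy to reproduce in isolation. The sketch below pins the timestamp to a fixed value to stand in for several calls landing in the same millisecond; idsFor is a hypothetical helper and the fixed timestamp is arbitrary.

```typescript
import { randomUUID } from "node:crypto";

// Collect ids into a Set so duplicates collapse.
function idsFor(count: number, makeId: () => string): Set<string> {
  const ids = new Set<string>();
  for (let i = 0; i < count; i++) {
    ids.add(makeId());
  }
  return ids;
}

// A fixed value simulates five calls that all read the clock in one millisecond.
const fixedNow = 1_750_000_000_000;
const timestampIds = idsFor(5, () => `tool-${fixedNow}`);
const uuidIds = idsFor(5, () => randomUUID());

console.log(timestampIds.size); // 1: all five calls share one id
console.log(uuidIds.size); // 5: each call gets its own id
```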

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend Feature Request New feature or request SDK labels Jun 19, 2026
@mmabrouk

Member Author

Reviewer guide: interesting code

  • services/agent/src/tools/dispatch.ts:104: runResolvedTool is the single branch-on-kind dispatch; code runs locally, client throws, callback picks relay vs direct POST.
  • services/agent/src/tools/code.ts:82: BASE_ENV_ALLOWLIST plus buildChildEnv is the security boundary; the snippet sees only this allowlist plus its scoped secrets, never the sidecar's process.env.
  • services/agent/src/protocol.ts:64: ResolvedToolSpec gains three orthogonal axes (kind, needsApproval, render); callRef is now optional and undefined kind means callback.
  • services/agent/src/responder.ts:48: PolicyResponder lifts the rivet auto-approve behind a seam so a cross-turn HITL responder can slot in without touching the harness.
  • services/agent/src/tools/mcp-server.ts:85: the MCP bridge tool id moves from Date.now() to randomUUID() so parallel same-millisecond calls no longer collide.
  • sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py:1: the 491-line ui_messages.py egress splits into this package; the /messages route selects wire format by endpoint, not the Accept header.

];

/** Build the child env from a minimal allowlist (copied only when set) plus scoped secrets. */
function buildChildEnv(


This is the boundary that keeps a code tool from seeing platform secrets: the child gets only this allowlist plus its own scoped env, not the sidecar's process.env (where the in-process Pi path writes provider keys). Confirm nothing secret-bearing creeps into BASE_ENV_ALLOWLIST.
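
For readers without the diff open, the boundary described in this comment can be sketched as below. The allowlist entries are illustrative guesses at "PATH, HOME, locale, temp, Windows essentials" and are not the real BASE_ENV_ALLOWLIST; the function name and the choice to take the parent env as a parameter are also assumptions.

```typescript
// Hypothetical allowlist; the real BASE_ENV_ALLOWLIST in code.ts may differ.
const ALLOWLIST_SKETCH = [
  "PATH",
  "HOME",
  "LANG",
  "LC_ALL",
  "TMPDIR",
  "TEMP",
  "TMP",
  "SystemRoot",
] as const;

function buildChildEnvSketch(
  parentEnv: Record<string, string | undefined>,
  scopedSecrets: Record<string, string> = {},
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of ALLOWLIST_SKETCH) {
    const value = parentEnv[key];
    // Copied only when set, so unset names never appear as empty strings.
    if (value !== undefined) env[key] = value;
  }
  // Nothing else from the parent is copied; the tool's own secrets go last.
  return { ...env, ...scopedSecrets };
}
```

Starting from an empty object and copying named keys in is the safer direction: a denylist applied to a copy of process.env would leak any provider key whose name nobody thought to list. Whether scoped secrets should be allowed to override allowlisted names such as PATH, as they can in this sketch, is a separate question for the real implementation.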

    return runCodeTool(spec.runtime, spec.code ?? "", spec.env, params, opts.signal);
  }
  if (spec.kind === "client") {
    throw new Error(


Single source of truth for branch-on-kind. A spec with no kind falls through to the callback path here, which preserves the old behavior exactly; please confirm every call site relies on that same default.

// Agenta's /tools/call. A unique id per call so two parallel calls in the same
// millisecond don't collide (Date.now() would).
const text = await runResolvedTool(spec, params?.arguments, {
  toolCallId: randomUUID(),


Real fix: the old id was tool-${Date.now()}, so two parallel calls in the same millisecond shared a relay filename / call id. randomUUID() removes the collision.

export function policyFromRequest(permissionPolicy?: string): PermissionPolicy {
  if (permissionPolicy === "deny" || process.env.AGENTA_RIVET_DENY_PERMISSIONS === "true") {
    return "deny";
  }


policyFromRequest keeps the prior precedence: explicit per-run deny or AGENTA_RIVET_DENY_PERMISSIONS flips to deny, otherwise auto-allow. The default must stay auto so headless /invoke runs are unchanged.

@mmabrouk

Member Author

Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set.

@mmabrouk mmabrouk closed this Jun 19, 2026