feat(agent): move the agent runtime into the SDK behind backend/harness ports by mmabrouk · Pull Request #4761 · Agenta-AI/agenta

mmabrouk · 2026-06-19T15:40:27Z

This PR is part of a stack. Review bottom-up.

Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4760 (merge that first).

- #4758: Pi-backed agent workflow service, template, tracing
- #4759: runnable tools as agent configuration
- #4760: drive harnesses over ACP via rivet
- #4761: move the runtime into the SDK behind backend/harness ports <- you are here
  - #4762: docker cleanups for the sidecar (side branch off #4761)
- #4763: typed SDK agent tool contracts
- #4764: resolve typed tools through the service
- #4765: deliver resolved tools through the runner
- #4766: remove Vercel adapter dead aliases
- #4767: propagate messages session ids to runner traces
- #4768: load-session via a no-op session store port
- #4769: advertise Vercel messages protocol headers
- #4770: agent-workflows ground truth + comment hygiene

Context

This PR pulls the agent runtime out of the service and into the SDK behind ports and adapters. It builds on feat/agent-rivet-acp-wp8 and lands slice #4 from docs/design/agent-workflows/pr-stack.md: it draws the durable line between the neutral agent definition, the harness-specific config, and the runtime infrastructure. Before this change, the service handler owned the engine. The 470-line services/oss/src/agent.py parsed the request, picked a harness, and called the runner all in one place, and the TypeScript runner owned the per-harness mapping. That made it hard to add an engine or run an agent without the service.

What this changes

The runtime now lives in agenta.sdk.agents as five ports. Backend is the engine, Environment owns the sandbox policy over a backend, Sandbox is where a session's process tree runs, Session is one conversation, and Harness maps a neutral SessionConfig into harness-specific config. InProcessPiBackend and RivetBackend implement Backend. PiHarness, ClaudeHarness, and AgentaHarness implement Harness.

The service stops owning the engine and only composes adapters. Before, agent.py chose a harness and drove the runner inline. Now services/oss/src/agent/app.py resolves tools and secrets, picks a backend with select_backend, then runs one turn through make_harness(...).prompt(...). The handler holds no engine knowledge.

The neutral config and the harness config are now separate types. The author edits one neutral AgentConfig (instructions, model, tools) plus a RunSelection (harness, sandbox, permission policy). Each Harness turns that into a PiAgentConfig, ClaudeAgentConfig, or AgentaAgentConfig. These differ in shape, not just identity: Pi carries built-in tool names and never gates tool use, while Claude has no built-ins, delivers tools over MCP, and carries a permission policy.

The TypeScript runner moves into role folders (engines/, tools/, tracing/, extensions/), runPi.ts becomes engines/pi.ts, and the flat ten-file src/ gets a rewritten README. A dedicated AgentConfigControl.tsx element edits the typed config in the playground and reuses the model selector, the tool picker, and the enum selects, so Composio and built-in tools are finally selectable on an agent.

Key architectural decision to review

The SDK owns the runtime ports and the service only composes adapters. Read sdks/python/agenta/sdk/agents/interfaces.py. A Backend declares supported_harnesses and stays pure plumbing: it takes an already-harness-shaped config and launches it. A Harness validates at construction that the environment's backend can drive it, and raises UnsupportedHarnessError otherwise. The per-harness knowledge that used to sit in the TypeScript runner now lives in adapters/harnesses.py, on the Python side. The tradeoff: this is more indirection than a service-owned switch, but it is what lets a future standalone SDK run an agent with no service, and it puts the "which engine can drive which harness" rule in one typed place instead of scattered string checks.

The backend is a deployment choice, the harness is editable config, and select_backend straddles them. Read services/oss/src/agent/app.py. select_backend upgrades pi/agenta to the rivet backend when the harness or a non-local sandbox needs it, so a Claude harness or a Daytona sandbox never silently drops the choice. Scrutinize the seam: the in-process backend supports {PI, AGENTA} and the rivet backend supports {PI, CLAUDE}, so agenta plus a non-local sandbox has no backend and raises rather than running the wrong thing. That gap is intentional, and it is the line most worth a second look.

How to review this PR

Review the 7 commits one at a time. The middle commits are pure renames and folder moves with no behavior change, so the diff is large but most of it is mechanical.

Open commit 1 (session-shaped harness/runtime port) first. It introduces the session-shaped ports and the shared wire contract. This is the conceptual core.
Skim commits 2 and 3 (the agent/harness split and the TypeScript role-folder regroup). These are moves and renames. Read the new file boundaries, not every moved line.
Read commit 4 (dedicated agent-config playground element) for the typed config and the new control, and commit 5 (relay Pi tool calls through the runner on Daytona) for the one real behavior fix in the runner.
Skip commit 6 (docs restructure) unless you want the design narrative.
Read commit 7 (move the agent runtime into the SDK) last. It relocates the runtime to agenta.sdk.agents, rewires the service onto the ports, deletes services/oss/src/harness, and adds the SDK and golden tests. This is where the final shape lands.

Then read the code in this order: dtos.py and interfaces.py for the contracts, adapters/in_process.py and adapters/rivet.py for the two backends, adapters/harnesses.py for the neutral-to-harness mapping, services/oss/src/agent/app.py for the composition, and AgentConfigControl.tsx for the playground.

The regression most likely to break: select_backend routing. An old revision that sends only the flat params, or an agenta harness paired with a non-local sandbox, must keep resolving to the same backend it did before. RunSelection.from_params and the fall-back to the old shape are what hold the existing revisions.

Tests / notes

The PR adds SDK unit tests (test_harness_adapters.py, test_environment_lifecycle.py, test_dtos_*), a wire-contract test, and golden /run request/result fixtures under sdks/python/oss/tests/pytest/unit/agents/. The commit messages record live verification across pi, rivet+pi+local, rivet+claude+local, and rivet+pi+daytona. LocalBackend is still a stub and raises NotImplementedError; AgentaHarness does not yet run on rivet or Daytona. Both are known and tracked in ground-truth.md.

Evolve the agent service ports toward the rivet sandbox-agent session shape, so the rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port.

- ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult.
- harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py), replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value.
- TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces the triplicated /tools/call client.
- agent.py uses the session API; _select_backend upgrades pi/local to rivet when the selected harness/sandbox needs it. permission_policy added to /inspect.

Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run succeeds with usage; invoke_agent nests under the /invoke span. Design notes under docs/design/agent-workflows/harness-port-redesign/.

…ss runtime

Address the god-module and the misleading package name:

- services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime, made up of ports.py, transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi; harness choice (pi/claude) lives inside the runtime, so there is no agent_claude.
- services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app, made up of app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py, secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py.

No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.

…ite README

The TS runner's src/ had grown one work package at a time into a flat folder of ten files with no signal of role. Group them and rewrite the stale README (it still called this a 'Pi wrapper' and pointed at the moved agent.py):

- src/cli.ts, server.ts, protocol.ts: entrypoints + the wire contract
- src/engines/{pi,rivet}.ts: the two engines (was runPi.ts / runRivet.ts)
- src/tracing/otel.ts: the tracers (was agenta-otel.ts)
- src/tools/{client,mcp-bridge,mcp-server}.ts: tool delivery (was toolClient/toolBridge*)
- src/extensions/agenta.ts: the Pi extension (was piExtension.ts)

No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts (PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers 'Athens' with usage; tsc --strict and the extension build pass.

Replace the loose model/agents_md/harness/sandbox params with one `agent` config element (x-ag-type: agent_config) carrying instructions, model, tools, harness, sandbox, and permission policy. The playground renders it through a new AgentConfigControl that reuses the existing controls: the model selector, the tool picker (so Composio and builtin tools are finally selectable on the agent), the enum selects, and a textarea. The backend reads it via resolve_agent_config and falls back to the old shape so existing revisions keep running. Verified live: the element renders with the tool picker, and a GitHub Composio tool runs end to end on pi+local.

Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed each tool call to Agenta's /tools/call, but a firewalled or private backend does not expose that to the remote cloud sandbox (the same reason tracing is built from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has internet but cannot reach the backend, so the call failed and the model gave up. Route the call through the runner, which can reach Agenta. The extension writes the request to a file in a sandbox dir and polls for the response; the runner watches the dir over the daemon filesystem API, calls /tools/call, and writes the result back (tools/relay.ts). Local runs keep the direct path. Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now returns the real login (was 'the tool failed twice'); local is unchanged.
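The file-drop relay described above can be sketched in a few lines. This is a minimal illustration only: the real relay is TypeScript (tools/relay.ts) and watches the sandbox dir over the daemon filesystem API, whereas this sketch uses a local temp dir and a thread. The names `request_tool_call`, `relay_once`, the file suffixes, and the `github_whoami` tool are hypothetical.

```python
import json
import tempfile
import threading
import time
import uuid
from pathlib import Path
from typing import Any, Callable, Dict

ToolFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def request_tool_call(relay_dir: Path, name: str, args: Dict[str, Any],
                      timeout: float = 5.0) -> Dict[str, Any]:
    """Sandbox side: publish a request file, then poll for the response."""
    call_id = uuid.uuid4().hex
    tmp = relay_dir / f"{call_id}.req.tmp"
    tmp.write_text(json.dumps({"name": name, "args": args}))
    # Rename after writing so the watcher never reads a half-written file.
    tmp.rename(relay_dir / f"{call_id}.req.json")
    response = relay_dir / f"{call_id}.res.json"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if response.exists():
            return json.loads(response.read_text())
        time.sleep(0.01)
    raise TimeoutError(f"no response for tool call {call_id}")


def relay_once(relay_dir: Path, call_tool: ToolFn) -> int:
    """Runner side: answer every pending request; return how many were handled."""
    handled = 0
    for req in sorted(relay_dir.glob("*.req.json")):
        call_id = req.name[: -len(".req.json")]
        payload = json.loads(req.read_text())
        # In the PR this step is the runner calling /tools/call on Agenta.
        result = call_tool(payload["name"], payload["args"])
        tmp = relay_dir / f"{call_id}.res.tmp"
        tmp.write_text(json.dumps(result))
        tmp.rename(relay_dir / f"{call_id}.res.json")
        req.unlink()
        handled += 1
    return handled


def demo() -> Dict[str, Any]:
    with tempfile.TemporaryDirectory() as raw:
        relay_dir = Path(raw)
        stop = threading.Event()

        def runner_loop() -> None:
            while not stop.is_set():
                relay_once(relay_dir, lambda name, args: {"ok": True, "echo": args})
                time.sleep(0.01)

        worker = threading.Thread(target=runner_loop, daemon=True)
        worker.start()
        try:
            return request_tool_call(relay_dir, "github_whoami", {"q": 1})
        finally:
            stop.set()
            worker.join()
```

The write-then-rename step matters on both sides: a poller that sees the final file name can assume the content is complete.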

Move the raw work-package material (wp-1..wp-8, harness-port-redesign, research) into scratch/ and add clean top-level pages a reviewer can read top to bottom: a README index, architecture, ports-and-adapters, sessions, and adapters/{pi,claude-code}. Update the three in-code references that pointed at the moved doc paths.

…ss ports

Relocate the neutral runtime to agenta.sdk.agents (dtos / interfaces / adapters / utils): the Backend / Environment / Sandbox / Session / Harness ports, the RivetBackend / InProcessPiBackend / LocalBackend backends, the Pi / Claude / Agenta harness adapters (which own the per-harness config mapping), and the /run wire. Rewire the agent service onto the ports and delete services/oss/src/harness. Add SDK + service unit and golden tests, and update the agent-workflows docs to the as-built design.


coderabbitai · 2026-06-19T15:40:35Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


mmabrouk · 2026-06-19T15:51:29Z

Reviewer guide: interesting code

The interesting decisions, by file and line:

sdks/python/agenta/sdk/agents/interfaces.py:210 — a Harness validates against Backend.supported_harnesses at construction and raises UnsupportedHarnessError, so "which engine can drive which harness" is one typed check, not scattered string compares.
sdks/python/agenta/sdk/agents/interfaces.py:279 — the cold stream path carries the session id forward via on_result and tears the session down via on_cleanup, so a drained, broken, or cancelled stream still cleans up.
sdks/python/agenta/sdk/agents/adapters/in_process.py:118 — the in-process backend supports {PI, AGENTA} while rivet supports {PI, CLAUDE}; this split is the real constraint select_backend reads, and it is why agenta + non-local has no backend.
sdks/python/agenta/sdk/agents/adapters/harnesses.py:96 — ClaudeHarness drops Pi built-in tools (with a warning) and carries the permission policy, while PiHarness keeps built-ins and never gates; the per-harness mapping that used to live in the TypeScript runner now lives here.
services/oss/src/agent/app.py:64 — select_backend upgrades pi/agenta to rivet when the harness or sandbox needs it; the handler holds no engine knowledge and only composes a Harness over an Environment.
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx — one composite control dispatched from x-ag-type: "agent_config" reuses the existing model selector, tool picker, and enum selects, so tools are stored as the same tool-object shape the backend resolver already parses.

mmabrouk · 2026-06-19T15:51:44Z

+    harness_type: ClassVar[HarnessType]
+
+    def __init__(self, environment: Environment) -> None:
+        if not environment.backend.supports(self.harness_type):


This construction-time check is the heart of the design: the rule for which engine can drive which harness lives in one typed place (Backend.supported_harnesses), not in scattered string compares across the service. A misconfigured harness/backend pair fails here, before any run.
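A minimal, self-contained sketch of this construction-time check follows. It mirrors the names in the hunk above (`harness_type`, `environment.backend.supports`, `UnsupportedHarnessError`) and the capability sets stated in this PR, but the class bodies are assumptions written for illustration, not the SDK's code.

```python
from enum import Enum
from typing import ClassVar, FrozenSet


class HarnessType(str, Enum):
    PI = "pi"
    CLAUDE = "claude"
    AGENTA = "agenta"


class UnsupportedHarnessError(ValueError):
    """Raised when a backend cannot drive the requested harness."""


class Backend:
    # Each backend declares the harnesses it can drive.
    supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()

    def supports(self, harness_type: HarnessType) -> bool:
        return harness_type in self.supported_harnesses


class InProcessPiBackend(Backend):
    supported_harnesses = frozenset({HarnessType.PI, HarnessType.AGENTA})


class RivetBackend(Backend):
    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})


class Environment:
    def __init__(self, backend: Backend) -> None:
        self.backend = backend


class Harness:
    harness_type: ClassVar[HarnessType]

    def __init__(self, environment: Environment) -> None:
        # Fail at construction, before any session or run exists.
        if not environment.backend.supports(self.harness_type):
            raise UnsupportedHarnessError(
                f"{type(environment.backend).__name__} cannot drive "
                f"harness {self.harness_type.value!r}"
            )
        self.environment = environment


class ClaudeHarness(Harness):
    harness_type = HarnessType.CLAUDE
```

With these stand-ins, `ClaudeHarness(Environment(RivetBackend()))` constructs, while `ClaudeHarness(Environment(InProcessPiBackend()))` raises `UnsupportedHarnessError`.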

mmabrouk · 2026-06-19T15:51:44Z

+    cwd = str(wrapper_dir())
+    use_rivet = (
+        runtime == "rivet"
+        or selection.harness not in ("pi", "agenta")


This OR is the seam to scrutinize. select_backend upgrades pi/agenta to rivet when the harness or sandbox needs it. Note the asymmetry it creates: in-process supports {PI, AGENTA} and rivet supports {PI, CLAUDE}, so an agenta harness plus a non-local sandbox lands on rivet, which cannot drive agenta, and raises UnsupportedHarnessError. That gap is intentional but worth confirming against the old behavior.
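The routing and the gap it creates can be written out as a small truth table. The sketch below is an assumption-laden reconstruction: the first two clauses of `use_rivet` come from the hunk above, the third (non-local sandbox) is inferred from the prose, and the `"pi"` runtime value, the backend names, and the `resolve` helper are hypothetical.

```python
from dataclasses import dataclass

IN_PROCESS = "in_process"
RIVET = "rivet"

# Capability sets as stated in this PR.
SUPPORTED = {
    IN_PROCESS: frozenset({"pi", "agenta"}),
    RIVET: frozenset({"pi", "claude"}),
}


class UnsupportedHarnessError(ValueError):
    """Raised when the selected backend cannot drive the harness."""


@dataclass(frozen=True)
class RunSelection:
    harness: str = "pi"
    sandbox: str = "local"


def select_backend(runtime: str, selection: RunSelection) -> str:
    use_rivet = (
        runtime == "rivet"
        or selection.harness not in ("pi", "agenta")
        or selection.sandbox != "local"  # assumed clause, inferred from the prose
    )
    return RIVET if use_rivet else IN_PROCESS


def resolve(runtime: str, selection: RunSelection) -> str:
    """Pick a backend, then apply the capability check a Harness would apply."""
    backend = select_backend(runtime, selection)
    if selection.harness not in SUPPORTED[backend]:
        raise UnsupportedHarnessError(
            f"{backend} backend cannot drive harness {selection.harness!r}"
        )
    return backend
```

Under these assumptions, `agenta` with a `daytona` sandbox is the only combination of the listed values that routes to rivet and then raises; `agenta` with `local` stays in-process.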

mmabrouk · 2026-06-19T15:51:45Z

+        # carried through.
+        if config.builtin_tools:
+            log.warning(
+                "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",


Claude drops Pi built-in tools rather than ship a name it cannot honor. This is the clearest case of why PiAgentConfig and ClaudeAgentConfig differ in shape, not just identity: built-ins are a Pi concept and Claude delivers tools over MCP. Confirm a Claude agent configured with built-in tools degrades sanely rather than silently dropping intended behavior.
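To make the "differ in shape" point concrete, here is a hedged sketch of the two mappings. Only `builtin_tools`, the config class names, and the warning text come from the PR; every other field name, the `to_pi_config` / `to_claude_config` helpers, and the `"ask"` policy value are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List

log = logging.getLogger("harness_sketch")


@dataclass
class SessionConfig:
    """Neutral config: what the author edits (field names are assumptions)."""
    instructions: str
    model: str
    builtin_tools: List[str] = field(default_factory=list)
    custom_tools: List[Dict[str, Any]] = field(default_factory=list)
    permission_policy: str = "ask"


@dataclass
class PiAgentConfig:
    instructions: str
    model: str
    builtin_tools: List[str]
    custom_tools: List[Dict[str, Any]]
    # No permission policy: Pi never gates tool use.


@dataclass
class ClaudeAgentConfig:
    instructions: str
    model: str
    mcp_tools: List[Dict[str, Any]]
    permission_policy: str
    # No builtin_tools field: built-ins are a Pi concept.


def to_pi_config(config: SessionConfig) -> PiAgentConfig:
    return PiAgentConfig(
        instructions=config.instructions,
        model=config.model,
        builtin_tools=list(config.builtin_tools),
        custom_tools=list(config.custom_tools),
    )


def to_claude_config(config: SessionConfig) -> ClaudeAgentConfig:
    if config.builtin_tools:
        log.warning(
            "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",
            len(config.builtin_tools),
        )
    return ClaudeAgentConfig(
        instructions=config.instructions,
        model=config.model,
        mcp_tools=list(config.custom_tools),  # delivered over MCP
        permission_policy=config.permission_policy,
    )
```

Because the Claude shape has no slot for built-ins, the drop is visible in the type rather than hidden in a runtime branch; the warning is the only signal the author gets.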

mmabrouk · 2026-06-19T15:51:46Z

+            if result.session_id:
+                config.session_id = result.session_id
+
+        return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)


The cold stream path carries the session id forward (on_result) and destroys the session on stream end (on_cleanup), covering drain, break, and cancellation in one place. Worth checking the AgentRun cleanup hook actually fires on an early break.
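The early-break concern is easy to reproduce with a stand-in. In the sketch below, `AgentRun` is a hypothetical wrapper written for this illustration, not the SDK class; it shows that cleanup on exhaustion, error, or cancellation can live in `__anext__`, while an early `break` only cleans up if the consumer (or a context manager) calls `aclose()`.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List, Optional

Event = Dict[str, Any]


class AgentRun:
    """Hypothetical stream wrapper with result and cleanup hooks."""

    def __init__(self, source: AsyncIterator[Event]) -> None:
        self._source = source
        self._on_result: Optional[Callable[[Event], None]] = None
        self._on_cleanup: Optional[Callable[[], Awaitable[None]]] = None
        self._cleaned = False

    def on_result(self, callback: Callable[[Event], None]) -> "AgentRun":
        self._on_result = callback
        return self

    def on_cleanup(self, callback: Callable[[], Awaitable[None]]) -> "AgentRun":
        self._on_cleanup = callback
        return self

    async def _cleanup(self) -> None:
        if self._cleaned:  # idempotent: drain followed by aclose() runs it once
            return
        self._cleaned = True
        if self._on_cleanup is not None:
            await self._on_cleanup()

    def __aiter__(self) -> "AgentRun":
        return self

    async def __anext__(self) -> Event:
        try:
            event = await self._source.__anext__()
        except BaseException:
            # Normal exhaustion, upstream errors, and cancellation all land here.
            await self._cleanup()
            raise
        if event.get("type") == "result" and self._on_result is not None:
            self._on_result(event)
        return event

    async def aclose(self) -> None:
        # An early break never reaches __anext__ again, so it needs this call.
        close = getattr(self._source, "aclose", None)
        if close is not None:
            await close()
        await self._cleanup()


async def fake_events() -> AsyncIterator[Event]:
    yield {"type": "delta", "text": "Hel"}
    yield {"type": "delta", "text": "lo"}
    yield {"type": "result", "session_id": "s-1"}


async def demo(break_early: bool) -> List[str]:
    log: List[str] = []

    async def destroy() -> None:
        log.append("destroyed")

    run = (
        AgentRun(fake_events())
        .on_result(lambda event: log.append(event["session_id"]))
        .on_cleanup(destroy)
    )
    try:
        async for _event in run:
            if break_early:
                break
    finally:
        await run.aclose()
    return log
```

If the real `AgentRun` relies on a `finally` inside an async generator instead, an early `break` defers that `finally` until the generator is closed or garbage-collected, which is the case worth testing.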

coderabbitai

Actionable comments posted: 15

🧹 Nitpick comments (2)

services/agent/src/engines/pi.ts (1)

96-101: ⚖️ Poor tradeoff

Global process.env mutation may cause races in concurrent server use.

applySecrets modifies process.env globally, which could cause secret leakage between concurrent requests if runPi is called multiple times in parallel on the server path. The current architecture mitigates this (rivet is the platform default, and the subprocess transport runs one request per process), but this constraint should be documented or enforced.

Consider either:

Documenting that runPi must not be called concurrently in a single process, or

Passing secrets via a child process environment rather than mutating the current process
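The second option can be illustrated with a short sketch. The runner itself is TypeScript, so this Python version only shows the pattern (build a per-request environment and hand it to the child, leaving the parent's environment untouched); `run_with_secrets` and `SKETCH_API_KEY` are hypothetical names.

```python
import os
import subprocess
import sys
from typing import Mapping


def run_with_secrets(code: str, secrets: Mapping[str, str]) -> str:
    """Run a child Python process that sees the secrets; the parent env is not mutated."""
    child_env = {**os.environ, **secrets}  # a per-request copy, never shared
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Two concurrent requests each get their own `child_env`, so one request's key cannot leak into another's run.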

services/oss/tests/pytest/unit/agent/test_secrets_mapping.py (1)

13-24: ⚡ Quick win

Add a behavior test for resolve_harness_secrets() response shapes.

These assertions protect constants, but they won’t catch regressions in JSON payload parsing (list vs envelope). Add one mocked-response test for the resolver path to validate real extraction behavior.
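A behavior test along these lines might look like the sketch below. It does not call the real `resolve_harness_secrets()` (its signature is not shown in this thread); `extract_provider_keys` is a hypothetical stand-in for the parsing step, and the `data` envelope key is an assumption taken from the example in the review.

```python
from typing import Any, Dict, List


def extract_provider_keys(payload: Any) -> List[Dict[str, Any]]:
    """Stand-in for the parsing step: accept a bare list or a {'data': [...]} envelope."""
    if isinstance(payload, dict):
        items = payload.get("data") or []
    elif isinstance(payload, list):
        items = payload
    else:
        items = []
    if not isinstance(items, list):
        items = []
    return [
        secret
        for secret in items
        if isinstance(secret, dict) and secret.get("kind") == "provider_key"
    ]


def test_list_and_envelope_shapes_agree() -> None:
    secrets = [
        {"kind": "provider_key", "id": "a"},
        {"kind": "other", "id": "b"},
    ]
    expected = [{"kind": "provider_key", "id": "a"}]
    assert extract_provider_keys(secrets) == expected
    assert extract_provider_keys({"data": secrets}) == expected


def test_unexpected_shapes_yield_nothing() -> None:
    assert extract_provider_keys(None) == []
    assert extract_provider_keys({"data": "oops"}) == []
    assert extract_provider_keys("not-json") == []
```

In the real suite the same assertions would run against a mocked HTTP response fed through the resolver, so a change in the endpoint's payload shape fails a test instead of silently returning no secrets.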


📥 Commits

Reviewing files that changed from the base of the PR and between 9c3d141 and 8b00633.

⛔ Files ignored due to path filters (1)

docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (126)

docs/design/agent-workflows/README.md
docs/design/agent-workflows/adapters/agenta.md
docs/design/agent-workflows/adapters/claude-code.md
docs/design/agent-workflows/adapters/pi.md
docs/design/agent-workflows/agent-protocol-rfc.md
docs/design/agent-workflows/architecture.md
docs/design/agent-workflows/ports-and-adapters.md
docs/design/agent-workflows/scratch/harness-port-redesign/README.md
docs/design/agent-workflows/scratch/harness-port-redesign/implementation.md
docs/design/agent-workflows/scratch/harness-port-redesign/plan.md
docs/design/agent-workflows/scratch/harness-port-redesign/proposal.md
docs/design/agent-workflows/scratch/harness-port-redesign/research.md
docs/design/agent-workflows/scratch/harness-port-redesign/status.md
docs/design/agent-workflows/scratch/research/auth-secrets.md
docs/design/agent-workflows/scratch/research/daytona-sandbox.md
docs/design/agent-workflows/scratch/research/diskless-in-memory-config.md
docs/design/agent-workflows/scratch/research/open-questions.md
docs/design/agent-workflows/scratch/research/otel-instrumentation.md
docs/design/agent-workflows/scratch/research/pi-interaction.md
docs/design/agent-workflows/scratch/research/sandbox-sharing.md
docs/design/agent-workflows/scratch/sdk-local-backend/status.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.example
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.ts
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.json
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts
docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md
docs/design/agent-workflows/scratch/wp-2-agent-service/README.md
docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md
docs/design/agent-workflows/scratch/wp-2-agent-service/qa.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.md
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.py
docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py
docs/design/agent-workflows/scratch/wp-4-multi-message-output/README.md
docs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.md
docs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.md
docs/design/agent-workflows/scratch/wp-7-tools/README.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.json
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.ts
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.md
docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md
docs/design/agent-workflows/sessions.md
docs/design/agent-workflows/streaming-and-sessions.md
sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
services/agent/README.md
services/agent/scripts/build-extension.mjs
services/agent/skills/agenta-getting-started/SKILL.md
services/agent/src/cli.ts
services/agent/src/engines/pi.ts
services/agent/src/engines/rivet.ts
services/agent/src/extensions/agenta.ts
services/agent/src/protocol.ts
services/agent/src/runPi.ts
services/agent/src/server.ts
services/agent/src/tools/client.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/tools/mcp-server.ts
services/agent/src/tools/relay.ts
services/agent/src/tracing/otel.ts
services/agent/test/stream-events.test.ts
services/oss/src/agent.py
services/oss/src/agent/__init__.py
services/oss/src/agent/app.py
services/oss/src/agent/client.py
services/oss/src/agent/config.py
services/oss/src/agent/schemas.py
services/oss/src/agent/secrets.py
services/oss/src/agent/tools.py
services/oss/src/agent/tracing.py
services/oss/src/agent_pi/__init__.py
services/oss/src/agent_pi/local_runtime.py
services/oss/src/agent_pi/pi_harness.py
services/oss/src/agent_pi/pi_http_harness.py
services/oss/src/agent_pi/ports.py
services/oss/src/agent_pi/rivet_harness.py
services/oss/src/agent_pi/schemas.py
services/oss/tests/pytest/unit/agent/__init__.py
services/oss/tests/pytest/unit/agent/conftest.py
services/oss/tests/pytest/unit/agent/test_invoke_handler.py
services/oss/tests/pytest/unit/agent/test_secrets_mapping.py
services/oss/tests/pytest/unit/agent/test_select_backend.py
services/oss/tests/pytest/unit/agent/test_tool_refs.py
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx

💤 Files with no reviewable changes (9)

services/oss/src/agent_pi/local_runtime.py
services/oss/src/agent_pi/schemas.py
services/oss/src/agent_pi/rivet_harness.py
services/oss/src/agent_pi/__init__.py
services/oss/src/agent_pi/pi_http_harness.py
services/oss/src/agent_pi/pi_harness.py
services/agent/src/runPi.ts
services/oss/src/agent_pi/ports.py
services/oss/src/agent.py

coderabbitai · 2026-06-19T16:23:24Z

+  - `adapters/in_process.py` — `InProcessPiBackend` (engine hard-coded `pi`; pi only, local
+    only; the reference backend) + its sandbox/session.
+  - `adapters/local.py` — `LocalBackend`, STUB (raises `NotImplementedError`).


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the InProcessPiBackend harness support note.

Line 31 currently says in-process is “pi only,” but the current design notes in-process supports {PI, AGENTA}. Keeping this stale can misdirect backend-routing follow-up work.

Based on learnings from the provided PR context: backend capability split is described as in-process {PI, AGENTA} and rivet {PI, CLAUDE}.

coderabbitai · 2026-06-19T16:23:24Z

+                "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
+                "callRef": spec.get("callRef"),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid shared mutable default inputSchema objects.

On Line 69, dict(_EMPTY_OBJECT_SCHEMA) only shallow-copies, so nested properties can be shared across tools. A later mutation can leak schema state between entries.

Suggested fix

 def _normalize_tool_specs(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
@@
-        normalized.append(
+        input_schema = spec.get("inputSchema")
+        if not input_schema:
+            input_schema = {"type": "object", "properties": {}}
+        normalized.append(
             {
                 "name": name,
                 "description": spec.get("description") or name,
-                "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
+                "inputSchema": input_schema,
                 "callRef": spec.get("callRef"),
             }
         )
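The sharing problem is quick to demonstrate in isolation. The sketch below assumes the template looks like `{"type": "object", "properties": {}}`, as the suggested fix implies; `leaks_with` is a hypothetical helper written for this illustration.

```python
import copy
from typing import Any, Callable, Dict

Schema = Dict[str, Any]


def leaks_with(copier: Callable[[Schema], Schema]) -> bool:
    """Copy one template twice, mutate the first copy, report whether the second sees it."""
    template: Schema = {"type": "object", "properties": {}}
    first = copier(template)
    second = copier(template)
    first["properties"]["repo"] = {"type": "string"}
    return "repo" in second["properties"]


shallow_leaks = leaks_with(dict)          # dict() copies only the top level
deep_leaks = leaks_with(copy.deepcopy)    # deepcopy duplicates nested dicts too
```

Building a fresh literal per tool, as in the suggested fix, avoids the problem without a copy at all; `copy.deepcopy` is the alternative if the module-level constant should stay the single source of the default.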


coderabbitai · 2026-06-19T16:23:24Z

+    def __init__(
+        self,
+        backend: "InProcessPiBackend",
+        config: HarnessAgentConfig,
+        *,
+        secrets: Optional[Mapping[str, str]],
+        trace: Optional[TraceContext],
+        session_id: Optional[str],
+    ) -> None:
+        self._backend = backend
+        self._config = config
+        self._secrets = dict(secrets or {})
+        self._trace = trace
+        self._session_id = session_id
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

InProcessPiSession drops the selected harness and always serializes PI.

create_session(..., harness=...) (Line 142) never stores/passes harness, and _wire_payload hard-codes HarnessType.PI (Line 76). For AGENTA runs, this sends the wrong harness over /run and can bypass harness-specific behavior.

Proposed fix

 class InProcessPiSession(Session):
@@
     def __init__(
         self,
         backend: "InProcessPiBackend",
         config: HarnessAgentConfig,
         *,
+        harness: HarnessType,
         secrets: Optional[Mapping[str, str]],
         trace: Optional[TraceContext],
         session_id: Optional[str],
     ) -> None:
         self._backend = backend
         self._config = config
+        self._harness = harness
         self._secrets = dict(secrets or {})
         self._trace = trace
         self._session_id = session_id
@@
         return request_to_wire(
             engine=InProcessPiBackend._ENGINE,
-            harness=HarnessType.PI,
+            harness=self._harness,
             sandbox="local",
             config=self._config,
             messages=messages,
             secrets=self._secrets,
             trace=self._trace,
             session_id=self._session_id,
         )
@@
     async def create_session(
@@
         return InProcessPiSession(
             self,
             config,
+            harness=harness,
             secrets=secrets,
             trace=trace,
             session_id=session_id,
         )

Also applies to: 75-77, 137-153

coderabbitai · 2026-06-19T16:23:25Z

+    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+
+    async def create_sandbox(self) -> Sandbox:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )
+
+    async def create_session(
+        self,
+        sandbox: Sandbox,
+        config: HarnessAgentConfig,
+        *,
+        harness: HarnessType,
+        secrets: Optional[Mapping[str, str]] = None,
+        trace: Optional[TraceContext] = None,
+        session_id: Optional[str] = None,
+    ) -> Session:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

supported_harnesses currently over-promises for an unimplemented backend.

Line 27 advertises PI/CLAUDE support, but both lifecycle methods always raise. Any routing/validation that trusts supports() can accept this backend and fail later at runtime.

Proposed fix

 class LocalBackend(Backend):
@@
-    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+    # Keep empty until lifecycle paths are implemented, so compatibility checks fail early.
+    supported_harnesses = frozenset()

coderabbitai · 2026-06-19T16:23:25Z

+    if isinstance(agent, dict):
+        return (
+            agent.get("instructions") or defaults.instructions,
+            agent.get("model") or defaults.model,
+            agent.get("tools"),
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve defaults.tools when agent.tools is omitted.

On Line 536, the agent-shape branch returns agent.get("tools") directly; when absent, AgentConfig.from_params() turns it into [] instead of falling back to defaults.tools, which contradicts the documented unset-field fallback behavior and can silently disable configured tools.

💡 Proposed fix

 def _parse_agent_fields(
     params: Dict[str, Any],
     defaults: AgentConfig,
 ) -> Tuple[Optional[str], Optional[str], Any]:
@@
     agent = params.get("agent")
     if isinstance(agent, dict):
+        raw_tools = agent.get("tools")
+        if raw_tools is None:
+            raw_tools = defaults.tools
         return (
             agent.get("instructions") or defaults.instructions,
             agent.get("model") or defaults.model,
-            agent.get("tools"),
+            raw_tools,
         )
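The fallback in the proposed fix can be sketched in isolation. `Defaults` and `pick_tools` are hypothetical names used only for illustration; the real `AgentConfig` is not reproduced here. Note that the check is `is None`, so an explicit empty list still means "no tools" and only an omitted (or null) field falls back.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Defaults:
    # Hypothetical stand-in for the tools carried by the default config.
    tools: List[str] = field(default_factory=list)


def pick_tools(agent: Dict[str, Any], defaults: Defaults) -> List[str]:
    """Use the agent's tools when given, else fall back to the defaults."""
    raw_tools = agent.get("tools")
    if raw_tools is None:
        # Field omitted (or explicitly null): keep the configured defaults.
        return defaults.tools
    # Field present, including an explicit empty list: respect it as-is.
    return raw_tools
```

Using `or` instead of `is None` would also treat `[]` as unset, which would make it impossible to disable the default tools on purpose.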

coderabbitai · 2026-06-19T16:23:25Z

+        secrets = response.json() or []
+    except Exception:  # pylint: disable=broad-except
+        log.warning("agent: vault secrets fetch failed", exc_info=True)
+        return {}
+
+    env: Dict[str, str] = {}
+    for secret in secrets:
+        if not isinstance(secret, dict) or secret.get("kind") != "provider_key":
+            continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle /secrets/ envelope payloads; current parsing can silently drop all secrets.

At Line 58, response.json() is treated as an iterable secret list. If the endpoint returns an object envelope (for example { "data": [...] }), the loop at Line 64 iterates string keys and no secret is ever mapped, causing silent auth fallback.

Proposed fix

-        secrets = response.json() or []
+        payload = response.json() or []
     except Exception:  # pylint: disable=broad-except
         log.warning("agent: vault secrets fetch failed", exc_info=True)
         return {}

     env: Dict[str, str] = {}
+    if isinstance(payload, dict):
+        secrets = payload.get("data") or payload.get("results") or []
+    elif isinstance(payload, list):
+        secrets = payload
+    else:
+        secrets = []
+
     for secret in secrets:
         if not isinstance(secret, dict) or secret.get("kind") != "provider_key":
             continue
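A standalone sketch of the same normalization is shown below. `extract_secrets` is a hypothetical helper name, and the envelope keys `data` and `results` are taken from the proposed fix, not from a confirmed API contract.

```python
from typing import Any, Dict, List


def extract_secrets(payload: Any) -> List[Dict[str, Any]]:
    """Return the secret records from a bare list or an object envelope."""
    if isinstance(payload, dict):
        # Assumed envelope keys; the actual /secrets/ response shape
        # is not confirmed in this thread.
        items = payload.get("data") or payload.get("results") or []
    elif isinstance(payload, list):
        items = payload
    else:
        items = []
    if not isinstance(items, list):
        return []
    # Drop anything that is not a record so later .get() calls are safe.
    return [item for item in items if isinstance(item, dict)]


# Iterating a dict yields its keys, which is why the original loop
# would skip every entry when the payload is an envelope.
envelope = {"data": [{"kind": "provider_key"}]}
keys_seen = list(envelope)
```

Here `keys_seen` is `["data"]`: the unpatched loop would see one string and no secret records.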

coderabbitai · 2026-06-19T16:23:25Z

+    async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
+        response = await client.post(
+            f"{api_base}/tools/resolve",
+            json={"tools": refs},
+            headers=headers,
+        )
+
+    if response.status_code >= 400:
+        raise RuntimeError(
+            f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}"
+        )
+
+    data = response.json()
+    builtins = data.get("builtins") or []


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize transport/JSON failures into the same resolver error path.

Line 94 and Line 105 can raise httpx/JSON decode exceptions that bypass your explicit RuntimeError handling for failed resolution, producing inconsistent invoke failures.

Proposed fix

-    async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
-        response = await client.post(
-            f"{api_base}/tools/resolve",
-            json={"tools": refs},
-            headers=headers,
-        )
+    try:
+        async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
+            response = await client.post(
+                f"{api_base}/tools/resolve",
+                json={"tools": refs},
+                headers=headers,
+            )
+    except httpx.HTTPError as exc:
+        raise RuntimeError(f"Tool resolution request failed: {exc}") from exc
@@
-    data = response.json()
+    try:
+        data = response.json()
+    except ValueError as exc:
+        raise RuntimeError("Tool resolution failed: backend returned invalid JSON") from exc
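The decoding half of this fix can be exercised without a network call. The sketch below uses only the standard library; `decode_resolver_body` is a hypothetical helper, and the final object-shape check is an extra assumption beyond what the review proposes.

```python
import json
from typing import Any, Dict


def decode_resolver_body(status_code: int, body: str) -> Dict[str, Any]:
    """Turn HTTP and JSON failures into one RuntimeError error path."""
    if status_code >= 400:
        raise RuntimeError(
            f"Tool resolution failed (HTTP {status_code}): {body[:500]}"
        )
    try:
        data = json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        raise RuntimeError(
            "Tool resolution failed: backend returned invalid JSON"
        ) from exc
    if not isinstance(data, dict):
        # Assumption: the resolver answers with a JSON object, since the
        # handler reads data.get("builtins").
        raise RuntimeError("Tool resolution failed: expected a JSON object")
    return data
```

Callers then need to catch a single exception type, and `raise ... from exc` keeps the original decode error available as `__cause__` for debugging.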

coderabbitai · 2026-06-19T16:23:25Z

+    if not usage or not usage.get("total"):
+        return


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t treat total=0 as absent usage.

Line 70 drops valid zero-token usage because 0 is falsy, so span usage attributes are never recorded for those runs.

Proposed fix

-    if not usage or not usage.get("total"):
+    if not usage or "total" not in usage:
         return
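The difference between the two guards is easy to check side by side. Both function names below are hypothetical; each returns `True` when the caller would return early and skip recording usage.

```python
from typing import Any, Mapping, Optional


def skips_usage_old(usage: Optional[Mapping[str, Any]]) -> bool:
    """Original guard: truthiness check, so total=0 counts as missing."""
    return not usage or not usage.get("total")


def skips_usage_new(usage: Optional[Mapping[str, Any]]) -> bool:
    """Proposed guard: only a missing key (or empty usage) is skipped."""
    return not usage or "total" not in usage


zero_usage = {"total": 0}
old_skips_zero = skips_usage_old(zero_usage)  # True: zero-token run is dropped
new_skips_zero = skips_usage_new(zero_usage)  # False: zero-token run is recorded
```

Both guards still skip `None` and `{}`, so the change affects only payloads that carry an explicit zero.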

coderabbitai · 2026-06-19T16:23:25Z

+    monkeypatch.setattr(app, "select_backend", lambda selection: backend)
+    monkeypatch.setattr(


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cross-harness test can pass without validating harness routing.

Because the stub at Line 34 ignores selection, the equality assertion at Line 71 stays green even if harness parsing regresses to a single default route. Capture and assert the selected harness values to lock the intended behavior.

Suggested tightening

 @pytest.fixture
 def patched(monkeypatch, fake_backend):
     backend = fake_backend(result=AgentResult(output="echo", usage={"total": 15}))
     recorded = {}
+    selected = []
@@
-    monkeypatch.setattr(app, "select_backend", lambda selection: backend)
+    def _select_backend(selection):
+        selected.append(selection.harness)
+        return backend
+    monkeypatch.setattr(app, "select_backend", _select_backend)
@@
-    return backend, recorded
+    return backend, recorded, selected
@@
 async def test_invoke_body_is_identical_across_harnesses(patched):
@@
-    pi = await _invoke("pi")
+    _, _, selected = patched
+    pi = await _invoke("pi")
     agenta = await _invoke("agenta")
     claude = await _invoke("claude")
     assert pi == agenta == claude
+    assert selected == ["pi", "agenta", "claude"]

Also applies to: 65-71
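The recording stub can be tried outside pytest. This sketch assumes the selection object exposes a `harness` attribute that compares equal to a plain string, as the suggested assertion implies; `recording_selector` is a hypothetical helper and `SimpleNamespace` stands in for the real selection type.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple


def recording_selector(backend: Any) -> Tuple[Callable[[Any], Any], List[Any]]:
    """Build a select_backend stub that records each requested harness."""
    selected: List[Any] = []

    def _select_backend(selection: Any) -> Any:
        # Record what was asked for before returning the shared fake backend.
        selected.append(selection.harness)
        return backend

    return _select_backend, selected


# Simulate three invocations, one per harness.
select, selected = recording_selector(backend="fake-backend")
for harness in ("pi", "agenta", "claude"):
    select(SimpleNamespace(harness=harness))
```

If harness parsing regressed to a single default, `selected` would contain the same value three times and the added assertion would fail even though the response bodies still matched.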

coderabbitai · 2026-06-19T16:23:25Z

+function toolName(tool: unknown): string | undefined {
+    if (!tool || typeof tool !== "object") return undefined
+    const fn = (tool as Record<string, unknown>).function
+    if (!fn || typeof fn !== "object") return undefined
+    const name = (fn as Record<string, unknown>).name
+    return typeof name === "string" ? name : undefined
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support legacy tool object shapes when deriving tool identity.

toolName() currently ignores bare-string and {name: ...} entries, but those shapes are still valid in resolver normalization. That breaks selectedToolNames/remove-by-name behavior for existing configs (duplicates can be re-added and removal can miss items).

Proposed fix

 function toolName(tool: unknown): string | undefined {
-    if (!tool || typeof tool !== "object") return undefined
-    const fn = (tool as Record<string, unknown>).function
-    if (!fn || typeof fn !== "object") return undefined
-    const name = (fn as Record<string, unknown>).name
-    return typeof name === "string" ? name : undefined
+    if (typeof tool === "string") return tool
+    if (!tool || typeof tool !== "object") return undefined
+    const obj = tool as Record<string, unknown>
+    if (typeof obj.name === "string") return obj.name
+    const fn = obj.function
+    if (!fn || typeof fn !== "object") return undefined
+    const name = (fn as Record<string, unknown>).name
+    return typeof name === "string" ? name : undefined
 }

Also applies to: 104-105, 108-110

mmabrouk · 2026-06-19T16:29:41Z

Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set.

mmabrouk added 7 commits June 17, 2026 13:06

dosubot Bot added the size:XXL (this PR changes 1000+ lines, ignoring generated files), Backend, documentation, and SDK labels Jun 19, 2026

mmabrouk closed this Jun 19, 2026

-                "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
-                "callRef": spec.get("callRef"),
+def _normalize_tool_specs(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+        input_schema = spec.get("inputSchema")
+        if not input_schema:
+            input_schema = {"type": "object", "properties": {}}
+        normalized.append(
+            {
+                "name": name,
+                "description": spec.get("description") or name,
+                "inputSchema": input_schema,
+                "callRef": spec.get("callRef"),
+            }
+        )
