feat(agent): move the agent runtime into the SDK behind backend/harness ports #4761

Closed
mmabrouk wants to merge 7 commits into feat/agent-rivet-acp-wp8 from feat/agent-harness-port

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Member

This PR is part of a stack. Review bottom-up.

Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4760 (merge that first).

Context

This PR pulls the agent runtime out of the service and into the SDK behind ports and adapters. It builds on feat/agent-rivet-acp-wp8 and lands slice #4 from docs/design/agent-workflows/pr-stack.md: it draws the durable line between the neutral agent definition, the harness-specific config, and the runtime infrastructure. Before this change, the service handler owned the engine. The 470-line services/oss/src/agent.py parsed the request, picked a harness, and called the runner all in one place, and the TypeScript runner owned the per-harness mapping. That made it hard to add an engine or run an agent without the service.

What this changes

The runtime now lives in agenta.sdk.agents as five ports. Backend is the engine, Environment owns the sandbox policy over a backend, Sandbox is where a session's process tree runs, Session is one conversation, and Harness maps a neutral SessionConfig into harness-specific config. InProcessPiBackend and RivetBackend implement Backend. PiHarness, ClaudeHarness, and AgentaHarness implement Harness.
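
A minimal sketch of how four of these ports could fit together (the Harness mapping is left out here). The class bodies, the `open_session` method, and the `EchoBackend`/`EchoSession` adapters are assumptions made for illustration; only the port names and `supported_harnesses`, `supports`, `create_sandbox`, `create_session`, `prompt`, and `destroy` come from this PR, and the real signatures in `interfaces.py` may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Dict, FrozenSet, List


class HarnessType(str, Enum):
    PI = "pi"
    CLAUDE = "claude"
    AGENTA = "agenta"


class Session(ABC):
    """One conversation."""

    @abstractmethod
    async def prompt(self, messages: List[str]) -> str: ...

    @abstractmethod
    async def destroy(self) -> None: ...


class Sandbox:
    """Where a session's process tree runs (a bare marker in this sketch)."""


class Backend(ABC):
    """The engine: launches an already-harness-shaped config."""

    supported_harnesses: FrozenSet[HarnessType] = frozenset()

    def supports(self, harness: HarnessType) -> bool:
        return harness in self.supported_harnesses

    @abstractmethod
    async def create_sandbox(self) -> Sandbox: ...

    @abstractmethod
    async def create_session(
        self, sandbox: Sandbox, config: Dict[str, Any], *, harness: HarnessType
    ) -> Session: ...


class Environment:
    """Owns the sandbox policy over a backend; here, one fresh sandbox per session."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    async def open_session(self, config: Dict[str, Any], harness: HarnessType) -> Session:
        # Hypothetical helper: the real Environment API is not shown in this PR text.
        sandbox = await self.backend.create_sandbox()
        return await self.backend.create_session(sandbox, config, harness=harness)


class EchoSession(Session):
    """Toy session that echoes the last message."""

    async def prompt(self, messages: List[str]) -> str:
        return f"echo: {messages[-1]}"

    async def destroy(self) -> None:
        return None


class EchoBackend(Backend):
    """Toy backend that only claims to drive the Pi harness."""

    supported_harnesses = frozenset({HarnessType.PI})

    async def create_sandbox(self) -> Sandbox:
        return Sandbox()

    async def create_session(self, sandbox, config, *, harness) -> Session:
        return EchoSession()


async def run_one_turn(text: str) -> str:
    env = Environment(EchoBackend())
    session = await env.open_session({"model": "hypothetical-model"}, HarnessType.PI)
    try:
        return await session.prompt([text])
    finally:
        await session.destroy()
```

Calling `asyncio.run(run_one_turn("hi"))` drives one turn through the toy backend. The point of the layering is that the caller never touches the engine directly: it only sees an Environment and a Session.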

The service stops owning the engine and only composes adapters. Before, agent.py chose a harness and drove the runner inline. Now services/oss/src/agent/app.py resolves tools and secrets, picks a backend with select_backend, then runs one turn through make_harness(...).prompt(...). The handler holds no engine knowledge.

The neutral config and the harness config are now separate types. The author edits one neutral AgentConfig (instructions, model, tools) plus a RunSelection (harness, sandbox, permission policy). Each Harness turns that into a PiAgentConfig, ClaudeAgentConfig, or AgentaAgentConfig. These differ in shape, not just identity: Pi carries built-in tool names and never gates tool use, while Claude has no built-ins, delivers tools over MCP, and carries a permission policy.
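
To make "differ in shape, not just identity" concrete, here is a small sketch of one neutral config mapped into two harness configs. The field names (`builtin_tools`, `mcp_tools`, `permission_policy`), the `to_pi`/`to_claude` helpers, and the `"ask"` policy value are assumptions for illustration; the real DTOs in `dtos.py` and the mapping in `adapters/harnesses.py` may look different.

```python
import logging
from dataclasses import dataclass, field
from typing import List

log = logging.getLogger(__name__)


@dataclass
class AgentConfig:
    """Neutral, author-edited definition."""

    instructions: str
    model: str
    tools: List[str] = field(default_factory=list)          # resolved, non-built-in tools
    builtin_tools: List[str] = field(default_factory=list)  # built-in tool names


@dataclass
class PiAgentConfig:
    instructions: str
    model: str
    tools: List[str]
    builtin_tools: List[str]  # Pi carries built-in tool names and has no permission gate


@dataclass
class ClaudeAgentConfig:
    instructions: str
    model: str
    mcp_tools: List[str]    # Claude receives tools over MCP
    permission_policy: str  # Claude carries a permission policy


def to_pi(config: AgentConfig) -> PiAgentConfig:
    return PiAgentConfig(
        instructions=config.instructions,
        model=config.model,
        tools=list(config.tools),
        builtin_tools=list(config.builtin_tools),
    )


def to_claude(config: AgentConfig, permission_policy: str) -> ClaudeAgentConfig:
    if config.builtin_tools:
        # Built-ins are a Pi concept, so they are dropped with a warning.
        log.warning(
            "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",
            len(config.builtin_tools),
        )
    return ClaudeAgentConfig(
        instructions=config.instructions,
        model=config.model,
        mcp_tools=list(config.tools),
        permission_policy=permission_policy,
    )
```

The two targets have different fields, so a single shared config type would need optional fields that are meaningless for one harness or the other.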

The TypeScript runner moves into role folders (engines/, tools/, tracing/, extensions/), runPi.ts becomes engines/pi.ts, and the flat ten-file src/ gets a rewritten README. A dedicated AgentConfigControl.tsx element edits the typed config in the playground and reuses the model selector, the tool picker, and the enum selects, so Composio and built-in tools are finally selectable on an agent.

Key architectural decision to review

The SDK owns the runtime ports and the service only composes adapters. Read sdks/python/agenta/sdk/agents/interfaces.py. A Backend declares supported_harnesses and stays pure plumbing: it takes an already-harness-shaped config and launches it. A Harness validates at construction that the environment's backend can drive it, and raises UnsupportedHarnessError otherwise. The per-harness knowledge that used to sit in the TypeScript runner now lives in adapters/harnesses.py, on the Python side. The tradeoff: this is more indirection than a service-owned switch, but it is what lets a future standalone SDK run an agent with no service, and it puts the "which engine can drive which harness" rule in one typed place instead of scattered string checks.
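
The construction-time check described above can be sketched as follows. The class names and the two capability sets match this PR's description; the constructor bodies, the error message, and the bare `Environment` are simplified assumptions, not the code in `interfaces.py`.

```python
from enum import Enum
from typing import ClassVar, FrozenSet


class HarnessType(str, Enum):
    PI = "pi"
    CLAUDE = "claude"
    AGENTA = "agenta"


class UnsupportedHarnessError(Exception):
    """Raised when a backend cannot drive the requested harness."""


class Backend:
    supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()

    def supports(self, harness: HarnessType) -> bool:
        return harness in self.supported_harnesses


class InProcessPiBackend(Backend):
    supported_harnesses = frozenset({HarnessType.PI, HarnessType.AGENTA})


class RivetBackend(Backend):
    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})


class Environment:
    def __init__(self, backend: Backend) -> None:
        self.backend = backend


class Harness:
    harness_type: ClassVar[HarnessType]

    def __init__(self, environment: Environment) -> None:
        # Fail at construction, before any run, if the pairing is invalid.
        if not environment.backend.supports(self.harness_type):
            raise UnsupportedHarnessError(
                f"{type(environment.backend).__name__} cannot drive "
                f"{self.harness_type.value!r}"
            )
        self.environment = environment


class ClaudeHarness(Harness):
    harness_type = HarnessType.CLAUDE


class AgentaHarness(Harness):
    harness_type = HarnessType.AGENTA
```

With this shape, `ClaudeHarness(Environment(RivetBackend()))` constructs normally, while `ClaudeHarness(Environment(InProcessPiBackend()))` raises before a session is ever created.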

The backend is a deployment choice, the harness is editable config, and select_backend straddles them. Read services/oss/src/agent/app.py. select_backend upgrades pi/agenta to the rivet backend when the harness or a non-local sandbox needs it, so a Claude harness or a Daytona sandbox never silently drops the choice. Scrutinize the seam: the in-process backend supports {PI, AGENTA} and the rivet backend supports {PI, CLAUDE}, so agenta plus a non-local sandbox has no backend and raises rather than running the wrong thing. That gap is intentional, and it is the line most worth a second look.
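
The routing rule and the gap it leaves can be written out as a small truth table. This is a sketch under assumptions: backends are plain strings, the `sandbox != "local"` term is an assumed form of the "non-local sandbox" condition, and `resolve` is a hypothetical helper that folds in the capability check a Harness performs at construction.

```python
from typing import Dict, FrozenSet


class UnsupportedHarnessError(Exception):
    """Raised when the chosen backend cannot drive the requested harness."""


# Capability sets as stated in the PR description.
SUPPORTED: Dict[str, FrozenSet[str]] = {
    "in_process": frozenset({"pi", "agenta"}),
    "rivet": frozenset({"pi", "claude"}),
}


def select_backend(harness: str, sandbox: str, runtime: str = "pi") -> str:
    """Upgrade to rivet when the deployment, the harness, or the sandbox needs it."""
    use_rivet = (
        runtime == "rivet"
        or harness not in ("pi", "agenta")
        or sandbox != "local"  # assumption: any non-local sandbox forces rivet
    )
    return "rivet" if use_rivet else "in_process"


def resolve(harness: str, sandbox: str, runtime: str = "pi") -> str:
    """Pick a backend, then apply the capability check (hypothetical helper)."""
    backend = select_backend(harness, sandbox, runtime)
    if harness not in SUPPORTED[backend]:
        raise UnsupportedHarnessError(
            f"{backend} cannot drive {harness!r} (sandbox={sandbox!r})"
        )
    return backend
```

Under these assumptions, `("pi", "local")` and `("agenta", "local")` stay in process, `("claude", "local")` and `("pi", "daytona")` go to rivet, and `("agenta", "daytona")` raises, which is the intentional gap.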

How to review this PR

Review the 7 commits one at a time. The middle commits are pure renames and folder moves with no behavior change, so the diff is large but most of it is mechanical.

  1. Open commit 1 (session-shaped harness/runtime port) first. It introduces the session-shaped ports and the shared wire contract. This is the conceptual core.
  2. Skim commits 2 and 3 (the agent/harness split and the TypeScript role-folder regroup). These are moves and renames. Read the new file boundaries, not every moved line.
  3. Read commit 4 (dedicated agent-config playground element) for the typed config and the new control, and commit 5 (relay Pi tool calls through the runner on Daytona) for the one real behavior fix in the runner.
  4. Skip commit 6 (docs restructure) unless you want the design narrative.
  5. Read commit 7 (move the agent runtime into the SDK) last. It relocates the runtime to agenta.sdk.agents, rewires the service onto the ports, deletes services/oss/src/harness, and adds the SDK and golden tests. This is where the final shape lands.

Then read the code in this order: dtos.py and interfaces.py for the contracts, adapters/in_process.py and adapters/rivet.py for the two backends, adapters/harnesses.py for the neutral-to-harness mapping, services/oss/src/agent/app.py for the composition, and AgentConfigControl.tsx for the playground.

The most likely regression is in select_backend routing. An old revision that sends only the flat params, or an agenta harness paired with a non-local sandbox, must keep resolving to the same backend it did before. RunSelection.from_params and the fallback to the old shape are what keep existing revisions working.
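
A rough sketch of the fallback this paragraph relies on: read the selection from the nested `agent` element when present, otherwise from the old flat params. The field set, the defaults (`"pi"`, `"local"`), and the key names are assumptions; the real `RunSelection.from_params` may handle more cases.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

DEFAULT_HARNESS = "pi"     # assumed default
DEFAULT_SANDBOX = "local"  # assumed default


@dataclass(frozen=True)
class RunSelection:
    harness: str = DEFAULT_HARNESS
    sandbox: str = DEFAULT_SANDBOX
    permission_policy: Optional[str] = None

    @classmethod
    def from_params(cls, params: Mapping[str, Any]) -> "RunSelection":
        agent = params.get("agent")
        # New revisions nest the selection under `agent`; old ones keep it flat.
        source = agent if isinstance(agent, dict) else params
        return cls(
            harness=source.get("harness") or DEFAULT_HARNESS,
            sandbox=source.get("sandbox") or DEFAULT_SANDBOX,
            permission_policy=source.get("permission_policy"),
        )
```

A regression test in this style would assert that a flat payload and its nested equivalent produce the same `RunSelection`, and therefore the same backend.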

Tests / notes

The PR adds SDK unit tests (test_harness_adapters.py, test_environment_lifecycle.py, test_dtos_*), a wire-contract test, and golden /run request/result fixtures under sdks/python/oss/tests/pytest/unit/agents/. The commit messages record live verification across pi, rivet+pi+local, rivet+claude+local, and rivet+pi+daytona. LocalBackend is still a stub and raises NotImplementedError; AgentaHarness does not yet run on rivet or Daytona. Both are known and tracked in ground-truth.md.

mmabrouk added 7 commits June 17, 2026 13:06

Evolve the agent service ports toward the rivet sandbox-agent session shape, so the
rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port.

- ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/
  destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult.
- harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py),
  replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value.
- TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes
  getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows
  on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces
  the triplicated /tools/call client.
- agent.py uses the session API; _select_backend upgrades pi/local to rivet when the
  selected harness/sandbox needs it. permission_policy added to /inspect.

Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run
succeeds with usage; invoke_agent nests under the /invoke span. Design notes under
docs/design/agent-workflows/harness-port-redesign/.

…ss runtime

Address the god-module and the misleading package name:

- services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime — ports.py,
  transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi;
  harness choice (pi/claude) lives inside the runtime, so there is no agent_claude.
- services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app
  — app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py,
  secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py.

No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.

…ite README

The TS runner's src/ had grown one work package at a time into a flat folder of ten
files with no signal of role. Group them and rewrite the stale README (it still called
this a 'Pi wrapper' and pointed at the moved agent.py):

  src/cli.ts, server.ts, protocol.ts   entrypoints + the wire contract
  src/engines/{pi,rivet}.ts            the two engines (was runPi.ts / runRivet.ts)
  src/tracing/otel.ts                  the tracers (was agenta-otel.ts)
  src/tools/{client,mcp-bridge,mcp-server}.ts   tool delivery (was toolClient/toolBridge*)
  src/extensions/agenta.ts             the Pi extension (was piExtension.ts)

No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts
(PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the
build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers
'Athens' with usage; tsc --strict and the extension build pass.

Replace the loose model/agents_md/harness/sandbox params with one `agent`
config element (x-ag-type: agent_config) carrying instructions, model, tools,
harness, sandbox, and permission policy. The playground renders it through a new
AgentConfigControl that reuses the existing controls: the model selector, the
tool picker (so Composio and builtin tools are finally selectable on the agent),
the enum selects, and a textarea. The backend reads it via resolve_agent_config
and falls back to the old shape so existing revisions keep running.

Verified live: the element renders with the tool picker, and a GitHub Composio
tool runs end to end on pi+local.

Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed
each tool call to Agenta's /tools/call, but a firewalled or private backend does
not expose that to the remote cloud sandbox (the same reason tracing is built
from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has
internet but cannot reach the backend, so the call failed and the model gave up.

Route the call through the runner, which can reach Agenta. The extension writes
the request to a file in a sandbox dir and polls for the response; the runner
watches the dir over the daemon filesystem API, calls /tools/call, and writes the
result back (tools/relay.ts). Local runs keep the direct path.
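
The file relay can be sketched in a few functions. This is an illustrative Python analogue only: the real relay is TypeScript (tools/relay.ts) and watches the directory over the daemon filesystem API, whereas this sketch uses the local filesystem. The file naming scheme and the function names are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict


def write_request(relay_dir: Path, call_id: str, payload: Dict[str, Any]) -> Path:
    """Sandbox side: write then rename, so the watcher never reads a partial file."""
    tmp = relay_dir / f"{call_id}.request.tmp"
    tmp.write_text(json.dumps(payload))
    final = relay_dir / f"{call_id}.request.json"
    tmp.rename(final)
    return final


def poll_response(
    relay_dir: Path, call_id: str, timeout_s: float = 5.0, interval_s: float = 0.05
) -> Dict[str, Any]:
    """Sandbox side: poll until the runner has written the response."""
    path = relay_dir / f"{call_id}.response.json"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return json.loads(path.read_text())
        time.sleep(interval_s)
    raise TimeoutError(f"no response for {call_id} within {timeout_s}s")


def relay_pending(
    relay_dir: Path, call_tool: Callable[[Dict[str, Any]], Dict[str, Any]]
) -> int:
    """Runner side: answer each pending request once and return how many were handled."""
    handled = 0
    for request in sorted(relay_dir.glob("*.request.json")):
        call_id = request.name[: -len(".request.json")]
        result = call_tool(json.loads(request.read_text()))  # stands in for /tools/call
        tmp = relay_dir / f"{call_id}.response.tmp"
        tmp.write_text(json.dumps(result))
        tmp.rename(relay_dir / f"{call_id}.response.json")
        request.unlink()  # consume the request so it is not answered twice
        handled += 1
    return handled
```

The design choice is that only the runner needs network access to the backend; the sandbox needs nothing beyond a shared directory.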

Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now
returns the real login (was 'the tool failed twice'); local is unchanged.

Move the raw work-package material (wp-1..wp-8, harness-port-redesign,
research) into scratch/ and add clean top-level pages a reviewer can read
top to bottom: a README index, architecture, ports-and-adapters, sessions,
and adapters/{pi,claude-code}. Update the three in-code references that
pointed at the moved doc paths.

…ss ports

Relocate the neutral runtime to agenta.sdk.agents (dtos / interfaces / adapters /
utils): the Backend / Environment / Sandbox / Session / Harness ports, the
RivetBackend / InProcessPiBackend / LocalBackend backends, the Pi / Claude / Agenta
harness adapters (which own the per-harness config mapping), and the /run wire.
Rewire the agent service onto the ports and delete services/oss/src/harness. Add
SDK + service unit and golden tests, and update the agent-workflows docs to the
as-built design.
@vercel

vercel Bot commented Jun 19, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
agenta-documentation | Ready | Preview, Comment | Jun 19, 2026 3:40pm

@coderabbitai

coderabbitai Bot commented Jun 19, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@dosubot dosubot Bot added the size:XXL, Backend, documentation, and SDK labels Jun 19, 2026
@mmabrouk

Member Author

Reviewer guide: interesting code

The interesting decisions, by file and line:

  • sdks/python/agenta/sdk/agents/interfaces.py:210 - a Harness validates against Backend.supported_harnesses at construction and raises UnsupportedHarnessError, so "which engine can drive which harness" is one typed check, not scattered string compares.
  • sdks/python/agenta/sdk/agents/interfaces.py:279 - the cold stream path carries the session id forward via on_result and tears the session down via on_cleanup, so a drained, broken, or cancelled stream still cleans up.
  • sdks/python/agenta/sdk/agents/adapters/in_process.py:118 - the in-process backend supports {PI, AGENTA} while rivet supports {PI, CLAUDE}; this split is the real constraint select_backend reads, and it is why agenta + non-local has no backend.
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py:96 - ClaudeHarness drops Pi built-in tools (with a warning) and carries the permission policy, while PiHarness keeps built-ins and never gates; the per-harness mapping that used to live in the TypeScript runner now lives here.
  • services/oss/src/agent/app.py:64 - select_backend upgrades pi/agenta to rivet when the harness or sandbox needs it; the handler holds no engine knowledge and only composes a Harness over an Environment.
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx - one composite control dispatched from x-ag-type: "agent_config" reuses the existing model selector, tool picker, and enum selects, so tools are stored as the same tool-object shape the backend resolver already parses.

    harness_type: ClassVar[HarnessType]

    def __init__(self, environment: Environment) -> None:
        if not environment.backend.supports(self.harness_type):

Member Author

This construction-time check is the heart of the design: the rule for which engine can drive which harness lives in one typed place (Backend.supported_harnesses), not in scattered string compares across the service. A misconfigured harness/backend pair fails here, before any run.

cwd = str(wrapper_dir())
use_rivet = (
    runtime == "rivet"
    or selection.harness not in ("pi", "agenta")

Member Author

This OR is the seam to scrutinize. select_backend upgrades pi/agenta to rivet when the harness or sandbox needs it. Note the asymmetry it creates: in-process supports {PI, AGENTA} and rivet supports {PI, CLAUDE}, so an agenta harness plus a non-local sandbox lands on rivet, which cannot drive agenta, and raises UnsupportedHarnessError. That gap is intentional but worth confirming against the old behavior.

# carried through.
if config.builtin_tools:
    log.warning(
        "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",

Member Author

Claude drops Pi built-in tools rather than ship a name it cannot honor. This is the clearest case of why PiAgentConfig and ClaudeAgentConfig differ in shape, not just identity: built-ins are a Pi concept and Claude delivers tools over MCP. Confirm a Claude agent configured with built-in tools degrades sanely rather than silently dropping intended behavior.

    if result.session_id:
        config.session_id = result.session_id

return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)

Member Author

The cold stream path carries the session id forward (on_result) and destroys the session on stream end (on_cleanup), covering drain, break, and cancellation in one place. Worth checking the AgentRun cleanup hook actually fires on an early break.
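
One way to make a cleanup hook fire on drain, error, cancellation, and early break is sketched below. `EventStream` is a hypothetical stand-in for whatever `session.stream(...)` returns, and the event shapes (`"delta"`, `"result"`) are assumptions. An early `break` out of `async for` does not by itself close the iterator in Python, so this sketch adds an async context manager to guarantee the hook runs.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, Optional

Event = Dict[str, Any]


class EventStream:
    def __init__(self, source: AsyncIterator[Event]) -> None:
        self._source = source
        self._result_cb: Optional[Callable[[Event], None]] = None
        self._cleanup_cb: Optional[Callable[[], Awaitable[None]]] = None
        self._closed = False

    def on_result(self, cb: Callable[[Event], None]) -> "EventStream":
        self._result_cb = cb
        return self

    def on_cleanup(self, cb: Callable[[], Awaitable[None]]) -> "EventStream":
        self._cleanup_cb = cb
        return self

    def __aiter__(self) -> "EventStream":
        return self

    async def __anext__(self) -> Event:
        try:
            event = await self._source.__anext__()
        except BaseException:
            # Drained (StopAsyncIteration), failed, or cancelled: clean up, then re-raise.
            await self.aclose()
            raise
        if event.get("type") == "result" and self._result_cb is not None:
            self._result_cb(event)
        return event

    async def aclose(self) -> None:
        if self._closed:  # idempotent: the hook runs at most once
            return
        self._closed = True
        close_source = getattr(self._source, "aclose", None)
        if close_source is not None:
            await close_source()
        if self._cleanup_cb is not None:
            await self._cleanup_cb()

    async def __aenter__(self) -> "EventStream":
        return self

    async def __aexit__(self, *exc_info: Any) -> None:
        await self.aclose()  # covers an early break out of the loop


async def _events() -> AsyncIterator[Event]:
    yield {"type": "delta", "text": "hel"}
    yield {"type": "result", "session_id": "s-1"}


async def consume(break_early: bool) -> Dict[str, Any]:
    state: Dict[str, Any] = {"session_id": None, "destroyed": 0}

    def absorb(event: Event) -> None:
        state["session_id"] = event.get("session_id")

    async def destroy() -> None:
        state["destroyed"] += 1

    stream = EventStream(_events()).on_result(absorb).on_cleanup(destroy)
    async with stream:
        async for _event in stream:
            if break_early:
                break
    return state
```

In both the drained and the early-break case the destroy hook runs exactly once; only the drained case sees the session id. Whether the actual AgentRun relies on a context manager, a `finally` block, or something else is the thing to confirm in review.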

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 15

🧹 Nitpick comments (2)
services/agent/src/engines/pi.ts (1)

96-101: ⚖️ Poor tradeoff

Global process.env mutation may cause races in concurrent server use.

applySecrets modifies process.env globally, which could cause secret leakage between concurrent requests if runPi is called multiple times in parallel on the server path. The current architecture mitigates this (rivet is the platform default, and the subprocess transport runs one request per process), but this constraint should be documented or enforced.

Consider either:

  1. Documenting that runPi must not be called concurrently in a single process, or
  2. Passing secrets via a child process environment rather than mutating the current process
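
Option 2 can be illustrated with a Python analogue (the runner itself is TypeScript, so this shows the idea rather than a patch for engines/pi.ts). The helper name and the `HYPOTHETICAL_API_KEY` variable are made up for the example.

```python
import os
import subprocess
import sys
from typing import Mapping, Sequence


def run_with_secrets(argv: Sequence[str], secrets: Mapping[str, str]) -> str:
    """Run a child with secrets in its own environment, leaving ours untouched."""
    child_env = {**os.environ, **secrets}  # a copy; os.environ is never mutated
    completed = subprocess.run(
        list(argv),
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    out = run_with_secrets(
        [sys.executable, "-c", "import os; print(os.environ['HYPOTHETICAL_API_KEY'])"],
        {"HYPOTHETICAL_API_KEY": "s3cret"},
    )
    print(out.strip())
```

Because each request builds its own environment mapping, two concurrent requests cannot observe each other's secrets through shared process state.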
services/oss/tests/pytest/unit/agent/test_secrets_mapping.py (1)

13-24: ⚡ Quick win

Add a behavior test for resolve_harness_secrets() response shapes.

These assertions protect constants, but they won’t catch regressions in JSON payload parsing (list vs envelope). Add one mocked-response test for the resolver path to validate real extraction behavior.


📥 Commits

Reviewing files that changed from the base of the PR and between 9c3d141 and 8b00633.

⛔ Files ignored due to path filters (1)
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (126)
  • docs/design/agent-workflows/README.md
  • docs/design/agent-workflows/adapters/agenta.md
  • docs/design/agent-workflows/adapters/claude-code.md
  • docs/design/agent-workflows/adapters/pi.md
  • docs/design/agent-workflows/agent-protocol-rfc.md
  • docs/design/agent-workflows/architecture.md
  • docs/design/agent-workflows/ports-and-adapters.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/README.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/implementation.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/plan.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/proposal.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/research.md
  • docs/design/agent-workflows/scratch/harness-port-redesign/status.md
  • docs/design/agent-workflows/scratch/research/auth-secrets.md
  • docs/design/agent-workflows/scratch/research/daytona-sandbox.md
  • docs/design/agent-workflows/scratch/research/diskless-in-memory-config.md
  • docs/design/agent-workflows/scratch/research/open-questions.md
  • docs/design/agent-workflows/scratch/research/otel-instrumentation.md
  • docs/design/agent-workflows/scratch/research/pi-interaction.md
  • docs/design/agent-workflows/scratch/research/sandbox-sharing.md
  • docs/design/agent-workflows/scratch/sdk-local-backend/status.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/README.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.example
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.md
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.ts
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.json
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.ts
  • docs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/README.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.md
  • docs/design/agent-workflows/scratch/wp-2-agent-service/qa.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.md
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.py
  • docs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.py
  • docs/design/agent-workflows/scratch/wp-4-multi-message-output/README.md
  • docs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.md
  • docs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.md
  • docs/design/agent-workflows/scratch/wp-7-tools/README.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.json
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.ts
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.md
  • docs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.md
  • docs/design/agent-workflows/sessions.md
  • docs/design/agent-workflows/streaming-and-sessions.md
  • sdks/python/agenta/__init__.py
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/local.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/errors.py
  • sdks/python/agenta/sdk/agents/interfaces.py
  • sdks/python/agenta/sdk/agents/streaming.py
  • sdks/python/agenta/sdk/agents/utils/__init__.py
  • sdks/python/agenta/sdk/agents/utils/ts_runner.py
  • sdks/python/agenta/sdk/agents/utils/wire.py
  • sdks/python/agenta/tests/agents/test_streaming.py
  • sdks/python/oss/tests/pytest/unit/agents/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/conftest.py
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
  • sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
  • services/agent/README.md
  • services/agent/scripts/build-extension.mjs
  • services/agent/skills/agenta-getting-started/SKILL.md
  • services/agent/src/cli.ts
  • services/agent/src/engines/pi.ts
  • services/agent/src/engines/rivet.ts
  • services/agent/src/extensions/agenta.ts
  • services/agent/src/protocol.ts
  • services/agent/src/runPi.ts
  • services/agent/src/server.ts
  • services/agent/src/tools/client.ts
  • services/agent/src/tools/mcp-bridge.ts
  • services/agent/src/tools/mcp-server.ts
  • services/agent/src/tools/relay.ts
  • services/agent/src/tracing/otel.ts
  • services/agent/test/stream-events.test.ts
  • services/oss/src/agent.py
  • services/oss/src/agent/__init__.py
  • services/oss/src/agent/app.py
  • services/oss/src/agent/client.py
  • services/oss/src/agent/config.py
  • services/oss/src/agent/schemas.py
  • services/oss/src/agent/secrets.py
  • services/oss/src/agent/tools.py
  • services/oss/src/agent/tracing.py
  • services/oss/src/agent_pi/__init__.py
  • services/oss/src/agent_pi/local_runtime.py
  • services/oss/src/agent_pi/pi_harness.py
  • services/oss/src/agent_pi/pi_http_harness.py
  • services/oss/src/agent_pi/ports.py
  • services/oss/src/agent_pi/rivet_harness.py
  • services/oss/src/agent_pi/schemas.py
  • services/oss/tests/pytest/unit/agent/__init__.py
  • services/oss/tests/pytest/unit/agent/conftest.py
  • services/oss/tests/pytest/unit/agent/test_invoke_handler.py
  • services/oss/tests/pytest/unit/agent/test_secrets_mapping.py
  • services/oss/tests/pytest/unit/agent/test_select_backend.py
  • services/oss/tests/pytest/unit/agent/test_tool_refs.py
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx
💤 Files with no reviewable changes (9)
  • services/oss/src/agent_pi/local_runtime.py
  • services/oss/src/agent_pi/schemas.py
  • services/oss/src/agent_pi/rivet_harness.py
  • services/oss/src/agent_pi/__init__.py
  • services/oss/src/agent_pi/pi_http_harness.py
  • services/oss/src/agent_pi/pi_harness.py
  • services/agent/src/runPi.ts
  • services/oss/src/agent_pi/ports.py
  • services/oss/src/agent.py

Comment on lines +31 to +33
- `adapters/in_process.py` — `InProcessPiBackend` (engine hard-coded `pi`; pi only, local
only; the reference backend) + its sandbox/session.
- `adapters/local.py` — `LocalBackend`, STUB (raises `NotImplementedError`).

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the InProcessPiBackend harness support note.

Line 31 currently says in-process is “pi only,” but the current design notes in-process supports {PI, AGENTA}. Keeping this stale can misdirect backend-routing follow-up work.

Based on learnings from the provided PR context: backend capability split is described as in-process {PI, AGENTA} and rivet {PI, CLAUDE}.

Comment on lines +69 to +70
"inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
"callRef": spec.get("callRef"),

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid shared mutable default inputSchema objects.

On Line 69, dict(_EMPTY_OBJECT_SCHEMA) only shallow-copies, so nested properties can be shared across tools. A later mutation can leak schema state between entries.

Suggested fix
 def _normalize_tool_specs(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
@@
-        normalized.append(
+        input_schema = spec.get("inputSchema")
+        if not input_schema:
+            input_schema = {"type": "object", "properties": {}}
+        normalized.append(
             {
                 "name": name,
                 "description": spec.get("description") or name,
-                "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
+                "inputSchema": input_schema,
                 "callRef": spec.get("callRef"),
             }
         )
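
The aliasing the comment describes is easy to reproduce in isolation. The sketch below assumes the template looks like the literal used in the suggested fix; the helper names are hypothetical.

```python
from typing import Any, Dict


def shallow_default(template: Dict[str, Any]) -> Dict[str, Any]:
    """dict() copies only the top level; nested dicts are still shared."""
    return dict(template)


def fresh_default() -> Dict[str, Any]:
    """Build a new literal each time, so nothing is shared between tools."""
    return {"type": "object", "properties": {}}


if __name__ == "__main__":
    template = {"type": "object", "properties": {}}
    first = shallow_default(template)
    second = shallow_default(template)
    first["properties"]["city"] = {"type": "string"}
    # The mutation leaks into the second copy and into the template itself.
    print("city" in second["properties"], "city" in template["properties"])
```

`copy.deepcopy(template)` would also break the sharing; building a fresh literal, as the suggested fix does, avoids the extra import.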

Comment on lines +53 to +67
    def __init__(
        self,
        backend: "InProcessPiBackend",
        config: HarnessAgentConfig,
        *,
        secrets: Optional[Mapping[str, str]],
        trace: Optional[TraceContext],
        session_id: Optional[str],
    ) -> None:
        self._backend = backend
        self._config = config
        self._secrets = dict(secrets or {})
        self._trace = trace
        self._session_id = session_id

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

InProcessPiSession drops the selected harness and always serializes PI.

create_session(..., harness=...) (Line 142) never stores/passes harness, and _wire_payload hard-codes HarnessType.PI (Line 76). For AGENTA runs, this sends the wrong harness over /run and can bypass harness-specific behavior.

Proposed fix
 class InProcessPiSession(Session):
@@
     def __init__(
         self,
         backend: "InProcessPiBackend",
         config: HarnessAgentConfig,
         *,
+        harness: HarnessType,
         secrets: Optional[Mapping[str, str]],
         trace: Optional[TraceContext],
         session_id: Optional[str],
     ) -> None:
         self._backend = backend
         self._config = config
+        self._harness = harness
         self._secrets = dict(secrets or {})
         self._trace = trace
         self._session_id = session_id
@@
         return request_to_wire(
             engine=InProcessPiBackend._ENGINE,
-            harness=HarnessType.PI,
+            harness=self._harness,
             sandbox="local",
             config=self._config,
             messages=messages,
             secrets=self._secrets,
             trace=self._trace,
             session_id=self._session_id,
         )
@@
     async def create_session(
@@
         return InProcessPiSession(
             self,
             config,
+            harness=harness,
             secrets=secrets,
             trace=trace,
             session_id=session_id,
         )

Also applies to: 75-77, 137-153

Comment on lines +27 to +48
    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})

    async def create_sandbox(self) -> Sandbox:
        raise NotImplementedError(
            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
            "Phase 4: Claude via claude-agent-sdk)."
        )

    async def create_session(
        self,
        sandbox: Sandbox,
        config: HarnessAgentConfig,
        *,
        harness: HarnessType,
        secrets: Optional[Mapping[str, str]] = None,
        trace: Optional[TraceContext] = None,
        session_id: Optional[str] = None,
    ) -> Session:
        raise NotImplementedError(
            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
            "Phase 4: Claude via claude-agent-sdk)."
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

supported_harnesses currently over-promises for an unimplemented backend.

Line 27 advertises PI/CLAUDE support, but both lifecycle methods always raise. Any routing/validation that trusts supports() can accept this backend and fail later at runtime.

Proposed fix
 class LocalBackend(Backend):
@@
-    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+    # Keep empty until lifecycle paths are implemented, so compatibility checks fail early.
+    supported_harnesses = frozenset()
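
To illustrate why an empty set fails earlier, here is a minimal sketch. It assumes `supports()` is a simple membership test on `supported_harnesses`; the `pick_backend` helper and `ReadyBackend` class are hypothetical and exist only for this example.

```python
from enum import Enum
from typing import FrozenSet, Sequence, Type


class HarnessType(str, Enum):
    # Stand-in enum; values are assumed.
    PI = "pi"
    CLAUDE = "claude"


class Backend:
    supported_harnesses: FrozenSet[HarnessType] = frozenset()

    @classmethod
    def supports(cls, harness: HarnessType) -> bool:
        # Assumed semantics: plain membership in the advertised set.
        return harness in cls.supported_harnesses


class LocalBackend(Backend):
    # Empty until the lifecycle methods are implemented.
    supported_harnesses = frozenset()


class ReadyBackend(Backend):
    # Hypothetical backend that really can run Pi.
    supported_harnesses = frozenset({HarnessType.PI})


def pick_backend(candidates: Sequence[Type[Backend]], harness: HarnessType) -> Type[Backend]:
    """Return the first backend that advertises the harness, or fail immediately."""
    for backend in candidates:
        if backend.supports(harness):
            return backend
    raise ValueError(f"no backend supports harness {harness.value!r}")
```

Under these assumptions, routing skips `LocalBackend` at selection time rather than accepting it and hitting `NotImplementedError` later in `create_sandbox`.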

Comment on lines +532 to +537
    if isinstance(agent, dict):
        return (
            agent.get("instructions") or defaults.instructions,
            agent.get("model") or defaults.model,
            agent.get("tools"),
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve defaults.tools when agent.tools is omitted.

On Line 536, the agent-shape branch returns agent.get("tools") directly. When the key is absent, AgentConfig.from_params() turns the resulting None into [] instead of falling back to defaults.tools. That contradicts the documented unset-field fallback behavior and can silently disable configured tools.

💡 Proposed fix
 def _parse_agent_fields(
     params: Dict[str, Any],
     defaults: AgentConfig,
 ) -> Tuple[Optional[str], Optional[str], Any]:
@@
     agent = params.get("agent")
     if isinstance(agent, dict):
+        raw_tools = agent.get("tools")
+        if raw_tools is None:
+            raw_tools = defaults.tools
         return (
             agent.get("instructions") or defaults.instructions,
             agent.get("model") or defaults.model,
-            agent.get("tools"),
+            raw_tools,
         )
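
The distinction the fix relies on is between a missing `tools` key and an explicitly empty list. A minimal sketch of that rule follows; `pick_tools` is a hypothetical helper, not a function from the PR.

```python
from typing import Any, Dict, List, Optional


def pick_tools(agent: Dict[str, Any], default_tools: Optional[List[Any]]) -> Optional[List[Any]]:
    """Fall back to the defaults only when the agent did not set tools at all."""
    raw_tools = agent.get("tools")
    if raw_tools is None:
        # Key absent (or explicitly None): keep the configured defaults.
        return default_tools
    # An explicit value, including an empty list, is respected as-is.
    return raw_tools
```

Checking `is None` rather than truthiness keeps `"tools": []` meaningful, so a caller can still disable all tools on purpose.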

Comment on lines +58 to +66
        secrets = response.json() or []
    except Exception:  # pylint: disable=broad-except
        log.warning("agent: vault secrets fetch failed", exc_info=True)
        return {}

    env: Dict[str, str] = {}
    for secret in secrets:
        if not isinstance(secret, dict) or secret.get("kind") != "provider_key":
            continue

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle /secrets/ envelope payloads; current parsing can silently drop all secrets.

At Line 58, response.json() is treated as an iterable secret list. If the endpoint returns an object envelope (for example { "data": [...] }), the loop at Line 64 iterates string keys and no secret is ever mapped, causing silent auth fallback.

Proposed fix
-        secrets = response.json() or []
+        payload = response.json() or []
     except Exception:  # pylint: disable=broad-except
         log.warning("agent: vault secrets fetch failed", exc_info=True)
         return {}
 
     env: Dict[str, str] = {}
+    if isinstance(payload, dict):
+        secrets = payload.get("data") or payload.get("results") or []
+    elif isinstance(payload, list):
+        secrets = payload
+    else:
+        secrets = []
+
     for secret in secrets:
         if not isinstance(secret, dict) or secret.get("kind") != "provider_key":
             continue
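
The normalization step can also be isolated into a small function, which makes the failure mode easy to see. This is a sketch only: `extract_secret_list` is a hypothetical name, and the `"data"` and `"results"` envelope keys are the guesses from the proposed fix, not a confirmed API contract.

```python
from typing import Any, Dict, List


def extract_secret_list(payload: Any) -> List[Dict[str, Any]]:
    """Accept either a bare list or an object envelope and return dict entries only."""
    if isinstance(payload, dict):
        # Envelope keys are assumed; adjust to whatever the endpoint really returns.
        candidates = payload.get("data") or payload.get("results") or []
    elif isinstance(payload, list):
        candidates = payload
    else:
        candidates = []
    if not isinstance(candidates, list):
        return []
    return [item for item in candidates if isinstance(item, dict)]
```

Without this step, iterating an envelope such as `{"data": [...]}` yields the string `"data"`, which the `isinstance(secret, dict)` guard then skips, so every secret is dropped without an error.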

Comment on lines +93 to +106
    async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
        response = await client.post(
            f"{api_base}/tools/resolve",
            json={"tools": refs},
            headers=headers,
        )

    if response.status_code >= 400:
        raise RuntimeError(
            f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}"
        )

    data = response.json()
    builtins = data.get("builtins") or []

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize transport/JSON failures into the same resolver error path.

Line 94 and Line 105 can raise httpx/JSON decode exceptions that bypass your explicit RuntimeError handling for failed resolution, producing inconsistent invoke failures.

Proposed fix
-    async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
-        response = await client.post(
-            f"{api_base}/tools/resolve",
-            json={"tools": refs},
-            headers=headers,
-        )
+    try:
+        async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
+            response = await client.post(
+                f"{api_base}/tools/resolve",
+                json={"tools": refs},
+                headers=headers,
+            )
+    except httpx.HTTPError as exc:
+        raise RuntimeError(f"Tool resolution request failed: {exc}") from exc
@@
-    data = response.json()
+    try:
+        data = response.json()
+    except ValueError as exc:
+        raise RuntimeError("Tool resolution failed: backend returned invalid JSON") from exc
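
The same idea can be shown without a network dependency. In the sketch below, `fetch` is a hypothetical callable standing in for the HTTP request, and `OSError` stands in for transport failures (the proposed fix catches `httpx.HTTPError` at that point). The JSON branch relies on `json.JSONDecodeError` being a subclass of `ValueError`.

```python
import json
from typing import Any, Callable, Dict


def resolve_tools(fetch: Callable[[], str]) -> Dict[str, Any]:
    """Run `fetch`, parse its body, and surface every failure as RuntimeError."""
    try:
        body = fetch()
    except OSError as exc:
        # Stand-in for transport errors; the real code would catch httpx.HTTPError.
        raise RuntimeError(f"Tool resolution request failed: {exc}") from exc

    try:
        data = json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError is a ValueError subclass, so this covers bad JSON.
        raise RuntimeError("Tool resolution failed: backend returned invalid JSON") from exc

    if not isinstance(data, dict):
        raise RuntimeError("Tool resolution failed: expected a JSON object")
    return data
```

Callers then need to handle one exception type, and `raise ... from exc` keeps the original cause available in the traceback.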

Comment on lines +70 to +71
    if not usage or not usage.get("total"):
        return

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t treat total=0 as absent usage.

Line 70 drops valid zero-token usage because 0 is falsy, so span usage attributes are never recorded for those runs.

Proposed fix
-    if not usage or not usage.get("total"):
+    if not usage or "total" not in usage:
         return
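
A short sketch of the difference between the two guards. `usage_attributes` and the `"usage.total"` attribute key are hypothetical names chosen for this example, not the PR's actual span attributes.

```python
from typing import Any, Dict, Mapping, Optional


def usage_attributes(usage: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Build span attributes, treating a zero total as real data."""
    if not usage or "total" not in usage:
        # Nothing reported at all: skip recording.
        return {}
    # A total of 0 reaches this line because only key presence is checked.
    return {"usage.total": usage["total"]}
```

A truthiness check such as `not usage.get("total")` cannot tell "no total reported" apart from "total is zero", which is why the key-presence test is used here.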
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not usage or not usage.get("total"):
return
if not usage or "total" not in usage:
return

Comment on lines +34 to +35
    monkeypatch.setattr(app, "select_backend", lambda selection: backend)
    monkeypatch.setattr(

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cross-harness test can pass without validating harness routing.

Because the stub at Line 34 ignores selection, the equality assertion at Line 71 stays green even if harness parsing regresses to a single default route. Capture and assert the selected harness values to lock the intended behavior.

Suggested tightening
 @pytest.fixture
 def patched(monkeypatch, fake_backend):
     backend = fake_backend(result=AgentResult(output="echo", usage={"total": 15}))
     recorded = {}
+    selected = []
@@
-    monkeypatch.setattr(app, "select_backend", lambda selection: backend)
+    def _select_backend(selection):
+        selected.append(selection.harness)
+        return backend
+    monkeypatch.setattr(app, "select_backend", _select_backend)
@@
-    return backend, recorded
+    return backend, recorded, selected
@@
 async def test_invoke_body_is_identical_across_harnesses(patched):
@@
-    pi = await _invoke("pi")
+    _, _, selected = patched
+    pi = await _invoke("pi")
     agenta = await _invoke("agenta")
     claude = await _invoke("claude")
     assert pi == agenta == claude
+    assert selected == ["pi", "agenta", "claude"]

Also applies to: 65-71
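
The recording stub can be tried on its own, outside pytest. In this sketch, `recording_selector` is a hypothetical helper, and the selection object is modeled with `SimpleNamespace`; the real `selection` type and whether its `harness` compares equal to a plain string are assumptions.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple


def recording_selector(backend: Any) -> Tuple[Callable[[Any], Any], List[Any]]:
    """Return a select_backend replacement plus the list it records into."""
    selected: List[Any] = []

    def _select_backend(selection: Any) -> Any:
        # Record what the handler asked for before returning the fake backend.
        selected.append(selection.harness)
        return backend

    return _select_backend, selected


# Usage: each call appends the requested harness in order.
select, selected = recording_selector(backend="fake-backend")
for name in ("pi", "agenta", "claude"):
    select(SimpleNamespace(harness=name))
```

If harness parsing regressed to a single default, the recorded list would contain the same value three times and the added assertion would fail, even though the response bodies still match.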

Comment on lines +36 to +42
function toolName(tool: unknown): string | undefined {
    if (!tool || typeof tool !== "object") return undefined
    const fn = (tool as Record<string, unknown>).function
    if (!fn || typeof fn !== "object") return undefined
    const name = (fn as Record<string, unknown>).name
    return typeof name === "string" ? name : undefined
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Support legacy tool object shapes when deriving tool identity.

toolName() currently ignores bare-string and {name: ...} entries, but those shapes are still valid in resolver normalization. That breaks selectedToolNames/remove-by-name behavior for existing configs (duplicates can be re-added and removal can miss items).

Proposed fix
 function toolName(tool: unknown): string | undefined {
-    if (!tool || typeof tool !== "object") return undefined
-    const fn = (tool as Record<string, unknown>).function
-    if (!fn || typeof fn !== "object") return undefined
-    const name = (fn as Record<string, unknown>).name
-    return typeof name === "string" ? name : undefined
+    if (typeof tool === "string") return tool
+    if (!tool || typeof tool !== "object") return undefined
+    const obj = tool as Record<string, unknown>
+    if (typeof obj.name === "string") return obj.name
+    const fn = obj.function
+    if (!fn || typeof fn !== "object") return undefined
+    const name = (fn as Record<string, unknown>).name
+    return typeof name === "string" ? name : undefined
 }

Also applies to: 104-105, 108-110
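
For readers following the Python side of the stack, the lookup order in the proposed fix can be mirrored in Python. This is an illustrative port under the assumption that the three shapes named in the comment (bare string, `{name: ...}`, and `{function: {name: ...}}`) are the only ones to support; `tool_name` is a hypothetical function and is not part of the PR.

```python
from typing import Any, Optional


def tool_name(tool: Any) -> Optional[str]:
    """Derive a tool's identity from any of the three supported shapes."""
    if isinstance(tool, str):
        # Legacy shape: the entry is just the name.
        return tool
    if not isinstance(tool, dict):
        return None
    name = tool.get("name")
    if isinstance(name, str):
        # Legacy shape: {"name": "..."}.
        return name
    fn = tool.get("function")
    if not isinstance(fn, dict):
        return None
    fn_name = fn.get("name")
    # Current shape: {"function": {"name": "..."}}.
    return fn_name if isinstance(fn_name, str) else None
```

Because all three shapes map to the same string, duplicate detection and remove-by-name can compare names without caring which shape a stored config uses.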

@mmabrouk

Member Author

Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set.

@mmabrouk mmabrouk closed this Jun 19, 2026

Labels

Backend, documentation (Improvements or additions to documentation), SDK, size:XXL (This PR changes 1000+ lines, ignoring generated files.)
