feat(agent): move the agent runtime into the SDK behind backend/harness ports#4761
feat(agent): move the agent runtime into the SDK behind backend/harness ports#4761mmabrouk wants to merge 7 commits into
Conversation
Evolve the agent service ports toward the rivet sandbox-agent session shape, so the rivet (ACP) and legacy in-process Pi backends share one clean, capability-aware port. - ports.py: Environment + Harness seams, a first-class AgentSession (create/prompt/ destroy), HarnessCapabilities, ContentBlock, Message, AgentEvent, structured AgentResult. - harness.py: SubprocessHarness + HttpHarness share one wire contract (wire.py), replacing the pi_harness/pi_http_harness/rivet_harness trio. The engine is an env value. - TS: shared protocol.ts; runPi/runRivet return the enriched result; runRivet probes getAgent() capabilities and routes tools by mcpTools, not the harness name; usage flows on the rivet path (split from PromptResponse.usage); one shared toolClient.ts replaces the triplicated /tools/call client. - agent.py uses the session API; _select_backend upgrades pi/local to rivet when the selected harness/sandbox needs it. permission_policy added to /inspect. Verified live: pi, rivet+pi+local, rivet+claude+local, rivet+pi+daytona; playground run succeeds with usage; invoke_agent nests under the /invoke span. Design notes under docs/design/agent-workflows/harness-port-redesign/.
…ss runtime Address the god-module and the misleading package name: - services/oss/src/harness/ (was agent_pi/): the engine-agnostic runtime — ports.py, transports.py (was harness.py), environment.py, wire.py. Named for the seam, not Pi; harness choice (pi/claude) lives inside the runtime, so there is no agent_claude. - services/oss/src/agent/ (was the 470-line agent.py god-module): the Agenta workflow app — app.py (thin handler + backend wiring), inputs.py (request parsing), tools.py, secrets.py, tracing.py, client.py (shared backend access), schemas.py, config.py. No behavior change. Verified live: a playground run answers 'REFACTOR-OK Lisbon' with usage.
…ite README
The TS runner's src/ had grown one work package at a time into a flat folder of ten
files with no signal of role. Group them and rewrite the stale README (it still called
this a 'Pi wrapper' and pointed at the moved agent.py):
src/cli.ts, server.ts, protocol.ts entrypoints + the wire contract
src/engines/{pi,rivet}.ts the two engines (was runPi.ts / runRivet.ts)
src/tracing/otel.ts the tracers (was agenta-otel.ts)
src/tools/{client,mcp-bridge,mcp-server}.ts tool delivery (was toolClient/toolBridge*)
src/extensions/agenta.ts the Pi extension (was piExtension.ts)
No behavior change. Updated the fragile __dirname-relative paths in engines/rivet.ts
(PKG_ROOT) and tools/mcp-bridge.ts (the tsx bin + server path) for the new depth, and the
build-extension entry. Verified live: rivet+pi+local through the restarted sidecar answers
'Athens' with usage; tsc --strict and the extension build pass.
Replace the loose model/agents_md/harness/sandbox params with one `agent` config element (x-ag-type: agent_config) carrying instructions, model, tools, harness, sandbox, and permission policy. The playground renders it through a new AgentConfigControl that reuses the existing controls: the model selector, the tool picker (so Composio and builtin tools are finally selectable on the agent), the enum selects, and a textarea. The backend reads it via resolve_agent_config and falls back to the old shape so existing revisions keep running. Verified live: the element renders with the tool picker, and a GitHub Composio tool runs end to end on pi+local.
Tools worked locally but failed on Daytona. The in-sandbox Pi extension POSTed each tool call to Agenta's /tools/call, but a firewalled or private backend does not expose that to the remote cloud sandbox (the same reason tracing is built from the event stream on Daytona rather than in-sandbox OTLP). The sandbox has internet but cannot reach the backend, so the call failed and the model gave up. Route the call through the runner, which can reach Agenta. The extension writes the request to a file in a sandbox dir and polls for the response; the runner watches the dir over the daemon filesystem API, calls /tools/call, and writes the result back (tools/relay.ts). Local runs keep the direct path. Verified programmatically: rivet+pi+daytona with a GitHub Composio tool now returns the real login (was 'the tool failed twice'); local is unchanged.
Move the raw work-package material (wp-1..wp-8, harness-port-redesign,
research) into scratch/ and add clean top-level pages a reviewer can read
top to bottom: a README index, architecture, ports-and-adapters, sessions,
and adapters/{pi,claude-code}. Update the three in-code references that
pointed at the moved doc paths.
…ss ports Relocate the neutral runtime to agenta.sdk.agents (dtos / interfaces / adapters / utils): the Backend / Environment / Sandbox / Session / Harness ports, the RivetBackend / InProcessPiBackend / LocalBackend backends, the Pi / Claude / Agenta harness adapters (which own the per-harness config mapping), and the /run wire. Rewire the agent service onto the ports and delete services/oss/src/harness. Add SDK + service unit and golden tests, and update the agent-workflows docs to the as-built design.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: interesting codeThe interesting decisions, by file and line:
|
| harness_type: ClassVar[HarnessType] | ||
|
|
||
| def __init__(self, environment: Environment) -> None: | ||
| if not environment.backend.supports(self.harness_type): |
There was a problem hiding this comment.
This construction-time check is the heart of the design: the rule for which engine can drive which harness lives in one typed place (Backend.supported_harnesses), not in scattered string compares across the service. A misconfigured harness/backend pair fails here, before any run.
| cwd = str(wrapper_dir()) | ||
| use_rivet = ( | ||
| runtime == "rivet" | ||
| or selection.harness not in ("pi", "agenta") |
There was a problem hiding this comment.
This OR is the seam to scrutinize. select_backend upgrades pi/agenta to rivet when the harness or sandbox needs it. Note the asymmetry it creates: in-process supports {PI, AGENTA} and rivet supports {PI, CLAUDE}, so an agenta harness plus a non-local sandbox lands on rivet, which cannot drive agenta, and raises UnsupportedHarnessError. That gap is intentional but worth confirming against the old behavior.
| # carried through. | ||
| if config.builtin_tools: | ||
| log.warning( | ||
| "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept", |
There was a problem hiding this comment.
Claude drops Pi built-in tools rather than ship a name it cannot honor. This is the clearest case of why PiAgentConfig and ClaudeAgentConfig differ in shape, not just identity: built-ins are a Pi concept and Claude delivers tools over MCP. Confirm a Claude agent configured with built-in tools degrades sanely rather than silently dropping intended behavior.
| if result.session_id: | ||
| config.session_id = result.session_id | ||
|
|
||
| return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy) |
There was a problem hiding this comment.
The cold stream path carries the session id forward (on_result) and destroys the session on stream end (on_cleanup), covering drain, break, and cancellation in one place. Worth checking the AgentRun cleanup hook actually fires on an early break.
There was a problem hiding this comment.
Actionable comments posted: 15
🧹 Nitpick comments (2)
services/agent/src/engines/pi.ts (1)
96-101: ⚖️ Poor tradeoffGlobal
process.envmutation may cause races in concurrent server use.
applySecretsmodifiesprocess.envglobally, which could cause secret leakage between concurrent requests ifrunPiis called multiple times in parallel on the server path. The current architecture mitigates this (rivet is the platform default, and the subprocess transport runs one request per process), but this constraint should be documented or enforced.Consider either:
- Documenting that
runPimust not be called concurrently in a single process, or- Passing secrets via a child process environment rather than mutating the current process
services/oss/tests/pytest/unit/agent/test_secrets_mapping.py (1)
13-24: ⚡ Quick winAdd a behavior test for
resolve_harness_secrets()response shapes.These assertions protect constants, but they won’t catch regressions in JSON payload parsing (list vs envelope). Add one mocked-response test for the resolver path to validate real extraction behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 7b749d53-e6c2-4f7d-b64c-c81a60ee683e
⛔ Files ignored due to path filters (1)
docs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (126)
docs/design/agent-workflows/README.mddocs/design/agent-workflows/adapters/agenta.mddocs/design/agent-workflows/adapters/claude-code.mddocs/design/agent-workflows/adapters/pi.mddocs/design/agent-workflows/agent-protocol-rfc.mddocs/design/agent-workflows/architecture.mddocs/design/agent-workflows/ports-and-adapters.mddocs/design/agent-workflows/scratch/harness-port-redesign/README.mddocs/design/agent-workflows/scratch/harness-port-redesign/implementation.mddocs/design/agent-workflows/scratch/harness-port-redesign/plan.mddocs/design/agent-workflows/scratch/harness-port-redesign/proposal.mddocs/design/agent-workflows/scratch/harness-port-redesign/research.mddocs/design/agent-workflows/scratch/harness-port-redesign/status.mddocs/design/agent-workflows/scratch/research/auth-secrets.mddocs/design/agent-workflows/scratch/research/daytona-sandbox.mddocs/design/agent-workflows/scratch/research/diskless-in-memory-config.mddocs/design/agent-workflows/scratch/research/open-questions.mddocs/design/agent-workflows/scratch/research/otel-instrumentation.mddocs/design/agent-workflows/scratch/research/pi-interaction.mddocs/design/agent-workflows/scratch/research/sandbox-sharing.mddocs/design/agent-workflows/scratch/sdk-local-backend/status.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/README.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/integrating-the-tracing-extension.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/.env.exampledocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/README.mddocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/agenta-otel.tsdocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/package.jsondocs/design/agent-workflows/scratch/wp-1-pi-tracing/poc/run.tsdocs/design/agent-workflows/scratch/wp-1-pi-tracing/tracing-in-the-agent-service.mddocs/design/agent-workflows/scratch/wp-2-agent-service/README.mddocs/design/agent-workflows/scratch/wp-2-agent-service/implementation-plan.mddocs/design/agent-workflows/scratch/wp-2-agent-service/qa.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/README.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/README.mddocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/bench_coldstart.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/build_snapshot.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/cleanup.pydocs/design/agent-workflows/scratch/wp-3-daytona-sandbox/poc/run_agent.pydocs/design/agent-workflows/scratch/wp-4-multi-message-output/README.mddocs/design/agent-workflows/scratch/wp-5-chat-vs-completion/README.mddocs/design/agent-workflows/scratch/wp-6-workflow-type-and-template/README.mddocs/design/agent-workflows/scratch/wp-7-tools/README.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/README.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/architecture.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/context.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/isolation-and-fork.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/plan.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.pydocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/commit_agent_config.pydocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/debug-events.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/dump-full.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/package.jsondocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/poc/spike.tsdocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/research.mddocs/design/agent-workflows/scratch/wp-8-rivet-acp-runtime/status.mddocs/design/agent-workflows/sessions.mddocs/design/agent-workflows/streaming-and-sessions.mdsdks/python/agenta/__init__.pysdks/python/agenta/sdk/agents/__init__.pysdks/python/agenta/sdk/agents/adapters/__init__.pysdks/python/agenta/sdk/agents/adapters/agenta_builtins.pysdks/python/agenta/sdk/agents/adapters/harnesses.pysdks/python/agenta/sdk/agents/adapters/in_process.pysdks/python/agenta/sdk/agents/adapters/local.pysdks/python/agenta/sdk/agents/adapters/rivet.pysdks/python/agenta/sdk/agents/dtos.pysdks/python/agenta/sdk/agents/errors.pysdks/python/agenta/sdk/agents/interfaces.pysdks/python/agenta/sdk/agents/streaming.pysdks/python/agenta/sdk/agents/utils/__init__.pysdks/python/agenta/sdk/agents/utils/ts_runner.pysdks/python/agenta/sdk/agents/utils/wire.pysdks/python/agenta/tests/agents/test_streaming.pysdks/python/oss/tests/pytest/unit/agents/__init__.pysdks/python/oss/tests/pytest/unit/agents/conftest.pysdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.jsonsdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.pysdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.pysdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.pysdks/python/oss/tests/pytest/unit/agents/test_wire_contract.pyservices/agent/README.mdservices/agent/scripts/build-extension.mjsservices/agent/skills/agenta-getting-started/SKILL.mdservices/agent/src/cli.tsservices/agent/src/engines/pi.tsservices/agent/src/engines/rivet.tsservices/agent/src/extensions/agenta.tsservices/agent/src/protocol.tsservices/agent/src/runPi.tsservices/agent/src/server.tsservices/agent/src/tools/client.tsservices/agent/src/tools/mcp-bridge.tsservices/agent/src/tools/mcp-server.tsservices/agent/src/tools/relay.tsservices/agent/src/tracing/otel.tsservices/agent/test/stream-events.test.tsservices/oss/src/agent.pyservices/oss/src/agent/__init__.pyservices/oss/src/agent/app.pyservices/oss/src/agent/client.pyservices/oss/src/agent/config.pyservices/oss/src/agent/schemas.pyservices/oss/src/agent/secrets.pyservices/oss/src/agent/tools.pyservices/oss/src/agent/tracing.pyservices/oss/src/agent_pi/__init__.pyservices/oss/src/agent_pi/local_runtime.pyservices/oss/src/agent_pi/pi_harness.pyservices/oss/src/agent_pi/pi_http_harness.pyservices/oss/src/agent_pi/ports.pyservices/oss/src/agent_pi/rivet_harness.pyservices/oss/src/agent_pi/schemas.pyservices/oss/tests/pytest/unit/agent/__init__.pyservices/oss/tests/pytest/unit/agent/conftest.pyservices/oss/tests/pytest/unit/agent/test_invoke_handler.pyservices/oss/tests/pytest/unit/agent/test_secrets_mapping.pyservices/oss/tests/pytest/unit/agent/test_select_backend.pyservices/oss/tests/pytest/unit/agent/test_tool_refs.pyweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsxweb/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx
💤 Files with no reviewable changes (9)
- services/oss/src/agent_pi/local_runtime.py
- services/oss/src/agent_pi/schemas.py
- services/oss/src/agent_pi/rivet_harness.py
- services/oss/src/agent_pi/init.py
- services/oss/src/agent_pi/pi_http_harness.py
- services/oss/src/agent_pi/pi_harness.py
- services/agent/src/runPi.ts
- services/oss/src/agent_pi/ports.py
- services/oss/src/agent.py
| - `adapters/in_process.py` — `InProcessPiBackend` (engine hard-coded `pi`; pi only, local | ||
| only; the reference backend) + its sandbox/session. | ||
| - `adapters/local.py` — `LocalBackend`, STUB (raises `NotImplementedError`). |
There was a problem hiding this comment.
Update the InProcessPiBackend harness support note.
Line 31 currently says in-process is “pi only,” but the current design notes in-process supports {PI, AGENTA}. Keeping this stale can misdirect backend-routing follow-up work.
Based on learnings from the provided PR context: backend capability split is described as in-process {PI, AGENTA} and rivet {PI, CLAUDE}.
| "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA), | ||
| "callRef": spec.get("callRef"), |
There was a problem hiding this comment.
Avoid shared mutable default inputSchema objects.
On Line 69, dict(_EMPTY_OBJECT_SCHEMA) only shallow-copies, so nested properties can be shared across tools. A later mutation can leak schema state between entries.
Suggested fix
def _normalize_tool_specs(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
@@
- normalized.append(
+ input_schema = spec.get("inputSchema")
+ if not input_schema:
+ input_schema = {"type": "object", "properties": {}}
+ normalized.append(
{
"name": name,
"description": spec.get("description") or name,
- "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA),
+ "inputSchema": input_schema,
"callRef": spec.get("callRef"),
}
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "inputSchema": spec.get("inputSchema") or dict(_EMPTY_OBJECT_SCHEMA), | |
| "callRef": spec.get("callRef"), | |
| def _normalize_tool_specs(specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]: | |
| input_schema = spec.get("inputSchema") | |
| if not input_schema: | |
| input_schema = {"type": "object", "properties": {}} | |
| normalized.append( | |
| { | |
| "name": name, | |
| "description": spec.get("description") or name, | |
| "inputSchema": input_schema, | |
| "callRef": spec.get("callRef"), | |
| } | |
| ) |
| def __init__( | ||
| self, | ||
| backend: "InProcessPiBackend", | ||
| config: HarnessAgentConfig, | ||
| *, | ||
| secrets: Optional[Mapping[str, str]], | ||
| trace: Optional[TraceContext], | ||
| session_id: Optional[str], | ||
| ) -> None: | ||
| self._backend = backend | ||
| self._config = config | ||
| self._secrets = dict(secrets or {}) | ||
| self._trace = trace | ||
| self._session_id = session_id | ||
|
|
There was a problem hiding this comment.
InProcessPiSession drops the selected harness and always serializes PI.
create_session(..., harness=...) (Line 142) never stores/passes harness, and _wire_payload hard-codes HarnessType.PI (Line 76). For AGENTA runs, this sends the wrong harness over /run and can bypass harness-specific behavior.
Proposed fix
class InProcessPiSession(Session):
@@
def __init__(
self,
backend: "InProcessPiBackend",
config: HarnessAgentConfig,
*,
+ harness: HarnessType,
secrets: Optional[Mapping[str, str]],
trace: Optional[TraceContext],
session_id: Optional[str],
) -> None:
self._backend = backend
self._config = config
+ self._harness = harness
self._secrets = dict(secrets or {})
self._trace = trace
self._session_id = session_id
@@
return request_to_wire(
engine=InProcessPiBackend._ENGINE,
- harness=HarnessType.PI,
+ harness=self._harness,
sandbox="local",
config=self._config,
messages=messages,
secrets=self._secrets,
trace=self._trace,
session_id=self._session_id,
)
@@
async def create_session(
@@
return InProcessPiSession(
self,
config,
+ harness=harness,
secrets=secrets,
trace=trace,
session_id=session_id,
)Also applies to: 75-77, 137-153
| supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE}) | ||
|
|
||
| async def create_sandbox(self) -> Sandbox: | ||
| raise NotImplementedError( | ||
| "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, " | ||
| "Phase 4: Claude via claude-agent-sdk)." | ||
| ) | ||
|
|
||
| async def create_session( | ||
| self, | ||
| sandbox: Sandbox, | ||
| config: HarnessAgentConfig, | ||
| *, | ||
| harness: HarnessType, | ||
| secrets: Optional[Mapping[str, str]] = None, | ||
| trace: Optional[TraceContext] = None, | ||
| session_id: Optional[str] = None, | ||
| ) -> Session: | ||
| raise NotImplementedError( | ||
| "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, " | ||
| "Phase 4: Claude via claude-agent-sdk)." | ||
| ) |
There was a problem hiding this comment.
supported_harnesses currently over-promises for an unimplemented backend.
Line 27 advertises PI/CLAUDE support, but both lifecycle methods always raise. Any routing/validation that trusts supports() can accept this backend and fail later at runtime.
Proposed fix
class LocalBackend(Backend):
@@
- supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+ # Keep empty until lifecycle paths are implemented, so compatibility checks fail early.
+ supported_harnesses = frozenset()| if isinstance(agent, dict): | ||
| return ( | ||
| agent.get("instructions") or defaults.instructions, | ||
| agent.get("model") or defaults.model, | ||
| agent.get("tools"), | ||
| ) |
There was a problem hiding this comment.
Preserve defaults.tools when agent.tools is omitted.
On Line 536, the agent-shape branch returns agent.get("tools") directly; when absent, AgentConfig.from_params() turns it into [] instead of falling back to defaults.tools, which contradicts the documented unset-field fallback behavior and can silently disable configured tools.
💡 Proposed fix
def _parse_agent_fields(
params: Dict[str, Any],
defaults: AgentConfig,
) -> Tuple[Optional[str], Optional[str], Any]:
@@
agent = params.get("agent")
if isinstance(agent, dict):
+ raw_tools = agent.get("tools")
+ if raw_tools is None:
+ raw_tools = defaults.tools
return (
agent.get("instructions") or defaults.instructions,
agent.get("model") or defaults.model,
- agent.get("tools"),
+ raw_tools,
)| secrets = response.json() or [] | ||
| except Exception: # pylint: disable=broad-except | ||
| log.warning("agent: vault secrets fetch failed", exc_info=True) | ||
| return {} | ||
|
|
||
| env: Dict[str, str] = {} | ||
| for secret in secrets: | ||
| if not isinstance(secret, dict) or secret.get("kind") != "provider_key": | ||
| continue |
There was a problem hiding this comment.
Handle /secrets/ envelope payloads; current parsing can silently drop all secrets.
At Line 58, response.json() is treated as an iterable secret list. If the endpoint returns an object envelope (for example { "data": [...] }), the loop at Line 64 iterates string keys and no secret is ever mapped, causing silent auth fallback.
Proposed fix
- secrets = response.json() or []
+ payload = response.json() or []
except Exception: # pylint: disable=broad-except
log.warning("agent: vault secrets fetch failed", exc_info=True)
return {}
env: Dict[str, str] = {}
+ if isinstance(payload, dict):
+ secrets = payload.get("data") or payload.get("results") or []
+ elif isinstance(payload, list):
+ secrets = payload
+ else:
+ secrets = []
+
for secret in secrets:
if not isinstance(secret, dict) or secret.get("kind") != "provider_key":
continue📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| secrets = response.json() or [] | |
| except Exception: # pylint: disable=broad-except | |
| log.warning("agent: vault secrets fetch failed", exc_info=True) | |
| return {} | |
| env: Dict[str, str] = {} | |
| for secret in secrets: | |
| if not isinstance(secret, dict) or secret.get("kind") != "provider_key": | |
| continue | |
| payload = response.json() or [] | |
| except Exception: # pylint: disable=broad-except | |
| log.warning("agent: vault secrets fetch failed", exc_info=True) | |
| return {} | |
| env: Dict[str, str] = {} | |
| if isinstance(payload, dict): | |
| secrets = payload.get("data") or payload.get("results") or [] | |
| elif isinstance(payload, list): | |
| secrets = payload | |
| else: | |
| secrets = [] | |
| for secret in secrets: | |
| if not isinstance(secret, dict) or secret.get("kind") != "provider_key": | |
| continue |
| async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client: | ||
| response = await client.post( | ||
| f"{api_base}/tools/resolve", | ||
| json={"tools": refs}, | ||
| headers=headers, | ||
| ) | ||
|
|
||
| if response.status_code >= 400: | ||
| raise RuntimeError( | ||
| f"Tool resolution failed (HTTP {response.status_code}): {response.text[:500]}" | ||
| ) | ||
|
|
||
| data = response.json() | ||
| builtins = data.get("builtins") or [] |
There was a problem hiding this comment.
Normalize transport/JSON failures into the same resolver error path.
Line 94 and Line 105 can raise httpx/JSON decode exceptions that bypass your explicit RuntimeError handling for failed resolution, producing inconsistent invoke failures.
Proposed fix
- async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
- response = await client.post(
- f"{api_base}/tools/resolve",
- json={"tools": refs},
- headers=headers,
- )
+ try:
+ async with httpx.AsyncClient(timeout=TOOLS_TIMEOUT) as client:
+ response = await client.post(
+ f"{api_base}/tools/resolve",
+ json={"tools": refs},
+ headers=headers,
+ )
+ except httpx.HTTPError as exc:
+ raise RuntimeError(f"Tool resolution request failed: {exc}") from exc
@@
- data = response.json()
+ try:
+ data = response.json()
+ except ValueError as exc:
+ raise RuntimeError("Tool resolution failed: backend returned invalid JSON") from exc| if not usage or not usage.get("total"): | ||
| return |
There was a problem hiding this comment.
Don’t treat total=0 as absent usage.
Line 70 drops valid zero-token usage because 0 is falsy, so span usage attributes are never recorded for those runs.
Proposed fix
- if not usage or not usage.get("total"):
+ if not usage or "total" not in usage:
return📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not usage or not usage.get("total"): | |
| return | |
| if not usage or "total" not in usage: | |
| return |
| monkeypatch.setattr(app, "select_backend", lambda selection: backend) | ||
| monkeypatch.setattr( |
There was a problem hiding this comment.
Cross-harness test can pass without validating harness routing.
Because the stub at Line 34 ignores selection, the equality assertion at Line 71 stays green even if harness parsing regresses to a single default route. Capture and assert the selected harness values to lock the intended behavior.
Suggested tightening
`@pytest.fixture`
def patched(monkeypatch, fake_backend):
backend = fake_backend(result=AgentResult(output="echo", usage={"total": 15}))
recorded = {}
+ selected = []
@@
- monkeypatch.setattr(app, "select_backend", lambda selection: backend)
+ def _select_backend(selection):
+ selected.append(selection.harness)
+ return backend
+ monkeypatch.setattr(app, "select_backend", _select_backend)
@@
- return backend, recorded
+ return backend, recorded, selected
@@
async def test_invoke_body_is_identical_across_harnesses(patched):
@@
- pi = await _invoke("pi")
+ _, _, selected = patched
+ pi = await _invoke("pi")
agenta = await _invoke("agenta")
claude = await _invoke("claude")
assert pi == agenta == claude
+ assert selected == ["pi", "agenta", "claude"]Also applies to: 65-71
| function toolName(tool: unknown): string | undefined { | ||
| if (!tool || typeof tool !== "object") return undefined | ||
| const fn = (tool as Record<string, unknown>).function | ||
| if (!fn || typeof fn !== "object") return undefined | ||
| const name = (fn as Record<string, unknown>).name | ||
| return typeof name === "string" ? name : undefined | ||
| } |
There was a problem hiding this comment.
Support legacy tool object shapes when deriving tool identity.
toolName() currently ignores bare-string and {name: ...} entries, but those shapes are still valid in resolver normalization. That breaks selectedToolNames/remove-by-name behavior for existing configs (duplicates can be re-added and removal can miss items).
Proposed fix
function toolName(tool: unknown): string | undefined {
- if (!tool || typeof tool !== "object") return undefined
- const fn = (tool as Record<string, unknown>).function
- if (!fn || typeof fn !== "object") return undefined
- const name = (fn as Record<string, unknown>).name
- return typeof name === "string" ? name : undefined
+ if (typeof tool === "string") return tool
+ if (!tool || typeof tool !== "object") return undefined
+ const obj = tool as Record<string, unknown>
+ if (typeof obj.name === "string") return obj.name
+ const fn = obj.function
+ if (!fn || typeof fn !== "object") return undefined
+ const name = (fn as Record<string, unknown>).name
+ return typeof name === "string" ? name : undefined
}Also applies to: 104-105, 108-110
|
Superseded. Replacing the path-based stack with PRs sliced by functional area showing final code only, so reviewers don't comment on intermediate scaffolding that a later PR rewrites. See the new set. |
This PR is part of a stack. Review bottom-up.
Each PR's diff is only its own delta. Merge from the bottom. This PR's base is #4760 (merge that first).
Context
This PR pulls the agent runtime out of the service and into the SDK behind ports and adapters. It builds on
feat/agent-rivet-acp-wp8and lands slice #4 fromdocs/design/agent-workflows/pr-stack.md: it draws the durable line between the neutral agent definition, the harness-specific config, and the runtime infrastructure. Before this change, the service handler owned the engine. The 470-lineservices/oss/src/agent.pyparsed the request, picked a harness, and called the runner all in one place, and the TypeScript runner owned the per-harness mapping. That made it hard to add an engine or run an agent without the service.What this changes
The runtime now lives in
agenta.sdk.agentsas five ports.Backendis the engine,Environmentowns the sandbox policy over a backend,Sandboxis where a session's process tree runs,Sessionis one conversation, andHarnessmaps a neutralSessionConfiginto harness-specific config.InProcessPiBackendandRivetBackendimplementBackend.PiHarness,ClaudeHarness, andAgentaHarnessimplementHarness.The service stops owning the engine and only composes adapters. Before,
agent.pychose a harness and drove the runner inline. Nowservices/oss/src/agent/app.pyresolves tools and secrets, picks a backend withselect_backend, then runs one turn throughmake_harness(...).prompt(...). The handler holds no engine knowledge.The neutral config and the harness config are now separate types. The author edits one neutral
AgentConfig(instructions, model, tools) plus aRunSelection(harness, sandbox, permission policy). EachHarnessturns that into aPiAgentConfig,ClaudeAgentConfig, orAgentaAgentConfig. These differ in shape, not just identity: Pi carries built-in tool names and never gates tool use, while Claude has no built-ins, delivers tools over MCP, and carries a permission policy.The TypeScript runner moves into role folders (
engines/,tools/,tracing/,extensions/),runPi.tsbecomesengines/pi.ts, and the flat ten-filesrc/gets a rewritten README. A dedicatedAgentConfigControl.tsxelement edits the typed config in the playground and reuses the model selector, the tool picker, and the enum selects, so Composio and built-in tools are finally selectable on an agent.Key architectural decision to review
The SDK owns the runtime ports and the service only composes adapters. Read
sdks/python/agenta/sdk/agents/interfaces.py. ABackenddeclaressupported_harnessesand stays pure plumbing: it takes an already-harness-shaped config and launches it. AHarnessvalidates at construction that the environment's backend can drive it, and raisesUnsupportedHarnessErrorotherwise. The per-harness knowledge that used to sit in the TypeScript runner now lives inadapters/harnesses.py, on the Python side. The tradeoff: this is more indirection than a service-owned switch, but it is what lets a future standalone SDK run an agent with no service, and it puts the "which engine can drive which harness" rule in one typed place instead of scattered string checks.The backend is a deployment choice, the harness is editable config, and
select_backendstraddles them. Readservices/oss/src/agent/app.py.select_backendupgradespi/agentato the rivet backend when the harness or a non-local sandbox needs it, so a Claude harness or a Daytona sandbox never silently drops the choice. Scrutinize the seam: the in-process backend supports{PI, AGENTA}and the rivet backend supports{PI, CLAUDE}, soagentaplus a non-local sandbox has no backend and raises rather than running the wrong thing. That gap is intentional, and it is the line most worth a second look.How to review this PR
Review the 7 commits one at a time. The middle commits are pure renames and folder moves with no behavior change, so the diff is large but most of it is mechanical.
session-shaped harness/runtime port) first. It introduces the session-shaped ports and the shared wire contract. This is the conceptual core.dedicated agent-config playground element) for the typed config and the new control, and commit 5 (relay Pi tool calls through the runner on Daytona) for the one real behavior fix in the runner.move the agent runtime into the SDK) last. It relocates the runtime toagenta.sdk.agents, rewires the service onto the ports, deletesservices/oss/src/harness, and adds the SDK and golden tests. This is where the final shape lands.Then read the code in this order:
dtos.pyandinterfaces.pyfor the contracts,adapters/in_process.pyandadapters/rivet.pyfor the two backends,adapters/harnesses.pyfor the neutral-to-harness mapping,services/oss/src/agent/app.pyfor the composition, andAgentConfigControl.tsxfor the playground.The regression most likely to break:
select_backendrouting. An old revision that sends only the flat params, or anagentaharness paired with a non-local sandbox, must keep resolving to the same backend it did before.RunSelection.from_paramsand the fall-back to the old shape are what hold the existing revisions.Tests / notes
The PR adds SDK unit tests (
test_harness_adapters.py,test_environment_lifecycle.py,test_dtos_*), a wire-contract test, and golden/runrequest/result fixtures undersdks/python/oss/tests/pytest/unit/agents/. The commit messages record live verification acrosspi,rivet+pi+local,rivet+claude+local, andrivet+pi+daytona.LocalBackendis still a stub and raisesNotImplementedError;AgentaHarnessdoes not yet run on rivet or Daytona. Both are known and tracked inground-truth.md.