[integration] Agent workflows (big-agents) by mmabrouk · Pull Request #4791 · Agenta-AI/agenta

mmabrouk · 2026-06-22T11:58:09Z

Context

big-agents is the integration branch for the agent-workflows feature. Every agent PR targets big-agents (directly, or by stacking on one that does). The plan is to review and merge each sub-PR into big-agents, then merge big-agents into main as a single unit.

This PR is a draft tracker. It stays open until all the open sub-PRs below are merged into big-agents. The branch started from an empty commit, so the diff fills in as sub-PRs land.

Integrated PRs

Each box gets checked when that PR is merged into big-agents. Indented items stack on the item above them.

SDK and service

feat(sdk): agent runtime behind backend/harness ports #4771 (merged)
- feat(agent): agent workflow service and tool-resolution API #4772 (merged)
  - fix(tools): support no-auth Composio toolkits + server-owned connection flags #4785 (merged)

Runner

feat(agent): runner engines, HTTP server, tracing, and docker image #4778 (merged; carries the whole runner stack: wire protocol, engines, tool execution)
- feat(agent): runner wire contract and tool execution #4773 (closed; its commits were folded into #4778)
- test(agent): vitest suite + CI for the agent runner; fix relay error bug #4784 (closed; superseded by big-agents: the relay-bug fix, the CI job, and a superset of its tests already landed via #4778 and chore(agent): make sandbox-agent runner first-class #4786)

Frontend

feat(frontend): agent config playground controls #4775
- feat(frontend): agent chat streaming slice + RAG example demo #4780

Hosting

chore(hosting): wire the agent runner sidecar into compose #4776 (merged)

Sandbox-agent deployment

chore(agent): make sandbox-agent runner first-class #4786 (merged)
chore(railway): add sandbox-agent preview deployment #4802
chore(kubernetes): deploy sandbox-agent sidecar #4803
ci(agent): build and test sandbox-agent images #4804

The three deployment PRs were originally opened against chore/sandbox-agent-core as #4787 / #4788 / #4789. After #4786 merged, they were re-pointed at big-agents, which closed the old numbers and reopened them as #4802 / #4803 / #4804.

Docs

docs(agent): agent-workflows design wiki, ground truth, and archived POCs #4779 (merged)
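
To see at a glance what still blocks the final merge into main, the checklist above can be modeled as a small tree. This is only a sketch of the statuses as written in this description at the time of posting; the `PullRequest` class and `open_prs` helper are hypothetical names used for illustration, not part of the repository. The closed PRs (#4773, #4784) are left out because they are not part of the integration.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    merged: bool
    # PRs that stack on this one (the indented items in the checklist).
    stacked: list["PullRequest"] = field(default_factory=list)


def open_prs(prs):
    """Return the numbers of all unmerged PRs, walking stacked children depth-first."""
    remaining = []
    for pr in prs:
        if not pr.merged:
            remaining.append(pr.number)
        remaining.extend(open_prs(pr.stacked))
    return remaining


# Snapshot of the checklist above; statuses will drift as sub-PRs land.
CHECKLIST = [
    PullRequest(4771, True, [PullRequest(4772, True, [PullRequest(4785, True)])]),
    PullRequest(4778, True),
    PullRequest(4775, False, [PullRequest(4780, False)]),
    PullRequest(4776, True),
    PullRequest(4786, True),
    PullRequest(4802, False),
    PullRequest(4803, False),
    PullRequest(4804, False),
    PullRequest(4779, True),
]

if __name__ == "__main__":
    print(open_prs(CHECKLIST))
```

An empty result from `open_prs(CHECKLIST)` would correspond to the point where this tracker can leave draft.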

Branch-only (no PR yet)

These design-doc branches are stacked on big-agents but have no PR. Open one if you want them reviewed separately; otherwise they fold in with the docs.

docs/agent-model-config-and-provider-auth
docs/agent-skills-config
docs/agent-code-tool-sandbox
docs/agent-harness-capabilities

Notes

Closed and not part of this integration:
- feat(agent): runner wire contract and tool execution #4773 — folded into feat(agent): runner engines, HTTP server, tracing, and docker image #4778.
- test(agent): vitest suite + CI for the agent runner; fix relay error bug #4784 — superseded by big-agents (feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786 already carry its tests, CI job, and relay-bug fix; its version.ts was stale ["pi","rivet"]).
- feat(agent): runner engines, server, and tracing #4774 — superseded by feat(agent): runner engines, HTTP server, tracing, and docker image #4778.
- docs(agent): agent-workflows design and ground truth #4777 — superseded by docs(agent): agent-workflows design wiki, ground truth, and archived POCs #4779.
- feat(agent): run the Agenta harness on the rivet/ACP backend with forced skills #4782 — rivet harness, abandoned.
- chore(railway): add sandbox-agent preview deployment #4787 / chore(kubernetes): deploy sandbox-agent sidecar #4788 / ci(agent): build and test sandbox-agent images #4789 — re-pointed onto big-agents as chore(railway): add sandbox-agent preview deployment #4802 / chore(kubernetes): deploy sandbox-agent sidecar #4803 / ci(agent): build and test sandbox-agent images #4804.

…es protocol

…POCs

…r image

Python `code` tools failed with `spawn python3 ENOENT` because neither runner image installed python3 (code.ts spawns python3). Add it to both.

Also rebuild the Pi extension bundle from the mounted src on dev container start: the dev image bakes the bundle and only mounts src, so an edited extension went stale and silently stopped registering custom tools on the Rivet path. Adds a regression test for the extension tool-registration contract.

Found via the agent-workflows QA matrix (findings F-005, F-006).

Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD
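
The `spawn python3 ENOENT` failure described above is a missing-binary error, so it can be surfaced with a clearer message by resolving the interpreter before spawning it. The following is a minimal Python sketch of that idea, not the runner's code (code.ts is TypeScript and is not reproduced here); `require_interpreter` and `run_code_tool` are hypothetical names, and the 30-second timeout is an arbitrary assumption.

```python
import shutil
import subprocess


def require_interpreter(name="python3"):
    """Return the resolved path of `name`, or raise a clear error if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} not found on PATH; install it in the runner image")
    return path


def run_code_tool(source, interpreter="python3"):
    """Run a snippet with the resolved interpreter and return its stdout."""
    exe = require_interpreter(interpreter)
    result = subprocess.run(
        [exe, "-c", source],
        capture_output=True,
        text=True,
        timeout=30,  # assumption: illustrative limit, not taken from the runner
    )
    result.check_returncode()  # raises CalledProcessError on a non-zero exit
    return result.stdout
```

A check like `require_interpreter()` could also run once at container start, so that an image without python3 fails fast instead of failing on the first tool call.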

Adds docs/design/agent-workflows/qa/: the autohealing QA recipe (README), the Gherkin scenario matrix with a live scoreboard, the findings log (F-001..F-010 in the open-issues style), a reusable /invoke driver with captured runs, and the regression-test research plus the replay-test skill draft.

Produced by a live end-to-end QA pass across the harness x environment x capability matrix; it documents and motivates the runner fixes in the sibling PRs (#4776, #4778).

Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD

…dd QA reports

vercel · 2026-06-22T11:58:16Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 24, 2026 7:30pm

coderabbitai · 2026-06-22T11:58:20Z

Important

Review skipped

Too many files!

This PR contains 529 files, which is 379 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

Upgrade to a paid plan to raise the limit.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: fcb8d1e2-f14d-486d-a065-d15e451af075

📥 Commits

Reviewing files that changed from the base of the PR and between 2eed5d0 and 9cbcbfd.

⛔ Files ignored due to path filters (3)

docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
services/agent/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
web/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
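
For readers unfamiliar with the `!**/pnpm-lock.yaml` filter, here is a rough Python sketch of how such an exclusion pattern could match the three paths above. It assumes gitignore-like semantics (a leading `!` marks an exclusion, `**/` matches zero or more directory levels); the review tool's actual matcher is not shown in this thread, and `glob_to_regex` and `is_excluded` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset ('**/', '*', '?') into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_excluded(path, filters):
    """True if any '!'-prefixed filter matches the whole path."""
    return any(
        glob_to_regex(f[1:]).match(path) is not None
        for f in filters
        if f.startswith("!")
    )
```

Under these assumptions, `is_excluded("services/agent/pnpm-lock.yaml", ["!**/pnpm-lock.yaml"])` is true, while `services/agent/package.json` is not excluded.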

📒 Files selected for processing (529)

.github/workflows/12-check-unit-tests.yml
.github/workflows/42-railway-build.yml
.github/workflows/43-railway-deploy.yml
.gitignore
api/entrypoints/routers.py
api/oss/src/apis/fastapi/tools/models.py
api/oss/src/apis/fastapi/tools/router.py
api/oss/src/apis/fastapi/vault/router.py
api/oss/src/apis/fastapi/workflows/exceptions.py
api/oss/src/apis/fastapi/workflows/router.py
api/oss/src/core/tools/dtos.py
api/oss/src/core/tools/exceptions.py
api/oss/src/core/tools/providers/composio/adapter.py
api/oss/src/core/tools/providers/composio/catalog.py
api/oss/src/core/tools/service.py
api/oss/src/core/workflows/dtos.py
api/oss/src/core/workflows/interfaces.py
api/oss/src/core/workflows/platform_catalog.py
api/oss/src/core/workflows/service.py
api/oss/src/core/workflows/types.py
api/oss/tests/pytest/unit/tools/__init__.py
api/oss/tests/pytest/unit/tools/test_agent_resolution.py
api/oss/tests/pytest/unit/tools/test_no_auth_connection.py
api/oss/tests/pytest/unit/workflows/test_flag_ownership.py
api/oss/tests/pytest/unit/workflows/test_platform_catalog.py
docs/design/agent-workflows/README.md
docs/design/agent-workflows/archive/README.md
docs/design/agent-workflows/archive/harness-port-redesign/README.md
docs/design/agent-workflows/archive/harness-port-redesign/implementation.md
docs/design/agent-workflows/archive/harness-port-redesign/plan.md
docs/design/agent-workflows/archive/harness-port-redesign/proposal.md
docs/design/agent-workflows/archive/harness-port-redesign/research.md
docs/design/agent-workflows/archive/harness-port-redesign/status.md
docs/design/agent-workflows/archive/old-rfcs/agent-protocol-rfc.md
docs/design/agent-workflows/archive/old-rfcs/streaming-and-sessions.md
docs/design/agent-workflows/archive/research/auth-secrets.md
docs/design/agent-workflows/archive/research/daytona-sandbox.md
docs/design/agent-workflows/archive/research/diskless-in-memory-config.md
docs/design/agent-workflows/archive/research/open-questions.md
docs/design/agent-workflows/archive/research/otel-instrumentation.md
docs/design/agent-workflows/archive/research/pi-interaction.md
docs/design/agent-workflows/archive/research/sandbox-sharing.md
docs/design/agent-workflows/archive/sdk-local-backend/status.md
docs/design/agent-workflows/archive/wp-1-pi-tracing/README.md
docs/design/agent-workflows/archive/wp-1-pi-tracing/integrating-the-tracing-extension.md
docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/.env.example
docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/README.md
docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/agenta-otel.ts
docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/package.json
docs/design/agent-workflows/archive/wp-1-pi-tracing/poc/run.ts
docs/design/agent-workflows/archive/wp-1-pi-tracing/tracing-in-the-agent-service.md
docs/design/agent-workflows/archive/wp-2-agent-service/README.md
docs/design/agent-workflows/archive/wp-2-agent-service/implementation-plan.md
docs/design/agent-workflows/archive/wp-2-agent-service/qa.md
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/README.md
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/poc/README.md
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/poc/bench_coldstart.py
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/poc/build_snapshot.py
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/poc/cleanup.py
docs/design/agent-workflows/archive/wp-3-daytona-sandbox/poc/run_agent.py
docs/design/agent-workflows/archive/wp-4-multi-message-output/README.md
docs/design/agent-workflows/archive/wp-5-chat-vs-completion/README.md
docs/design/agent-workflows/archive/wp-6-workflow-type-and-template/README.md
docs/design/agent-workflows/archive/wp-7-tools/README.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/README.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/architecture.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/context.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/isolation-and-fork.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/plan.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/build_rivet_snapshot.py
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/commit_agent_config.py
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/debug-events.ts
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/dump-full.ts
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/package.json
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/poc/spike.ts
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/research.md
docs/design/agent-workflows/archive/wp-8-rivet-acp-runtime/status.md
docs/design/agent-workflows/documentation/adapters/agenta.md
docs/design/agent-workflows/documentation/adapters/claude-code.md
docs/design/agent-workflows/documentation/adapters/pi.md
docs/design/agent-workflows/documentation/agent-configuration.md
docs/design/agent-workflows/documentation/agent-template.md
docs/design/agent-workflows/documentation/architecture.md
docs/design/agent-workflows/documentation/ground-truth.md
docs/design/agent-workflows/documentation/ports-and-adapters.md
docs/design/agent-workflows/documentation/protocol.md
docs/design/agent-workflows/documentation/running-the-agent.md
docs/design/agent-workflows/documentation/sessions.md
docs/design/agent-workflows/documentation/skills.md
docs/design/agent-workflows/documentation/tools.md
docs/design/agent-workflows/documentation/triggers.md
docs/design/agent-workflows/projects/capability-config/README.md
docs/design/agent-workflows/projects/capability-config/context.md
docs/design/agent-workflows/projects/capability-config/plan.md
docs/design/agent-workflows/projects/capability-config/proposal.md
docs/design/agent-workflows/projects/capability-config/research.md
docs/design/agent-workflows/projects/capability-config/status.md
docs/design/agent-workflows/projects/model-config/proposal.md
docs/design/agent-workflows/projects/model-config/research.md
docs/design/agent-workflows/projects/provider-model-auth/README.md
docs/design/agent-workflows/projects/provider-model-auth/build-notes.md
docs/design/agent-workflows/projects/provider-model-auth/context.md
docs/design/agent-workflows/projects/provider-model-auth/design.md
docs/design/agent-workflows/projects/provider-model-auth/explainer.md
docs/design/agent-workflows/projects/provider-model-auth/harness-provider-matrix.md
docs/design/agent-workflows/projects/provider-model-auth/plan.md
docs/design/agent-workflows/projects/provider-model-auth/research.md
docs/design/agent-workflows/projects/provider-model-auth/status.md
docs/design/agent-workflows/projects/qa/README.md
docs/design/agent-workflows/projects/qa/cleanup-plan.md
docs/design/agent-workflows/projects/qa/findings.md
docs/design/agent-workflows/projects/qa/implementation-plan.md
docs/design/agent-workflows/projects/qa/matrix.md
docs/design/agent-workflows/projects/qa/regression-skill-DRAFT.md
docs/design/agent-workflows/projects/qa/regression-testing-research.md
docs/design/agent-workflows/projects/qa/runs/E1__append_system_pi.json
docs/design/agent-workflows/projects/qa/runs/E1__builtin_bash_agenta.json
docs/design/agent-workflows/projects/qa/runs/E1__builtin_bash_pi.json
docs/design/agent-workflows/projects/qa/runs/E1__code_tool_agenta.json
docs/design/agent-workflows/projects/qa/runs/E1__code_tool_pi.json
docs/design/agent-workflows/projects/qa/runs/E1__smoke_chat_agenta.json
docs/design/agent-workflows/projects/qa/runs/E1__smoke_chat_pi.json
docs/design/agent-workflows/projects/qa/runs/E2__append_system_pi.json
docs/design/agent-workflows/projects/qa/runs/E2__builtin_bash_agenta.json
docs/design/agent-workflows/projects/qa/runs/E2__builtin_bash_pi.json
docs/design/agent-workflows/projects/qa/runs/E2__claude_code_tool.json
docs/design/agent-workflows/projects/qa/runs/E2__claude_smoke.json
docs/design/agent-workflows/projects/qa/runs/E2__code_tool_agenta.json
docs/design/agent-workflows/projects/qa/runs/E2__code_tool_pi.json
docs/design/agent-workflows/projects/qa/runs/E2__mcp_claude.json
docs/design/agent-workflows/projects/qa/runs/E2__smoke_chat_agenta.json
docs/design/agent-workflows/projects/qa/runs/E2__smoke_chat_pi.json
docs/design/agent-workflows/projects/qa/runs/E3__builtin_bash_pi.json
docs/design/agent-workflows/projects/qa/runs/E3__code_tool_agenta.json
docs/design/agent-workflows/projects/qa/runs/E3__code_tool_pi.json
docs/design/agent-workflows/projects/qa/runs/E3__smoke_chat_pi.json
docs/design/agent-workflows/projects/qa/scripts/mcp_qa_server.mjs
docs/design/agent-workflows/projects/qa/scripts/run_matrix.py
docs/design/agent-workflows/projects/research/opencode-architecture.md
docs/design/agent-workflows/projects/runner-interface/README.md
docs/design/agent-workflows/projects/sandbox-agent-refactor/sandbox-agent-refactor-plan.md
docs/design/agent-workflows/projects/sdk-local-tools/README.md
docs/design/agent-workflows/projects/sdk-local-tools/codebase-conventions.md
docs/design/agent-workflows/projects/sdk-local-tools/context.md
docs/design/agent-workflows/projects/sdk-local-tools/conventions-review.md
docs/design/agent-workflows/projects/sdk-local-tools/organization-proposal.md
docs/design/agent-workflows/projects/sdk-local-tools/plan.md
docs/design/agent-workflows/projects/sdk-local-tools/research.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/app-mcp-reassign.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/attach-orthogonal-mutation.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/description-default-inconsistency.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/gateway-no-logging.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/gateway-orthogonal-untested.md
docs/design/agent-workflows/projects/sdk-local-tools/review/evidence/handler-resolution-error.md
docs/design/agent-workflows/projects/sdk-local-tools/review/findings.md
docs/design/agent-workflows/projects/sdk-local-tools/review/metadata.json
docs/design/agent-workflows/projects/sdk-local-tools/review/plan.md
docs/design/agent-workflows/projects/sdk-local-tools/review/progress.md
docs/design/agent-workflows/projects/sdk-local-tools/review/questions.md
docs/design/agent-workflows/projects/sdk-local-tools/review/risks.md
docs/design/agent-workflows/projects/sdk-local-tools/review/scope.md
docs/design/agent-workflows/projects/sdk-local-tools/review/scorecard.md
docs/design/agent-workflows/projects/sdk-local-tools/review/summary.md
docs/design/agent-workflows/projects/sdk-local-tools/status.md
docs/design/agent-workflows/projects/sidecar-deployment-proposal/README.md
docs/design/agent-workflows/projects/sidecar-deployment-proposal/proposal.md
docs/design/agent-workflows/projects/sidecar-deployment-proposal/status.md
docs/design/agent-workflows/projects/skills-config/architecture.md
docs/design/agent-workflows/projects/skills-config/build-notes.md
docs/design/agent-workflows/projects/tool-resolution-layering/plan.md
docs/design/agent-workflows/projects/typescript-structure/README.md
docs/design/agent-workflows/projects/typescript-structure/context.md
docs/design/agent-workflows/projects/typescript-structure/plan.md
docs/design/agent-workflows/projects/typescript-structure/research.md
docs/design/agent-workflows/projects/typescript-structure/status.md
docs/design/agent-workflows/scratch/agent-coordination.md
docs/design/agent-workflows/scratch/branch-cleanup-report.md
docs/design/agent-workflows/scratch/branch-pr-cleanup-report.md
docs/design/agent-workflows/scratch/branch-pr-cleanup-status.md
docs/design/agent-workflows/scratch/capability-architecture.md
docs/design/agent-workflows/scratch/capability-map.md
docs/design/agent-workflows/scratch/dead-code-report.md
docs/design/agent-workflows/scratch/feature-matrix-test.md
docs/design/agent-workflows/scratch/flows-and-capabilities.md
docs/design/agent-workflows/scratch/implementation-review.md
docs/design/agent-workflows/scratch/meeting-alignment.md
docs/design/agent-workflows/scratch/notes-architecture.md
docs/design/agent-workflows/scratch/notes-config-runsh.md
docs/design/agent-workflows/scratch/notes-model-auth.md
docs/design/agent-workflows/scratch/notes-tools-mcp-capabilities.md
docs/design/agent-workflows/scratch/open-issues.md
docs/design/agent-workflows/scratch/pr-stack.md
docs/design/agent-workflows/scratch/status.md
docs/design/agent-workflows/trash/.gitkeep
docs/design/vault-named-secrets/README.md
docs/design/vault-named-secrets/context.md
docs/design/vault-named-secrets/plan.md
docs/design/vault-named-secrets/research.md
docs/design/vault-named-secrets/status.md
docs/docs/self-host/02-configuration.mdx
docs/docs/self-host/guides/04-deploy-on-railway.mdx
docs/docs/self-host/guides/07-deploy-the-agent-runner.mdx
docs/docs/self-host/guides/08-custom-agent-runner-images.mdx
docs/docs/self-host/guides/09-agent-daytona-sandboxes.mdx
docs/docs/self-host/infrastructure/01-architecture.mdx
examples/python/RAG_QA_chatbot/backend/agent_loop.py
examples/python/RAG_QA_chatbot/backend/contract_stream.py
examples/python/RAG_QA_chatbot/backend/main.py
examples/python/RAG_QA_chatbot/backend/rag.py
examples/python/RAG_QA_chatbot/env.example
examples/python/RAG_QA_chatbot/ingest/fix_urls.py
examples/python/RAG_QA_chatbot/ingest/loaders.py
examples/python/RAG_QA_chatbot/ingest/store.py
examples/python/RAG_QA_chatbot/run-agent-chat-slice.sh
hosting/docker-compose/ee/docker-compose.dev.yml
hosting/docker-compose/ee/docker-compose.gh.local.yml
hosting/docker-compose/ee/docker-compose.gh.yml
hosting/docker-compose/ee/env.ee.dev.example
hosting/docker-compose/ee/env.ee.gh.example
hosting/docker-compose/oss/docker-compose.dev.yml
hosting/docker-compose/oss/docker-compose.gh.local.yml
hosting/docker-compose/oss/docker-compose.gh.ssl.yml
hosting/docker-compose/oss/docker-compose.gh.yml
hosting/docker-compose/oss/env.oss.dev.example
hosting/docker-compose/oss/env.oss.gh.example
hosting/kubernetes/ee/values.ee.example.yaml
hosting/kubernetes/helm/templates/NOTES.txt
hosting/kubernetes/helm/templates/_helpers.tpl
hosting/kubernetes/helm/templates/sandbox-agent-deployment.yaml
hosting/kubernetes/helm/templates/sandbox-agent-service.yaml
hosting/kubernetes/helm/templates/secrets.yaml
hosting/kubernetes/helm/templates/services-deployment.yaml
hosting/kubernetes/helm/values.schema.json
hosting/kubernetes/helm/values.yaml
hosting/kubernetes/oss/values.oss.example.yaml
hosting/railway/oss/README.md
hosting/railway/oss/sandbox-agent/Dockerfile
hosting/railway/oss/scripts/bootstrap.sh
hosting/railway/oss/scripts/build-and-push-images.sh
hosting/railway/oss/scripts/configure.sh
hosting/railway/oss/scripts/deploy-from-images.sh
hosting/railway/oss/scripts/deploy-services.sh
hosting/railway/oss/scripts/preview-resolve-env.sh
sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/claude_settings.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/capabilities.py
sdks/python/agenta/sdk/agents/connections/__init__.py
sdks/python/agenta/sdk/agents/connections/errors.py
sdks/python/agenta/sdk/agents/connections/interfaces.py
sdks/python/agenta/sdk/agents/connections/models.py
sdks/python/agenta/sdk/agents/connections/resolver.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/platform/__init__.py
sdks/python/agenta/sdk/agents/platform/connection.py
sdks/python/agenta/sdk/agents/platform/connections.py
sdks/python/agenta/sdk/agents/platform/gateway.py
sdks/python/agenta/sdk/agents/platform/resolve.py
sdks/python/agenta/sdk/agents/platform/secrets.py
sdks/python/agenta/sdk/agents/skills/__init__.py
sdks/python/agenta/sdk/agents/skills/errors.py
sdks/python/agenta/sdk/agents/skills/models.py
sdks/python/agenta/sdk/agents/skills/parsing.py
sdks/python/agenta/sdk/agents/skills/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/registry.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/middlewares/running/resolver.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/acceptance/workflows/test_new_uri_handlers.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/_in_process_backend.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/adapters/__init__.py
sdks/python/oss/tests/pytest/unit/agents/adapters/test_claude_settings.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/connections/__init__.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_capabilities.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_dtos_model_ref.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_models.py
sdks/python/oss/tests/pytest/unit/agents/connections/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/platform/__init__.py
sdks/python/oss/tests/pytest/unit/agents/platform/conftest.py
sdks/python/oss/tests/pytest/unit/agents/platform/test_connection.py
sdks/python/oss/tests/pytest/unit/agents/platform/test_connections_http.py
sdks/python/oss/tests/pytest/unit/agents/platform/test_gateway_http.py
sdks/python/oss/tests/pytest/unit/agents/platform/test_resolve.py
sdks/python/oss/tests/pytest/unit/agents/platform/test_secrets_http.py
sdks/python/oss/tests/pytest/unit/agents/skills/__init__.py
sdks/python/oss/tests/pytest/unit/agents/skills/test_models.py
sdks/python/oss/tests/pytest/unit/agents/skills/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/skills/test_skills_e2e.py
sdks/python/oss/tests/pytest/unit/agents/skills/test_wire.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/unit/test_skill_config_catalog.py
sdks/python/oss/tests/pytest/unit/test_skill_flags.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_resolver_middleware.py
sdks/python/oss/tests/pytest/utils/test_routing.py
services/agent/.dockerignore
services/agent/AGENTS.md
services/agent/CLAUDE.md
services/agent/README.md
services/agent/config/AGENTS.md
services/agent/config/agent.json
services/agent/docker/Dockerfile
services/agent/docker/Dockerfile.dev
services/agent/docker/README.md
services/agent/package.json
services/agent/sandbox-images/daytona/README.md
services/agent/sandbox-images/daytona/build_snapshot.py
services/agent/scripts/build-extension.mjs
services/agent/skills/agenta-getting-started/SKILL.md
services/agent/src/cli.ts
services/agent/src/engines/pi.ts
services/agent/src/engines/sandbox_agent.ts
services/agent/src/engines/sandbox_agent/capabilities.ts
services/agent/src/engines/sandbox_agent/daemon.ts
services/agent/src/engines/sandbox_agent/daytona.ts
services/agent/src/engines/sandbox_agent/errors.ts
services/agent/src/engines/sandbox_agent/mcp.ts
services/agent/src/engines/sandbox_agent/model.ts
services/agent/src/engines/sandbox_agent/permissions.ts
services/agent/src/engines/sandbox_agent/pi-assets.ts
services/agent/src/engines/sandbox_agent/provider.ts
services/agent/src/engines/sandbox_agent/run-plan.ts
services/agent/src/engines/sandbox_agent/transcript.ts
services/agent/src/engines/sandbox_agent/usage.ts
services/agent/src/engines/sandbox_agent/workspace.ts
services/agent/src/engines/skills.ts
services/agent/src/entry.ts
services/agent/src/extensions/agenta.ts
services/agent/src/protocol.ts
services/agent/src/responder.ts
services/agent/src/server.ts
services/agent/src/tools/callback.ts
services/agent/src/tools/code.ts
services/agent/src/tools/dispatch.ts
services/agent/src/tools/mcp-bridge.ts
services/agent/src/tools/mcp-server.ts
services/agent/src/tools/public-spec.ts
services/agent/src/tools/relay.ts
services/agent/src/tracing/otel.ts
services/agent/src/version.ts
services/agent/tests/unit/cli.test.ts
services/agent/tests/unit/code-tool.test.ts
services/agent/tests/unit/continuation.test.ts
services/agent/tests/unit/extension-tools.test.ts
services/agent/tests/unit/mcp-servers.test.ts
services/agent/tests/unit/pi-capability-guard.test.ts
services/agent/tests/unit/pi-provider-env.test.ts
services/agent/tests/unit/responder.test.ts
services/agent/tests/unit/sandbox-agent-capabilities.test.ts
services/agent/tests/unit/sandbox-agent-daemon.test.ts
services/agent/tests/unit/sandbox-agent-daytona.test.ts
services/agent/tests/unit/sandbox-agent-errors.test.ts
services/agent/tests/unit/sandbox-agent-model.test.ts
services/agent/tests/unit/sandbox-agent-orchestration.test.ts
services/agent/tests/unit/sandbox-agent-permissions.test.ts
services/agent/tests/unit/sandbox-agent-pi-assets.test.ts
services/agent/tests/unit/sandbox-agent-provider.test.ts
services/agent/tests/unit/sandbox-agent-run-plan.test.ts
services/agent/tests/unit/sandbox-agent-usage.test.ts
services/agent/tests/unit/sandbox-agent-workspace.test.ts
services/agent/tests/unit/server.test.ts
services/agent/tests/unit/skills.test.ts
services/agent/tests/unit/stream-events.test.ts
services/agent/tests/unit/tool-bridge.test.ts
services/agent/tests/unit/tool-dispatch.test.ts
services/agent/tests/unit/tool-relay-permission.test.ts
services/agent/tests/unit/wire-contract.test.ts
services/agent/tests/utils/golden.ts
services/agent/tsconfig.json
services/agent/vitest.config.ts
services/entrypoints/main.py
services/oss/src/agent/__init__.py
services/oss/src/agent/app.py
services/oss/src/agent/config.py
services/oss/src/agent/schemas.py
services/oss/src/agent/secrets.py
services/oss/src/agent/tools/__init__.py
services/oss/src/agent/tools/gateway.py
services/oss/src/agent/tools/resolver.py
services/oss/src/agent/tools/secrets.py
services/oss/src/agent/tracing.py
services/oss/tests/pytest/integration/__init__.py
services/oss/tests/pytest/integration/agent/__init__.py
services/oss/tests/pytest/integration/agent/conftest.py
services/oss/tests/pytest/integration/agent/test_resolve_secrets_http.py
services/oss/tests/pytest/integration/agent/tools/__init__.py
services/oss/tests/pytest/integration/agent/tools/test_gateway_http.py
services/oss/tests/pytest/integration/agent/tools/test_secrets_http.py
services/oss/tests/pytest/unit/__init__.py
services/oss/tests/pytest/unit/agent/__init__.py
services/oss/tests/pytest/unit/agent/conftest.py
services/oss/tests/pytest/unit/agent/test_invoke_handler.py
services/oss/tests/pytest/unit/agent/test_secrets_mapping.py
services/oss/tests/pytest/unit/agent/test_select_backend.py
services/oss/tests/pytest/unit/agent/tools/__init__.py
services/oss/tests/pytest/unit/agent/tools/test_gateway_mapping.py
services/oss/tests/pytest/unit/agent/tools/test_resolution.py
web/ee/src/pages/w/[workspace_id]/p/[project_id]/apps/[app_id]/agent-chat/index.tsx
web/oss/package.json
web/oss/src/components/AgentChatSlice/AgentChatPanel.tsx
web/oss/src/components/AgentChatSlice/assets/agConfig.ts
web/oss/src/components/AgentChatSlice/assets/constants.ts
web/oss/src/components/AgentChatSlice/assets/files.ts
web/oss/src/components/AgentChatSlice/assets/loadSession.ts
web/oss/src/components/AgentChatSlice/assets/markdown.tsx
web/oss/src/components/AgentChatSlice/assets/rewind.ts
web/oss/src/components/AgentChatSlice/assets/toAgentaMessage.ts
web/oss/src/components/AgentChatSlice/assets/trace.ts
web/oss/src/components/AgentChatSlice/assets/transport.ts
web/oss/src/components/AgentChatSlice/components/AgentChatConversation.tsx
web/oss/src/components/AgentChatSlice/components/AgentMessage.tsx
web/oss/src/components/AgentChatSlice/components/SessionHistoryMenu.tsx
web/oss/src/components/AgentChatSlice/components/SessionTabLabel.tsx
web/oss/src/components/AgentChatSlice/components/ToolPart.tsx
web/oss/src/components/AgentChatSlice/index.tsx
web/oss/src/components/AgentChatSlice/state/sessions.ts
web/oss/src/components/Layout/Layout.tsx
web/oss/src/components/Playground/Playground.tsx
web/oss/src/components/SharedDrawers/SessionDrawer/assets/utils.ts
web/oss/src/components/SharedDrawers/SessionDrawer/components/SessionHeader/index.tsx
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
web/oss/src/components/pages/app-management/components/CreateAppDropdown/index.tsx
web/oss/src/components/pages/app-management/modals/CreateAppTypeModal/index.tsx
web/oss/src/components/pages/observability/components/SessionsTable/assets/sessionCellStore.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/DurationCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/EndTimeCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/FirstInputCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/LastOutputCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/StartTimeCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/TotalCostCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/TotalLatencyCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/TotalUsageCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/components/Cells/TracesCountCell.tsx
web/oss/src/components/pages/observability/components/SessionsTable/index.tsx
web/oss/src/components/pages/prompts/assets/iconHelpers.tsx
web/oss/src/lib/helpers/dynamicEnv.ts
web/oss/src/pages/w/[workspace_id]/p/[project_id]/apps/[app_id]/agent-chat/index.tsx
web/oss/src/state/newObservability/atoms/queries.ts
web/oss/src/state/newObservability/selectors/tracing.ts
web/packages/agenta-entities/src/loadable/controller.ts
web/packages/agenta-entities/src/workflow/core/schema.ts
web/packages/agenta-entities/src/workflow/state/appUtils.ts
web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
web/packages/agenta-entities/src/workflow/state/helpers.ts
web/packages/agenta-entities/src/workflow/state/molecule.ts
web/packages/agenta-entities/src/workflow/state/store.ts
web/packages/agenta-entities/tests/unit/derive-workflow-type-agent.test.ts
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/AgentConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/ClaudePermissionsControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/McpServerItemControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SandboxPermissionControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SchemaPropertyRenderer.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/SkillConfigControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/ToolItemControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/connectionUtils.ts
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/index.ts
web/packages/agenta-entity-ui/tests/unit/connectionUtils.test.ts
web/packages/agenta-entity-ui/tests/unit/skillConfigControl.test.ts
web/packages/agenta-playground-ui/src/components/ExecutionHeader/index.tsx
web/packages/agenta-playground-ui/src/components/ExecutionItems/index.tsx
web/packages/agenta-playground-ui/src/context/PlaygroundUIContext.tsx
web/packages/agenta-playground/src/index.ts
web/packages/agenta-playground/src/state/controllers/executionController.ts
web/packages/agenta-playground/src/state/execution/agentRequest.ts
web/packages/agenta-playground/src/state/execution/generationSelectors.ts
web/packages/agenta-playground/src/state/execution/index.ts
web/packages/agenta-playground/src/state/execution/selectors.ts
web/packages/agenta-playground/src/state/index.ts
web/packages/agenta-playground/tests/unit/agentMode.test.ts
web/packages/agenta-playground/tests/unit/agentRequest.test.ts


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.76% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title '[integration] Agent workflows (big-agents)' directly describes the PR as an integration tracking branch for the agent-workflows feature, which matches the stated objective of organizing and merging multiple related feature PRs.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description matches the integration-tracking nature of the changes and references the same agent-workflows components added in the diff.


…t connection flags

No-auth Composio toolkits (codeinterpreter, the composio meta-toolkit) could not be connected. The adapter always POSTs an auth config, which Composio rejects for a no-auth toolkit (Auth_Config_NoAuthApp), and resolve/execute required a connected-account id those toolkits do not have, so the whole no-auth path was unreachable.

Detect a no-auth toolkit (every auth_config_details[].mode == NO_AUTH), skip the auth-config and connected-account creation, and persist a usable connection with no Composio account. Resolve and execute omit the account id for a no-auth connection (Composio runs those tools with no account).

Connection validity is now server-owned: a client can no longer send flags.is_valid to mark a pending auth connection usable. Refresh on a no-auth connection is a no-op, not a not-found error.

Verified: connect 500 to 200, resolve 200, /tools/call ran print(6*7) and returned 42. New test_no_auth_connection.py (11 tests); all 15 tools unit tests pass, ruff clean.

Reviewed by a second agent and Codex; their one blocker (client-settable is_valid) is fixed here.

Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD
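The no-auth detection rule described in this commit message can be sketched as below. This is an illustration only: the dictionary shape of the toolkit payload, the `connection_plan` helper, and the decision to treat an empty `auth_config_details` list as "not no-auth" are assumptions, not the adapter's actual code.

```python
from typing import Any, Mapping


def is_no_auth_toolkit(toolkit: Mapping[str, Any]) -> bool:
    """Return True when every auth_config_details entry has mode NO_AUTH."""
    details = toolkit.get("auth_config_details") or []
    # Assumption: a toolkit with no entries is NOT treated as no-auth,
    # because all() over an empty list would otherwise return True.
    return bool(details) and all(d.get("mode") == "NO_AUTH" for d in details)


def connection_plan(toolkit: Mapping[str, Any]) -> dict:
    """Hypothetical helper: decide which Composio setup steps to run."""
    no_auth = is_no_auth_toolkit(toolkit)
    return {
        "create_auth_config": not no_auth,
        "create_connected_account": not no_auth,
        # Validity is computed server-side; it is never read from client flags.
        # Assumption: an auth-requiring connection starts out as not valid.
        "is_valid": no_auth,
    }
```

Computing `is_valid` inside the helper, instead of accepting it as an argument, mirrors the "server-owned" point above: there is no code path through which a client-supplied value can reach it.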

…error handling)

coderabbitai

Actionable comments posted: 10

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 76c33a7d-feff-4e5f-acc0-962498f74cfc

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and 2eed5d0.

📒 Files selected for processing (70)

sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/tools/wire.py
sdks/python/agenta/sdk/agents/ui_messages.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_routing.py

coderabbitai · 2026-06-22T13:25:59Z

+NOTE on packaging: the Node runner is NOT part of this Python wheel (``pip install agenta``
+stays pure Python; the wheel contains zero ``.ts``/``.js``). How a standalone Pi user obtains
+the runner -- an ``npx`` npm package, a local checkout, or a Docker sidecar over HTTP -- is an
+open distribution decision; see ``docs/design/agent-workflows/typescript-structure/``. Do NOT
+silently bundle a JS runner into the wheel.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align LocalBackend wording with the stated packaging contract.

Lines 9-13 say the wheel must not bundle a JS runner, but Line 30 and the NotImplementedError messages still say “bundled JS”. This contradiction will confuse integrators.

Suggested wording fix

-class LocalBackend(Backend):
-    """Run Pi (bundled JS) or Claude (``claude-agent-sdk``) on this machine."""
+class LocalBackend(Backend):
+    """Run Pi (external Node runner) or Claude (``claude-agent-sdk``) on this machine."""
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )

Also applies to: 30-38, 50-53

coderabbitai · 2026-06-22T13:25:59Z

+    def __init__(
+        self,
+        *,
+        sandbox: str = "local",
+        url: Optional[str] = None,
+        command: Optional[Sequence[str]] = None,
+        cwd: Optional[str] = None,
+        timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
+    ) -> None:
+        self._sandbox = sandbox
+        self._url = url


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate sandbox at construction time.

Line 129 currently accepts any string; invalid values get sent over the wire and fail late. Restrict this to supported values (local, daytona) and raise a configuration error early.

Suggested validation

 from ..dtos import (
@@
 )
+from ..errors import AgentRunnerConfigurationError
@@
     def __init__(
         self,
         *,
         sandbox: str = "local",
@@
         timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
     ) -> None:
+        allowed_sandboxes = {"local", "daytona"}
+        if sandbox not in allowed_sandboxes:
+            raise AgentRunnerConfigurationError(
+                f"Unsupported sandbox '{sandbox}'. Expected one of: {sorted(allowed_sandboxes)}."
+            )
         self._sandbox = sandbox
         self._url = url
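A self-contained sketch of the same fail-fast check, for readers who want to try it outside the SDK. The `RunnerAdapter` class and the local `AgentRunnerConfigurationError` definition are stand-ins invented for this example; only the allowed values (`local`, `daytona`) come from the comment above.

```python
class AgentRunnerConfigurationError(ValueError):
    """Stand-in for the SDK error class named in the suggestion."""


ALLOWED_SANDBOXES = frozenset({"local", "daytona"})


class RunnerAdapter:
    """Hypothetical adapter; only the sandbox argument is modelled."""

    def __init__(self, *, sandbox: str = "local") -> None:
        # Reject unsupported values here so they never reach the wire.
        if sandbox not in ALLOWED_SANDBOXES:
            raise AgentRunnerConfigurationError(
                f"Unsupported sandbox '{sandbox}'. "
                f"Expected one of: {sorted(ALLOWED_SANDBOXES)}."
            )
        self._sandbox = sandbox
```

With this in place, `RunnerAdapter(sandbox="docker")` raises at construction time, while `RunnerAdapter(sandbox="daytona")` succeeds.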

coderabbitai · 2026-06-22T13:25:59Z

+        llm_config = prompt_cfg.get("llm_config") or {}
+        model = llm_config.get("model") or defaults.model
+        instructions = _system_text(prompt_cfg.get("messages")) or defaults.instructions
+        raw_tools = llm_config.get("tools")
+        if raw_tools is None:
+            raw_tools = prompt_cfg.get("tools")
+    else:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard llm_config type before dictionary access.

Line 694 assumes prompt["llm_config"] is a dict. If it’s a non-dict value, this path crashes with AttributeError instead of applying defaults.

Proposed fix

     prompt_cfg = params.get("prompt")
     if isinstance(prompt_cfg, dict):
-        llm_config = prompt_cfg.get("llm_config") or {}
+        raw_llm_config = prompt_cfg.get("llm_config")
+        llm_config = raw_llm_config if isinstance(raw_llm_config, dict) else {}
         model = llm_config.get("model") or defaults.model
         instructions = _system_text(prompt_cfg.get("messages")) or defaults.instructions
         raw_tools = llm_config.get("tools")
         if raw_tools is None:
             raw_tools = prompt_cfg.get("tools")
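To see why `or {}` alone is not enough: a truthy non-dict value such as a string passes through `or` unchanged, and the following `.get(...)` call then raises AttributeError. A minimal sketch of the guarded read, with hypothetical helper names (`read_llm_config`, `resolve_model`) that are not part of the SDK:

```python
from typing import Any, Mapping


def read_llm_config(params: Mapping[str, Any]) -> dict:
    """Return prompt.llm_config if it is a dict, otherwise an empty dict."""
    prompt_cfg = params.get("prompt")
    if not isinstance(prompt_cfg, dict):
        return {}
    raw = prompt_cfg.get("llm_config")
    # A string, list, or number here falls back to {} instead of crashing later.
    return raw if isinstance(raw, dict) else {}


def resolve_model(params: Mapping[str, Any], default_model: str) -> str:
    """Pick the configured model, or the default when none is usable."""
    return read_llm_config(params).get("model") or default_model
```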

coderabbitai · 2026-06-22T13:25:59Z

+        sandbox = await self._sandbox()
+        if provisioning:
+            await sandbox.add_files(provisioning)
+        return await self._backend.create_session(
+            sandbox,
+            config,
+            harness=harness,
+            secrets=session_config.secrets,
+            trace=session_config.trace,
+            session_id=session_config.session_id,
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Destroy per-session sandbox on setup/session-creation failure.

If Line 224 (add_files) or Line 225 (create_session) raises, a per-session sandbox is left alive with no owner to tear it down.

Proposed fix

     async def create_session(
         self,
         config: HarnessAgentConfig,
         *,
         harness: HarnessType,
         session_config: SessionConfig,
         provisioning: Optional[Mapping[str, bytes]] = None,
     ) -> Session:
         """Provision a sandbox per policy, then open a session in it."""
         sandbox = await self._sandbox()
-        if provisioning:
-            await sandbox.add_files(provisioning)
-        return await self._backend.create_session(
-            sandbox,
-            config,
-            harness=harness,
-            secrets=session_config.secrets,
-            trace=session_config.trace,
-            session_id=session_config.session_id,
-        )
+        try:
+            if provisioning:
+                await sandbox.add_files(provisioning)
+            return await self._backend.create_session(
+                sandbox,
+                config,
+                harness=harness,
+                secrets=session_config.secrets,
+                trace=session_config.trace,
+                session_id=session_config.session_id,
+            )
+        except Exception:
+            if self._sandbox_per_session:
+                try:
+                    await sandbox.destroy()
+                except Exception:
+                    pass
+            raise

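The cleanup-on-failure pattern in this suggestion can be exercised in isolation with a fake sandbox. In the sketch below, `FakeSandbox` and `open_session` are hypothetical stand-ins; the real `create_session` takes more arguments and delegates to a backend.

```python
import asyncio
from typing import Mapping


class FakeSandbox:
    """Test double whose provisioning step always fails."""

    def __init__(self) -> None:
        self.destroyed = False

    async def add_files(self, files: Mapping[str, bytes]) -> None:
        raise RuntimeError("upload failed")

    async def destroy(self) -> None:
        self.destroyed = True


async def open_session(
    sandbox: FakeSandbox, files: Mapping[str, bytes], *, per_session: bool
) -> str:
    try:
        if files:
            await sandbox.add_files(files)
        return "session"
    except Exception:
        # Only tear down sandboxes this call owns; a shared one stays alive.
        if per_session:
            try:
                await sandbox.destroy()
            except Exception:
                pass  # never let a teardown error mask the original failure
        raise
```

The bare `raise` at the end re-raises the original exception, so callers still see the provisioning error and not a teardown error.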

coderabbitai · 2026-06-22T13:25:59Z

+        session = await self.create_session(config)
+
+        def _absorb(result: AgentResult) -> None:
+            if result.session_id:
+                config.session_id = result.session_id
+
+        return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ensure session cleanup if stream setup fails synchronously.

Line 321 only registers cleanup after session.stream(messages) succeeds. If stream construction raises, the session is leaked.

Proposed fix

         session = await self.create_session(config)
+        try:
+            run = session.stream(messages)
+        except Exception:
+            await session.destroy()
+            raise

         def _absorb(result: AgentResult) -> None:
             if result.session_id:
                 config.session_id = result.session_id

-        return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)
+        return run.on_result(_absorb).on_cleanup(session.destroy)
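A small sketch of the leak this comment describes, using a fake session whose `stream()` raises synchronously. `FakeSession` and `start_stream` are invented for illustration, and the `on_result` / `on_cleanup` chaining from the real code is left out.

```python
import asyncio
from typing import Iterator, List


class FakeSession:
    """Test double: stream() can be made to fail before any run exists."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.destroyed = False

    def stream(self, messages: List[str]) -> Iterator[str]:
        if self.fail:
            raise ValueError("bad messages")
        return iter(messages)

    async def destroy(self) -> None:
        self.destroyed = True


async def start_stream(session: FakeSession, messages: List[str]) -> Iterator[str]:
    try:
        run = session.stream(messages)
    except Exception:
        # No run object exists yet, so no cleanup hook could have been
        # registered; destroy the session here before re-raising.
        await session.destroy()
        raise
    return run
```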

coderabbitai · 2026-06-22T13:25:59Z

+from agenta.sdk.agents.tools.models import MissingSecretPolicy
+
+from .errors import MissingMCPSecretError
+from .interfaces import MCPSecretProvider
+from .models import MCPServerConfig, ResolvedMCPServer
+
+
+class MCPResolver:
+    def __init__(
+        self,
+        *,
+        secret_provider: MCPSecretProvider,
+        missing_secret_policy: MissingSecretPolicy = MissingSecretPolicy.ERROR,
+    ) -> None:


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Breaks declared layer direction by importing tools model into MCP.

MCPResolver currently depends on agenta.sdk.agents.tools.models.MissingSecretPolicy, but this cohort declares tools as depending on MCP, not the other way around. This reverse edge can create import-order fragility and circular dependency risk as the stack evolves. Move MissingSecretPolicy to a neutral/shared module (or MCP/shared contract module) and import it from both subsystems.

Possible direction

-from agenta.sdk.agents.tools.models import MissingSecretPolicy
+from agenta.sdk.agents.shared.missing_secret_policy import MissingSecretPolicy

(then define/move the enum in that shared module and update tools imports accordingly)

coderabbitai · 2026-06-22T13:25:59Z

+    out = stdout.decode("utf-8", "replace")
+    err = stderr.decode("utf-8", "replace")
+    if not out.strip():
+        raise RuntimeError(
+            f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
+        )
+    try:
+        return json.loads(out)
+    except json.JSONDecodeError as exc:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat non-zero subprocess exit as transport failure even with parseable JSON.

Line 74 returns parsed JSON without checking proc.returncode; a crashed runner can look successful if it emitted partial/legacy JSON before exiting non-zero.

Suggested fix

@@ async def deliver_subprocess(...):
     out = stdout.decode("utf-8", "replace")
     err = stderr.decode("utf-8", "replace")
+    if proc.returncode not in (0, None):
+        raise RuntimeError(
+            "Agent runner exited non-zero. "
+            f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}"
+        )
     if not out.strip():
         raise RuntimeError(
             f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
         )

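The failure mode is easy to reproduce with a child process that prints valid JSON and then exits non-zero. The sketch below uses the standard-library `asyncio` subprocess API; `run_json_command` and the two sample commands are hypothetical and are not the SDK's `deliver_subprocess`. It checks `returncode != 0` directly because `communicate()` has already waited for the process to exit.

```python
import asyncio
import json
import sys
from typing import Any, Sequence


async def run_json_command(cmd: Sequence[str]) -> Any:
    """Run cmd, require exit code 0, and parse stdout as JSON."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    out = stdout.decode("utf-8", "replace")
    err = stderr.decode("utf-8", "replace")
    # Check the exit code first: parseable output from a crashed
    # process must not be mistaken for a successful result.
    if proc.returncode != 0:
        raise RuntimeError(
            f"runner exited non-zero. exit={proc.returncode} "
            f"stderr={err[-2000:]} stdout={out[:500]}"
        )
    if not out.strip():
        raise RuntimeError(f"runner returned no output. stderr={err[-2000:]}")
    return json.loads(out)


# Sample commands: both print the same JSON, one then exits with code 3.
HEALTHY = [sys.executable, "-c", "print('{\"ok\": true}')"]
CRASHING = [sys.executable, "-c", "import sys; print('{\"ok\": true}'); sys.exit(3)"]
```

Without the exit-code check, `CRASHING` would return the same parsed dictionary as `HEALTHY`.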

coderabbitai · 2026-06-22T13:25:59Z

    # agenta:builtin:* — application-only (not evaluators)
    ("builtin", "chat"): (True, False, False),
    ("builtin", "completion"): (True, False, False),
+    ("builtin", "agent"): (True, False, False),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

is_agent is never inferred, so agent workflows keep WorkflowFlags.is_agent=False.

You added the built-in agent role mapping, but infer_flags_from_data still never computes/passes is_agent into WorkflowFlags, so the new agent flag/filter path won’t work as intended.

💡 Proposed fix

@@
-    is_chat = key == "chat" or _has_messages_input(inputs_schema)
+    is_chat = key == "chat" or _has_messages_input(inputs_schema)
+    is_agent = key == "agent"
@@
     return WorkflowFlags(
@@
         # schema-derived
         is_chat=is_chat,
+        is_agent=is_agent,
         # interface-derived
         has_url=has_url,
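A reduced sketch of the inference step, assuming flags are derived from the builtin key alone plus a messages-input check. The two-field `WorkflowFlags` dataclass and the `infer_flags` function are simplified stand-ins for the SDK's model and `infer_flags_from_data`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowFlags:
    """Simplified stand-in with only the two flags discussed here."""

    is_chat: bool = False
    is_agent: bool = False


def infer_flags(key: str, has_messages_input: bool = False) -> WorkflowFlags:
    is_chat = key == "chat" or has_messages_input
    is_agent = key == "agent"
    # Both values must be passed explicitly; a flag that is computed but
    # not passed keeps its False default.
    return WorkflowFlags(is_chat=is_chat, is_agent=is_agent)
```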

…ckage

Move the Agenta-platform-backed tool and secret resolution out of the agent service into a new SDK package (agenta.sdk.agents.platform) so a standalone SDK user with a local backend resolves gateway tools and secrets the same way the service does.

- New SDK package: PlatformConnection, AgentaGatewayToolResolver, AgentaNamedSecretProvider + resolve_named_secrets, resolve_provider_keys, and three entrypoints resolve_tools / resolve_mcp / resolve_secrets.
- Service is now thin: client.py deleted (logic in PlatformConnection, timeout guarded); tools/{gateway,secrets}.py and secrets.py are re-export shims; resolver.py keeps only the AGENTA_AGENT_ENABLE_MCP gate; app.py calls the three entrypoints with symmetric helpers.
- Behavior-preserving: /run wire + resolved bundle unchanged (golden test green). Secret logs count-only; named secrets restricted to the requested set.
- Tests: SDK agents 164 + service agent unit 20; HTTP integration tests relocated to the SDK.

Claude-Session: https://claude.ai/code/session_019gCmobHk9Pi3Y2HDTw3Wrs

test(agent): add SDK platform conftest and gateway resolver test

…links fix(docs): remove broken custom-agent-runner-images links

ci(agent): build and test sandbox-agent images

chore(railway): add sandbox-agent preview deployment

chore(kubernetes): deploy sandbox-agent sidecar

ClaudeAgentConfig must override wire_skills to return {} since Claude's headless SDK cannot load inline skill packages. The override was lost in the main->big-agents merge, regressing test_invoke_cross_harness_same_body_divergent_configs. Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

Two files failed ruff format --check on the integration branch. Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

* fix(frontend): repair SessionsTable JSX from botched merge

  The main->big-agents merge left a duplicate ternary and a mismatched <div>/</SessionStoreProvider> wrapper, breaking prettier, eslint and the web build. Unite both sides: wrap in SessionStoreProvider, keep one table with store={store} and the flex-1 layout class.

  Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

* fix(entities): add is_agent flag in ephemeral workflow build

  The merge added a defaulted is_agent flag to workflowFlagsSchema, but the agent-playground ephemeral workflow constructed its flags without it. With the literal true/false flag values, the 'as Workflow' cast then failed bidirectional overlap (TS2352), breaking the agenta-web build. Set is_agent from the workflow type.

  Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

* fix(playground,entity-ui): clear package type errors from merge

  Two more tsc --noEmit failures broke the agenta-web build (turbo builds each package before the app):

  - agentRequest.ts: annotate headers as Record<string,string> so the conditional header-factory spread does not narrow away the index signature (Authorization access, TS2339).
  - AgentConfigControl.tsx: drop the stale 'default' key from CONNECTION_MODE_LABELS; ConnectionMode is agenta|self_managed only after the provider-model-auth refactor (TS2353).

  Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

…4825) My earlier #4824 added a ClaudeAgentConfig.wire_skills override returning {} (graceful-degrade), but that contradicts the authoritative behavior from 08212c6 (fix(agent): materialize skills for Claude harness): the runner materializes skills under .claude/skills/<name>, so Claude carries them on the wire. The override broke the SDK unit test test_claude_carries_skills_for_project_local_materialization. Remove the override (Claude inherits the base wire_skills) and update the stale services-test assertion to expect Claude carries the skill. Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

github-actions · 2026-06-24T17:55:57Z

Railway Preview Environment


Preview URL	https://gateway-production-0f05.up.railway.app/w
Project	`agenta-oss-pr-4791`
Image tag	`pr-4791-237d7c6`
Status	Deployed
Railway logs	Open logs
Workflow logs	View workflow run
Updated at 2026-06-24T19:33:14.381Z

…4826)

* test(agent): align acceptance/integration tests with refactors

  These suites were skipped while the web build was broken; once it passed they ran and surfaced pre-existing drift on big-agents:

  - sdk acceptance: the agent builtin now ships a registered interface (no in-process handler), so test_agent_alias_is_not_registered was stale. Renamed to assert the interface is registered and the handler is absent.
  - services integration: gateway/secret resolution moved into the SDK platform package (#4772), so the agent_api_base/request_authorization/httpx/log module attributes the conftest patched no longer exist on the service shims. Patch the SDK platform connection derivation helpers and the SDK platform module httpx/log instead.

  Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

* test(agent): patch SDK platform secrets module in resolve-secrets test

  Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

The store path strips the server-owned is_platform flag before persisting (_scrub_server_owned_flags), but the query path did not, so any /workflows/query carrying is_platform (e.g. a client re-posting a workflow's own echoed flags) built a JSONB containment filter for a key that is never stored, matching zero rows. Scrub server-owned flags on both the artifact and revision query builders, symmetric with the write path. Platform-catalogue workflows are served from the code catalog, not the DB, so is_platform must never gate a DB containment query. Fixes the skipped-then-surfaced acceptance test test_query_workflows_by_flags (count 0 -> 1). Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

) #4827 scrubbed the server-owned is_platform flag from query filters unconditionally, which broke test_query_with_explicit_is_platform_filters_on_it: an explicit is_platform=True is a deliberate platform-catalogue filter and must be preserved. Use a query-specific scrub that drops a server-owned flag only when its value is False (the echoed default that would otherwise match nothing, since the key is scrubbed on write). An explicit True is kept. The write path keeps the unconditional scrub. Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
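The asymmetry between the write-path scrub and the query-path scrub described in the two commit messages above can be sketched as follows. The function names and the plain-dict representation of flags are assumptions made for illustration; only the `is_platform` flag and the drop-only-when-False rule come from the text.

```python
from typing import Any, Dict, Mapping

SERVER_OWNED_FLAGS = frozenset({"is_platform"})


def scrub_for_write(flags: Mapping[str, Any]) -> Dict[str, Any]:
    """Write path: server-owned flags are never persisted."""
    return {k: v for k, v in flags.items() if k not in SERVER_OWNED_FLAGS}


def scrub_for_query(flags: Mapping[str, Any]) -> Dict[str, Any]:
    """Query path: drop a server-owned flag only when it is False.

    An echoed False would build a containment filter on a key that is
    never stored and so match nothing; an explicit True is a deliberate
    filter and is kept.
    """
    return {
        k: v
        for k, v in flags.items()
        if not (k in SERVER_OWNED_FLAGS and v is False)
    }
```

For example, `scrub_for_query({"is_chat": True, "is_platform": False})` keeps only `is_chat`, while the same call with `is_platform=True` keeps both keys.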

mmabrouk added 17 commits June 19, 2026 18:27

feat(agent): runner wire protocol and tool execution

2f467ce

feat(sdk): agent runtime ports, adapters, tool resolution, and messag…

b9e62f9

…es protocol

feat(agent): runner engines, HTTP server, tracing, and docker image

965e180

docs(agent): agent-workflows design wiki, ground truth, and archived …

1087fa2

…POCs

fix(sdk): validate agent runner configuration

741fc73

docs(agent): clarify active stack docs map

8131d20

fix(agent): keep tool bridge secrets runner-side

c0df8df

fix(agent): relay child tools and finalize runner usage

2c2bac7

chore(railway): add sandbox-agent preview deployment

d08efeb

ci(agent): build and test sandbox-agent images

08aaddd

refactor(sdk): rename rivet adapter/backend to sandbox-agent

2a7c129

refactor(agent): rename rivet engine/driver to sandbox-agent

3482402

test(agent): remove old test/ files relocated to tests/unit

8f6e48b

docs(agent): sync agent-workflows docs for sandbox-agent rename and a…

8b07fca

…dd QA reports

chore(agent): open big-agents integration branch for agent workflows

1a3f330

vercel Bot deployed to Preview June 22, 2026 11:58 View deployment

mmabrouk added 5 commits June 22, 2026 14:16

feat(agent): agent workflow service and gateway tool-resolution API

ec3a2c3

test(agent): cover missing runner assets

8ab32b0

refactor(agent): rename rivet backend to sandbox-agent in the service

739dca6

fix(sdk): address review feedback (locking, input validation, stream/error handling)

0beb120

vercel Bot deployed to Preview June 22, 2026 12:21 View deployment

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

mmabrouk added 2 commits June 23, 2026 12:10

refactor(agent): split sandbox-agent runner orchestration

3ecd437

vercel Bot deployed to Preview June 23, 2026 10:42 View deployment

Merge pull request #4819 from Agenta-AI/fix/docs-broken-agent-runner-links

d09bae4

fix(docs): remove broken custom-agent-runner-images links

vercel Bot deployed to Preview June 24, 2026 16:53 View deployment

Merge pull request #4804 from Agenta-AI/ci/sandbox-agent-image

f517388

ci(agent): build and test sandbox-agent images

vercel Bot deployed to Preview June 24, 2026 17:03 View deployment

mmabrouk added 2 commits June 24, 2026 19:03

Merge pull request #4802 from Agenta-AI/chore/sandbox-agent-railway

5d9fd6c

chore(railway): add sandbox-agent preview deployment

Merge pull request #4803 from Agenta-AI/chore/sandbox-agent-kubernetes

9ab7196

chore(kubernetes): deploy sandbox-agent sidecar

vercel Bot deployed to Preview June 24, 2026 17:05 View deployment

mmabrouk marked this pull request as ready for review June 24, 2026 17:05

dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and draft labels Jun 24, 2026

Merge branch 'main' into big-agents

0a3717d

vercel Bot deployed to Preview June 24, 2026 17:08 View deployment

This was referenced Jun 24, 2026

fix(frontend): repair SessionsTable JSX from botched merge #4822

Merged

style(examples): ruff format RAG_QA_chatbot scripts #4823

Merged

fix(agent): drop skills on Claude wire (restore lost override) #4824

Merged

mmabrouk added 2 commits June 24, 2026 19:26

style(examples): ruff format RAG_QA_chatbot scripts (#4823)

3f66565

Two files failed ruff format --check on the integration branch.

Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT

vercel Bot deployed to Preview June 24, 2026 17:27 View deployment

mmabrouk added 2 commits June 24, 2026 19:44

vercel Bot deployed to Preview June 24, 2026 17:46 View deployment

This was referenced Jun 24, 2026

test(agent): align acceptance/integration tests with agent refactors #4826

Merged

fix(api): scrub server-owned flags from workflow query filters #4827

Merged

vercel Bot deployed to Preview June 24, 2026 19:22 View deployment

vercel Bot deployed to Preview June 24, 2026 19:24 View deployment

vercel Bot deployed to Preview June 24, 2026 19:30 View deployment

mmabrouk commented Jun 22, 2026 • edited

Context

Integrated PRs

Branch-only (no PR yet)

Notes

vercel Bot commented Jun 22, 2026 • edited

coderabbitai Bot commented Jun 22, 2026 • edited

Review skipped

❌ Failed checks (1 warning)

github-actions Bot commented Jun 24, 2026 • edited

Railway Preview Environment
