[integration] Agent workflows (big-agents) #4791
…r image
Python `code` tools failed with `spawn python3 ENOENT` because neither runner image installed python3 (code.ts spawns python3). Add it to both.

Also rebuild the Pi extension bundle from the mounted src on dev container start: the dev image bakes the bundle and only mounts src, so an edited extension went stale and silently stopped registering custom tools on the Rivet path.

Adds a regression test for the extension tool-registration contract. Found via the agent-workflows QA matrix (findings F-005, F-006).
Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD
Adds docs/design/agent-workflows/qa/: the autohealing QA recipe (README), the Gherkin scenario matrix with a live scoreboard, the findings log (F-001..F-010 in the open-issues style), a reusable /invoke driver with captured runs, and the regression-test research plus the replay-test skill draft.

Produced by a live end-to-end QA pass across the harness x environment x capability matrix; it documents and motivates the runner fixes in the sibling PRs (#4776, #4778).
Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD
Important: Review skipped. Too many files! This PR contains 529 files, which is 379 over the limit of 150.
…t connection flags
No-auth Composio toolkits (codeinterpreter, the composio meta-toolkit) could not be connected. The adapter always POSTs an auth config, which Composio rejects for a no-auth toolkit (Auth_Config_NoAuthApp), and resolve/execute required a connected-account id those toolkits do not have, so the whole no-auth path was unreachable.

Detect a no-auth toolkit (every auth_config_details[].mode == NO_AUTH), skip the auth-config and connected-account creation, and persist a usable connection with no Composio account. Resolve and execute omit the account id for a no-auth connection (Composio runs those tools with no account).

Connection validity is now server-owned: a client can no longer send flags.is_valid to mark a pending auth connection usable. Refresh on a no-auth connection is a no-op, not a not-found error.

Verified: connect 500 to 200, resolve 200, /tools/call ran print(6*7) and returned 42. New test_no_auth_connection.py (11 tests); all 15 tools unit tests pass, ruff clean. Reviewed by a second agent and Codex; their one blocker (client-settable is_valid) is fixed here.
Claude-Session: https://claude.ai/code/session_01KsGSJQwsUdgWcNSEt2P2qD
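The detection rule in the commit message above can be sketched as a small predicate. This is a minimal illustration, not the adapter's actual code: `is_no_auth_toolkit` and `connection_plan` are hypothetical names, the toolkit is modelled as a plain dict, and treating an empty or missing `auth_config_details` list as "not no-auth" is an assumption the commit message does not state.

```python
from typing import Any, Dict, Mapping

NO_AUTH = "NO_AUTH"


def is_no_auth_toolkit(toolkit: Mapping[str, Any]) -> bool:
    """Return True when every auth_config_details entry has mode NO_AUTH."""
    details = toolkit.get("auth_config_details") or []
    # Assumption: an empty or missing list is NOT treated as no-auth,
    # because all([]) would otherwise be vacuously True.
    if not details:
        return False
    return all(
        isinstance(entry, Mapping) and entry.get("mode") == NO_AUTH
        for entry in details
    )


def connection_plan(toolkit: Mapping[str, Any]) -> Dict[str, bool]:
    """Hypothetical helper: which Composio-side objects to create on connect."""
    needs_auth = not is_no_auth_toolkit(toolkit)
    return {
        "create_auth_config": needs_auth,
        "create_connected_account": needs_auth,
    }
```

For a toolkit whose only entry is `{"mode": "NO_AUTH"}`, the plan skips both creation steps; a single entry with any other mode turns both back on.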
Actionable comments posted: 10
📒 Files selected for processing (70)
sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/sandbox_agent.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/tools/wire.py
sdks/python/agenta/sdk/agents/ui_messages.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_routing.py
NOTE on packaging: the Node runner is NOT part of this Python wheel (``pip install agenta``
stays pure Python; the wheel contains zero ``.ts``/``.js``). How a standalone Pi user obtains
the runner -- an ``npx`` npm package, a local checkout, or a Docker sidecar over HTTP -- is an
open distribution decision; see ``docs/design/agent-workflows/typescript-structure/``. Do NOT
silently bundle a JS runner into the wheel.
Align LocalBackend wording with the stated packaging contract.
Lines 9-13 say the wheel must not bundle a JS runner, but line 30 and the NotImplementedError messages still say “bundled JS”. This contradiction will confuse integrators.
Suggested wording fix
-class LocalBackend(Backend):
-    """Run Pi (bundled JS) or Claude (``claude-agent-sdk``) on this machine."""
+class LocalBackend(Backend):
+    """Run Pi (external Node runner) or Claude (``claude-agent-sdk``) on this machine."""
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )
 ...
         raise NotImplementedError(
-            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "LocalBackend is not implemented yet (Phase 3: Pi via external Node runner, "
             "Phase 4: Claude via claude-agent-sdk)."
         )
Also applies to: 30-38, 50-53
    def __init__(
        self,
        *,
        sandbox: str = "local",
        url: Optional[str] = None,
        command: Optional[Sequence[str]] = None,
        cwd: Optional[str] = None,
        timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
    ) -> None:
        self._sandbox = sandbox
        self._url = url
Validate sandbox at construction time.
Line 129 currently accepts any string; invalid values get sent over the wire and fail late. Restrict this to supported values (local, daytona) and raise a configuration error early.
Suggested validation
 from ..dtos import (
@@
 )
+from ..errors import AgentRunnerConfigurationError
@@
     def __init__(
         self,
         *,
         sandbox: str = "local",
@@
         timeout: float = float(os.getenv("AGENTA_AGENT_RUNNER_TIMEOUT_SECONDS", "180")),
     ) -> None:
+        allowed_sandboxes = {"local", "daytona"}
+        if sandbox not in allowed_sandboxes:
+            raise AgentRunnerConfigurationError(
+                f"Unsupported sandbox '{sandbox}'. Expected one of: {sorted(allowed_sandboxes)}."
+            )
         self._sandbox = sandbox
         self._url = url

        llm_config = prompt_cfg.get("llm_config") or {}
        model = llm_config.get("model") or defaults.model
        instructions = _system_text(prompt_cfg.get("messages")) or defaults.instructions
        raw_tools = llm_config.get("tools")
        if raw_tools is None:
            raw_tools = prompt_cfg.get("tools")
        else:
Guard llm_config type before dictionary access.
Line 694 assumes prompt["llm_config"] is a dict. If it’s a non-dict value, this path crashes with AttributeError instead of applying defaults.
Proposed fix
     prompt_cfg = params.get("prompt")
     if isinstance(prompt_cfg, dict):
-        llm_config = prompt_cfg.get("llm_config") or {}
+        raw_llm_config = prompt_cfg.get("llm_config")
+        llm_config = raw_llm_config if isinstance(raw_llm_config, dict) else {}
         model = llm_config.get("model") or defaults.model
         instructions = _system_text(prompt_cfg.get("messages")) or defaults.instructions
         raw_tools = llm_config.get("tools")
         if raw_tools is None:
             raw_tools = prompt_cfg.get("tools")

        sandbox = await self._sandbox()
        if provisioning:
            await sandbox.add_files(provisioning)
        return await self._backend.create_session(
            sandbox,
            config,
            harness=harness,
            secrets=session_config.secrets,
            trace=session_config.trace,
            session_id=session_config.session_id,
        )
Destroy per-session sandbox on setup/session-creation failure.
If Line 224 (add_files) or Line 225 (create_session) raises, a per-session sandbox is left alive with no owner to tear it down.
Proposed fix
     async def create_session(
         self,
         config: HarnessAgentConfig,
         *,
         harness: HarnessType,
         session_config: SessionConfig,
         provisioning: Optional[Mapping[str, bytes]] = None,
     ) -> Session:
         """Provision a sandbox per policy, then open a session in it."""
         sandbox = await self._sandbox()
-        if provisioning:
-            await sandbox.add_files(provisioning)
-        return await self._backend.create_session(
-            sandbox,
-            config,
-            harness=harness,
-            secrets=session_config.secrets,
-            trace=session_config.trace,
-            session_id=session_config.session_id,
-        )
+        try:
+            if provisioning:
+                await sandbox.add_files(provisioning)
+            return await self._backend.create_session(
+                sandbox,
+                config,
+                harness=harness,
+                secrets=session_config.secrets,
+                trace=session_config.trace,
+                session_id=session_config.session_id,
+            )
+        except Exception:
+            if self._sandbox_per_session:
+                try:
+                    await sandbox.destroy()
+                except Exception:
+                    pass
+            raise
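The cleanup-on-failure pattern from this suggestion can be exercised in isolation. The sketch below is a self-contained illustration only: `FakeSandbox` and `open_session` are hypothetical stand-ins, not the SDK's classes, and the `per_session` argument plays the role of the `_sandbox_per_session` policy named in the diff.

```python
import asyncio
from typing import Mapping, Optional


class FakeSandbox:
    """Hypothetical sandbox whose file upload always fails."""

    def __init__(self) -> None:
        self.destroyed = False

    async def add_files(self, files: Mapping[str, bytes]) -> None:
        raise RuntimeError("upload failed")

    async def destroy(self) -> None:
        self.destroyed = True


async def open_session(
    sandbox: FakeSandbox,
    files: Optional[Mapping[str, bytes]],
    *,
    per_session: bool,
) -> str:
    try:
        if files:
            await sandbox.add_files(files)
        return "session"
    except Exception:
        # Only tear down a sandbox this call owns; a shared one stays alive.
        if per_session:
            try:
                await sandbox.destroy()
            except Exception:
                pass  # best effort: never mask the original error
        raise


async def failed_setup_destroys(per_session: bool) -> bool:
    sandbox = FakeSandbox()
    try:
        await open_session(sandbox, {"a.txt": b"x"}, per_session=per_session)
    except RuntimeError:
        pass
    return sandbox.destroyed
```

Running `asyncio.run(failed_setup_destroys(True))` reports that the per-session sandbox was destroyed, while the shared-sandbox case leaves it untouched and still re-raises the original error.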
        session = await self.create_session(config)

        def _absorb(result: AgentResult) -> None:
            if result.session_id:
                config.session_id = result.session_id

        return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)
Ensure session cleanup if stream setup fails synchronously.
Line 321 only registers cleanup after session.stream(messages) succeeds. If stream construction raises, the session is leaked.
Proposed fix
         session = await self.create_session(config)
+        try:
+            run = session.stream(messages)
+        except Exception:
+            await session.destroy()
+            raise
         def _absorb(result: AgentResult) -> None:
             if result.session_id:
                 config.session_id = result.session_id
-        return session.stream(messages).on_result(_absorb).on_cleanup(session.destroy)
+        return run.on_result(_absorb).on_cleanup(session.destroy)

from agenta.sdk.agents.tools.models import MissingSecretPolicy

from .errors import MissingMCPSecretError
from .interfaces import MCPSecretProvider
from .models import MCPServerConfig, ResolvedMCPServer


class MCPResolver:
    def __init__(
        self,
        *,
        secret_provider: MCPSecretProvider,
        missing_secret_policy: MissingSecretPolicy = MissingSecretPolicy.ERROR,
    ) -> None:
Breaks declared layer direction by importing tools model into MCP.
MCPResolver currently depends on agenta.sdk.agents.tools.models.MissingSecretPolicy, but this cohort declares tools as depending on MCP, not the other way around. This reverse edge can create import-order fragility and circular dependency risk as the stack evolves. Move MissingSecretPolicy to a neutral/shared module (or MCP/shared contract module) and import it from both subsystems.
Possible direction
- from agenta.sdk.agents.tools.models import MissingSecretPolicy
+ from agenta.sdk.agents.shared.missing_secret_policy import MissingSecretPolicy
(Then define or move the enum in that shared module and update the tools imports accordingly.)
    out = stdout.decode("utf-8", "replace")
    err = stderr.decode("utf-8", "replace")
    if not out.strip():
        raise RuntimeError(
            f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
        )
    try:
        return json.loads(out)
    except json.JSONDecodeError as exc:
Treat non-zero subprocess exit as transport failure even with parseable JSON.
Line 74 returns parsed JSON without checking proc.returncode; a crashed runner can look successful if it emitted partial/legacy JSON before exiting non-zero.
Suggested fix
@@ async def deliver_subprocess(...):
     out = stdout.decode("utf-8", "replace")
     err = stderr.decode("utf-8", "replace")
+    if proc.returncode not in (0, None):
+        raise RuntimeError(
+            "Agent runner exited non-zero. "
+            f"exit={proc.returncode} stderr={err[-2000:]} stdout={out[:500]}"
+        )
     if not out.strip():
         raise RuntimeError(
             f"Agent runner returned no output. exit={proc.returncode} stderr={err[-2000:]}"
         )
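The ordering of checks in this suggestion is easier to test when pulled into a pure function. The sketch below is one way to do that, assuming the exit code and captured streams are already available; `parse_runner_output` is a hypothetical name, and the invalid-JSON error message is an assumption because the original `except` body is not shown in the review.

```python
import json
from typing import Any, Optional


def parse_runner_output(returncode: Optional[int], stdout: bytes, stderr: bytes) -> Any:
    """Decode runner output, treating a non-zero exit as a transport failure."""
    out = stdout.decode("utf-8", "replace")
    err = stderr.decode("utf-8", "replace")
    # Check the exit code first: a crashed runner may still have printed JSON.
    if returncode not in (0, None):
        raise RuntimeError(
            "Agent runner exited non-zero. "
            f"exit={returncode} stderr={err[-2000:]} stdout={out[:500]}"
        )
    if not out.strip():
        raise RuntimeError(
            f"Agent runner returned no output. exit={returncode} stderr={err[-2000:]}"
        )
    try:
        return json.loads(out)
    except json.JSONDecodeError as exc:
        # Assumed handling; the original except body is not part of the review.
        raise RuntimeError(f"Agent runner returned invalid JSON: {exc}") from exc
```

With this shape, a runner that prints a valid result and then exits with status 1 raises instead of being reported as a success.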
    # agenta:builtin:* — application-only (not evaluators)
    ("builtin", "chat"): (True, False, False),
    ("builtin", "completion"): (True, False, False),
    ("builtin", "agent"): (True, False, False),
is_agent is never inferred, so agent workflows keep WorkflowFlags.is_agent=False.
You added the built-in agent role mapping, but infer_flags_from_data still never computes/passes is_agent into WorkflowFlags, so the new agent flag/filter path won’t work as intended.
💡 Proposed fix
@@
-    is_chat = key == "chat" or _has_messages_input(inputs_schema)
+    is_chat = key == "chat" or _has_messages_input(inputs_schema)
+    is_agent = key == "agent"
@@
     return WorkflowFlags(
@@
         # schema-derived
         is_chat=is_chat,
+        is_agent=is_agent,
         # interface-derived
         has_url=has_url,
…ckage
Move the Agenta-platform-backed tool and secret resolution out of the agent service
into a new SDK package (agenta.sdk.agents.platform) so a standalone SDK user with a
local backend resolves gateway tools and secrets the same way the service does.
- New SDK package: PlatformConnection, AgentaGatewayToolResolver, AgentaNamedSecretProvider
+ resolve_named_secrets, resolve_provider_keys, and three entrypoints resolve_tools /
resolve_mcp / resolve_secrets.
- Service is now thin: client.py deleted (logic in PlatformConnection, timeout guarded);
tools/{gateway,secrets}.py and secrets.py are re-export shims; resolver.py keeps only the
AGENTA_AGENT_ENABLE_MCP gate; app.py calls the three entrypoints with symmetric helpers.
- Behavior-preserving: /run wire + resolved bundle unchanged (golden test green). Secret
logs count-only; named secrets restricted to the requested set.
- Tests: SDK agents 164 + service agent unit 20; HTTP integration tests relocated to the SDK.
Claude-Session: https://claude.ai/code/session_019gCmobHk9Pi3Y2HDTw3Wrs
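The two secret-handling guarantees in this commit message (count-only logs, named secrets restricted to the requested set) can be illustrated with a short sketch. It is not the SDK's `resolve_named_secrets`; `select_named_secrets` and the logger name are hypothetical, and secrets are modelled as a plain name-to-value mapping.

```python
import logging
from typing import Dict, Iterable, Mapping

log = logging.getLogger("agent.secrets.sketch")  # hypothetical logger name


def select_named_secrets(
    available: Mapping[str, str], requested: Iterable[str]
) -> Dict[str, str]:
    """Return only the secrets that were explicitly requested."""
    wanted = set(requested)
    selected = {name: value for name, value in available.items() if name in wanted}
    # Count-only logging: neither secret names nor values reach the log line.
    log.info("resolved %d of %d requested named secrets", len(selected), len(wanted))
    return selected
```

A request for a name that is not available simply yields a smaller result, so the caller can decide how to treat missing secrets.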
test(agent): add SDK platform conftest and gateway resolver test
…links fix(docs): remove broken custom-agent-runner-images links
ci(agent): build and test sandbox-agent images
chore(railway): add sandbox-agent preview deployment
chore(kubernetes): deploy sandbox-agent sidecar
ClaudeAgentConfig must override wire_skills to return {} since Claude's
headless SDK cannot load inline skill packages. The override was lost in
the main->big-agents merge, regressing
test_invoke_cross_harness_same_body_divergent_configs.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
Two files failed ruff format --check on the integration branch. Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
* fix(frontend): repair SessionsTable JSX from botched merge
The main->big-agents merge left a duplicate ternary and a mismatched
<div>/</SessionStoreProvider> wrapper, breaking prettier, eslint and the
web build. Unite both sides: wrap in SessionStoreProvider, keep one
table with store={store} and the flex-1 layout class.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
* fix(entities): add is_agent flag in ephemeral workflow build
The merge added a defaulted is_agent flag to workflowFlagsSchema, but the
agent-playground ephemeral workflow constructed its flags without it. With
the literal true/false flag values, the 'as Workflow' cast then failed
bidirectional overlap (TS2352), breaking the agenta-web build. Set
is_agent from the workflow type.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
* fix(playground,entity-ui): clear package type errors from merge
Two more tsc --noEmit failures broke the agenta-web build (turbo builds
each package before the app):
- agentRequest.ts: annotate headers as Record<string,string> so the
conditional header-factory spread does not narrow away the index
signature (Authorization access, TS2339).
- AgentConfigControl.tsx: drop the stale 'default' key from
CONNECTION_MODE_LABELS; ConnectionMode is agenta|self_managed only
after the provider-model-auth refactor (TS2353).
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
…4825)
My earlier #4824 added a ClaudeAgentConfig.wire_skills override returning {} (graceful-degrade), but that contradicts the authoritative behavior from 08212c6 (fix(agent): materialize skills for Claude harness): the runner materializes skills under .claude/skills/<name>, so Claude carries them on the wire. The override broke the SDK unit test test_claude_carries_skills_for_project_local_materialization.

Remove the override (Claude inherits the base wire_skills) and update the stale services-test assertion to expect Claude carries the skill.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
…4826)
* test(agent): align acceptance/integration tests with refactors
These suites were skipped while the web build was broken; once it passed they ran and surfaced pre-existing drift on big-agents:
- sdk acceptance: the agent builtin now ships a registered interface (no in-process handler), so test_agent_alias_is_not_registered was stale. Renamed to assert the interface is registered and the handler is absent.
- services integration: gateway/secret resolution moved into the SDK platform package (#4772), so the agent_api_base/request_authorization/httpx/log module attributes the conftest patched no longer exist on the service shims. Patch the SDK platform connection derivation helpers and the SDK platform module httpx/log instead.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
* test(agent): patch SDK platform secrets module in resolve-secrets test
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
The store path strips the server-owned is_platform flag before persisting (_scrub_server_owned_flags), but the query path did not, so any /workflows/query carrying is_platform (e.g. a client re-posting a workflow's own echoed flags) built a JSONB containment filter for a key that is never stored, matching zero rows.

Scrub server-owned flags on both the artifact and revision query builders, symmetric with the write path. Platform-catalogue workflows are served from the code catalog, not the DB, so is_platform must never gate a DB containment query.

Fixes the skipped-then-surfaced acceptance test test_query_workflows_by_flags (count 0 -> 1).
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
)
#4827 scrubbed the server-owned is_platform flag from query filters unconditionally, which broke test_query_with_explicit_is_platform_filters_on_it: an explicit is_platform=True is a deliberate platform-catalogue filter and must be preserved.

Use a query-specific scrub that drops a server-owned flag only when its value is False (the echoed default that would otherwise match nothing, since the key is scrubbed on write). An explicit True is kept. The write path keeps the unconditional scrub.
Claude-Session: https://claude.ai/code/session_01DEZYALzKjh9ocjkscaBWRT
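The difference between the write-path and query-path scrubs described in the two commit messages above can be sketched with plain dicts. The function names and the `SERVER_OWNED_FLAGS` constant are hypothetical; only `is_platform` is named as server-owned in the source, and the real code builds JSONB containment filters rather than returning dicts.

```python
from typing import Any, Dict, Mapping

# Assumption: is_platform is the only server-owned flag mentioned in the source.
SERVER_OWNED_FLAGS = frozenset({"is_platform"})


def scrub_flags_for_write(flags: Mapping[str, Any]) -> Dict[str, Any]:
    """Write path: server-owned flags are never persisted."""
    return {k: v for k, v in flags.items() if k not in SERVER_OWNED_FLAGS}


def scrub_flags_for_query(flags: Mapping[str, Any]) -> Dict[str, Any]:
    """Query path: drop a server-owned flag only when it is the echoed False."""
    return {
        k: v
        for k, v in flags.items()
        if not (k in SERVER_OWNED_FLAGS and v is False)
    }
```

An echoed `is_platform=False` therefore no longer produces a filter on a key that is never stored, while an explicit `is_platform=True` still reaches the query builder.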
Context
big-agents is the integration branch for the agent-workflows feature. Every agent PR targets big-agents (directly, or by stacking on one that does). The plan is to review and merge each sub-PR into big-agents, then merge big-agents into main as a single unit.

This PR is a draft tracker. It stays open until all the open sub-PRs below are merged into big-agents. The branch started from an empty commit, so the diff fills in as sub-PRs land.

Integrated PRs
Each box gets checked when that PR is merged into big-agents. Indented items stack on the item above them.

SDK and service
Runner
big-agents (the relay-bug fix, the CI job, and a superset of its tests already landed via feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786)
Frontend
Hosting
Sandbox-agent deployment
Docs

Branch-only (no PR yet)
These design-doc branches are stacked on big-agents but have no PR. Open one if you want them reviewed separately, otherwise they fold in with the docs.
docs/agent-model-config-and-provider-auth
docs/agent-skills-config
docs/agent-code-tool-sandbox
docs/agent-harness-capabilities

Notes
big-agents (feat(agent): runner engines, HTTP server, tracing, and docker image #4778 + chore(agent): make sandbox-agent runner first-class #4786 already carry its tests, CI job, and relay-bug fix; its version.ts was stale ["pi","rivet"]).
big-agents as chore(railway): add sandbox-agent preview deployment #4802 / chore(kubernetes): deploy sandbox-agent sidecar #4803 / ci(agent): build and test sandbox-agent images #4804.