Feature Feedback: Running 4 Agent SDK Agents in Production — What We'd Love to See #653

@jmm2020


Who We Are

We're running UCIS (Unified Consciousness Integration System) — a production multi-agent system with 4 Claude Agent SDK agents (Doctor, Lal, Lore, Quark) operating as Docker microservices. Each agent has a specialized role (infrastructure, ML research, strategy, finance) and they collaborate through an orchestration hub for daily automated "Discovery Patrol" sessions: parallel web research, cross-scoring, structured debate, convergence voting, and CFO financial evaluation.

We're on SDK v0.1.48, running ClaudeSDKClient with OAuth, bypassPermissions mode, custom MCP servers, and hooks. This is not a toy — these agents run daily, autonomously, producing scored and stored discoveries.

What's Working Well

  • ClaudeAgentOptions is clean and flexible — system prompts, MCP servers, hooks, permission control all work great
  • The hook system (PreToolUse, PostToolUse, Stop) is excellent for wiring agents into our event pipeline
  • can_use_tool callback gives us fine-grained control over what agents can do
  • The 0.1.46 additions (list_sessions, add_mcp_server, typed task messages) show the SDK is heading in the right direction
  • The bundled CLI updates keep us current without manual intervention

What Would Unlock the Next Level

1. Prompt Caching Control

Our agents send large, repeated context on every call: system prompts (~2K tokens), tool definitions, memory context blocks. The Anthropic Messages API supports cache_control on content blocks, but the SDK doesn't expose this.

Ask: Allow marking system prompt blocks or specific message content as cacheable, so repeated context across calls within a session doesn't get re-processed. Even a simple cache_system_prompt=True on ClaudeAgentOptions would help.

Impact: We run 4 agents x 4 phases per session = 16+ API calls with nearly identical system prompts. Caching could meaningfully reduce token costs.
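To make the ask concrete, here's roughly what we're pointing at: the raw Messages API already accepts `cache_control` on system content blocks, and a hypothetical `cache_system_prompt=True` option could translate to exactly this payload shape (the model id below is just a placeholder):

```python
# Sketch: the Messages API payload shape that cache_control exposes today.
# A hypothetical cache_system_prompt=True on ClaudeAgentOptions could map to
# marking the system block as an ephemeral cache breakpoint like this.

def build_cached_payload(system_prompt: str, user_msg: str) -> dict:
    """Build a Messages API request body with a cacheable system block."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks everything up to and including this block as a cache
                # breakpoint; repeat calls within the cache TTL read it back
                # at reduced token cost instead of re-processing it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

payload = build_cached_payload(
    "You are Doctor, the infrastructure agent.", "Status report?"
)
```

Since the SDK already constructs the request body internally, surfacing this as a single boolean (or per-block flag) seems like a small API surface for a large cost win.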

2. Streaming Token/Cost Metrics

We have zero visibility into per-call token usage. When a Discovery Patrol runs, we can't tell which agent or which phase is burning the most tokens.

Ask: Include input_tokens, output_tokens, and cache_read_tokens / cache_creation_tokens in ResultMessage (or a new UsageMessage type). Bonus: cumulative session usage.

Impact: Can't optimize what we can't measure. We'd use this to tune prompt sizes, set per-agent budgets, and track cost trends.
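A sketch of the shape we have in mind — the field names mirror the Messages API usage object, but the `UsageMessage` type and the session aggregation below are our proposal, not anything in the SDK today:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-call usage reporting plus cumulative session
# totals. None of these types exist in the SDK; names are illustrative.

@dataclass
class UsageMessage:
    agent: str
    phase: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

@dataclass
class SessionUsage:
    """Cumulative per-agent usage across one Discovery Patrol session."""
    totals: dict = field(default_factory=dict)

    def record(self, u: UsageMessage) -> None:
        t = self.totals.setdefault(u.agent, {"in": 0, "out": 0})
        t["in"] += u.input_tokens + u.cache_creation_tokens
        t["out"] += u.output_tokens

session = SessionUsage()
session.record(UsageMessage("lore", "debate", input_tokens=1800, output_tokens=450))
session.record(UsageMessage("lore", "voting", input_tokens=900, output_tokens=120))
```

With this we could answer "which phase is expensive?" from our own Kafka pipeline instead of guessing.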

3. Push-Based Completion Notification

Our orchestrator polls the hub every 5 seconds waiting for agent responses. The SDK's streaming model works for interactive use, but for service-to-service orchestration, we'd benefit from a callback/webhook pattern.

Ask: An on_complete callback option or webhook URL in ClaudeAgentOptions that fires when the agent finishes a turn. Even an async event/future pattern would work.

Impact: Eliminates polling overhead in multi-agent orchestration. Our Discovery Patrol makes ~80 polling requests per session just waiting for responses.
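The pattern we'd replace polling with is a per-turn awaitable resolved by a completion callback. The `on_complete` hook name is hypothetical; the asyncio mechanics are the point:

```python
import asyncio

# Sketch of an async-future pattern instead of 5-second polling: the
# orchestrator awaits a per-agent future that a hypothetical on_complete
# callback resolves when the agent finishes its turn.

class TurnWaiter:
    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}

    def expect(self, agent: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._futures[agent] = fut
        return fut

    def on_complete(self, agent: str, result: str) -> None:
        # This is what we'd wire into ClaudeAgentOptions if it existed.
        self._futures.pop(agent).set_result(result)

async def main() -> str:
    waiter = TurnWaiter()
    fut = waiter.expect("quark")
    # Simulate the agent finishing its turn 10 ms later.
    asyncio.get_running_loop().call_later(0.01, waiter.on_complete, "quark", "done")
    return await fut

result = asyncio.run(main())
```

A webhook URL option would serve the same role for orchestrators running in a separate process or container.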

4. Per-Agent Budget Guardrails

max_budget_usd exists per-session, which is great. But with 4 agents running daily, we need per-agent daily/weekly caps.

Ask: A budget management layer — either daily_budget_usd / weekly_budget_usd on ClaudeAgentOptions, or a separate BudgetManager class that tracks cumulative spend across sessions.

Impact: Safety net for autonomous agents. If one agent goes rogue or hits an expensive loop, it shouldn't burn the whole budget.
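For clarity, here's the kind of `BudgetManager` layer we mean — the class and its API are our suggestion, not an existing SDK feature:

```python
from datetime import date

# Sketch of a proposed budget layer: tracks cumulative spend per agent per
# day and refuses further calls once a daily cap is exhausted.

class BudgetManager:
    def __init__(self, daily_budget_usd: float) -> None:
        self.daily_budget_usd = daily_budget_usd
        self._spend: dict[tuple, float] = {}

    def record(self, agent: str, cost_usd: float, day: date = None) -> None:
        key = (agent, day or date.today())
        self._spend[key] = self._spend.get(key, 0.0) + cost_usd

    def allow(self, agent: str, day: date = None) -> bool:
        """False once the agent has exhausted its daily cap."""
        return self._spend.get((agent, day or date.today()), 0.0) < self.daily_budget_usd

budgets = BudgetManager(daily_budget_usd=5.0)
budgets.record("lal", 4.5)
ok_before = budgets.allow("lal")   # under the cap
budgets.record("lal", 1.0)
ok_after = budgets.allow("lal")    # 5.50 >= 5.00, cut off
```

If the SDK owned this, it could gate tool calls directly instead of us checking before every `query()`.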

5. Native Multi-Agent Communication

Currently we build our own Hub for agent-to-agent messaging. The SDK treats each agent as isolated. If agents could natively discover and message each other (even through a shared channel), it would simplify multi-agent architectures significantly.

Ask: Something like the agents parameter in ClaudeAgentOptions (we see it exists but it's unclear if it enables inter-agent communication), or a pub/sub channel agents can subscribe to.

Impact: Would let us replace ~500 lines of Hub orchestration code with native SDK primitives.
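Stripped to its essence, our Hub is a pub/sub channel like the sketch below — all names are ours, and this is roughly the primitive we'd love the SDK to provide natively:

```python
import asyncio

# Minimal sketch of a shared channel agents could subscribe to. Our ~500-line
# Hub is mostly this plus Kafka plumbing and routing rules.

class Channel:
    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(q)
        return q

    async def publish(self, sender: str, message: str) -> None:
        # Fan out to every subscribed agent's inbox.
        for q in self._subscribers:
            await q.put((sender, message))

async def main() -> tuple:
    channel = Channel()
    inbox = channel.subscribe()          # e.g. Lore listening for proposals
    await channel.publish("doctor", "propose: upgrade GPU node pool")
    return await inbox.get()

sender, message = asyncio.run(main())
```

Native support would also let the SDK enforce permissions on inter-agent messages the same way hooks gate tool use today.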

6. A2A Protocol Alignment

Google's Agent-to-Agent (A2A) protocol is gaining traction. If the SDK supported A2A task cards, agent discovery, and message format, Claude agents could interop with non-Claude agent systems.

Ask: Optional A2A-compatible task/message format. Even just emitting A2A-shaped events alongside the current format would be a start.

Impact: Future-proofs the SDK for the multi-agent ecosystem that's forming across providers.
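As a loose illustration, here's the kind of A2A-shaped task event the SDK could emit alongside its own messages. The field names below are our approximation of the draft A2A protocol's Task/Message shapes and may not match the spec exactly:

```python
# Loose sketch of an A2A-style task event. Field names are approximate, not
# a faithful rendering of the A2A specification.

def to_a2a_task(task_id: str, agent_text: str, state: str = "completed") -> dict:
    return {
        "id": task_id,
        "status": {"state": state},
        "messages": [
            {
                "role": "agent",
                "parts": [{"type": "text", "text": agent_text}],
            }
        ],
    }

event = to_a2a_task("patrol-quark-001", "CFO evaluation: score 8.2/10")
```

Even an opt-in adapter that maps the SDK's existing message types onto this envelope would let Claude agents sit inside mixed-provider fleets.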

Our Setup (for context)

  • Agents: 4 (Doctor, Lal, Lore, Quark) via ClaudeSDKClient
  • Auth: OAuth
  • Mode: bypassPermissions (autonomous)
  • Infrastructure: Docker containers, Kafka event streaming, Neo4j/Memgraph graph DBs
  • Daily workload: ~16-20 API calls per Discovery Patrol session
  • SDK version: 0.1.48 (auto-upgraded on container restart)
  • MCP servers: 15 active (memory, knowledge, discovery, GPU, cross-domain)

Happy to provide more details, logs, or architecture diagrams if any of this is useful for the team. We're committed to the Agent SDK as our foundation and want to help shape where it goes.


UCIS Constellation — John & Data
