Feature Feedback: Running 4 Agent SDK Agents in Production — What We'd Love to See #653

@jmm2020


Who We Are

We're running UCIS (Unified Consciousness Integration System) — a production multi-agent system with 4 Claude Agent SDK agents (Doctor, Lal, Lore, Quark) operating as Docker microservices. Each agent has a specialized role (infrastructure, ML research, strategy, finance) and they collaborate through an orchestration hub for daily automated "Discovery Patrol" sessions: parallel web research, cross-scoring, structured debate, convergence voting, and CFO financial evaluation.

We're on SDK v0.1.48, running ClaudeSDKClient with OAuth, bypassPermissions mode, custom MCP servers, and hooks. This is not a toy — these agents run daily, autonomously, producing scored and stored discoveries.

What's Working Well

  • ClaudeAgentOptions is clean and flexible — system prompts, MCP servers, hooks, permission control all work great
  • The hook system (PreToolUse, PostToolUse, Stop) is excellent for wiring agents into our event pipeline
  • can_use_tool callback gives us fine-grained control over what agents can do
  • The 0.1.46 additions (list_sessions, add_mcp_server, typed task messages) show the SDK is heading in the right direction
  • The bundled CLI updates keep us current without manual intervention

What Would Unlock the Next Level

1. Prompt Caching Control

Our agents send large, repeated context on every call: system prompts (~2K tokens), tool definitions, memory context blocks. The Anthropic Messages API supports cache_control on content blocks, but the SDK doesn't expose this.

Ask: Allow marking system prompt blocks or specific message content as cacheable, so repeated context across calls within a session doesn't get re-processed. Even a simple cache_system_prompt=True on ClaudeAgentOptions would help.

Impact: We run 4 agents x 4 phases per session = 16+ API calls with nearly identical system prompts. Caching could meaningfully reduce token costs.
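To make the ask concrete, here's roughly what we're pointing at: the raw Messages API already accepts `cache_control` on system content blocks, and a hypothetical `cache_system_prompt=True` option could translate to exactly this payload shape (the model id below is just a placeholder):

```python
# Sketch: the Messages API payload shape that cache_control exposes today.
# A hypothetical cache_system_prompt=True on ClaudeAgentOptions could map to
# marking the system block as an ephemeral cache breakpoint like this.

def build_cached_payload(system_prompt: str, user_msg: str) -> dict:
    """Build a Messages API request body with a cacheable system block."""
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks everything up to and including this block as a cache
                # breakpoint; repeat calls within the cache TTL read it back
                # at reduced token cost instead of re-processing it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

payload = build_cached_payload(
    "You are Doctor, the infrastructure agent.", "Status report?"
)
```

Since the SDK already constructs the request body internally, surfacing this as a single boolean (or per-block flag) seems like a small API surface for a large cost win.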

2. Streaming Token/Cost Metrics

We have zero visibility into per-call token usage. When a Discovery Patrol runs, we can't tell which agent or which phase is burning the most tokens.

Ask: Include input_tokens, output_tokens, and cache_read_tokens / cache_creation_tokens in ResultMessage (or a new UsageMessage type). Bonus: cumulative session usage.

Impact: Can't optimize what we can't measure. We'd use this to tune prompt sizes, set per-agent budgets, and track cost trends.
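A sketch of the shape we have in mind — the field names mirror the Messages API usage object, but the `UsageMessage` type and the session aggregation below are our proposal, not anything in the SDK today:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-call usage reporting plus cumulative session
# totals. None of these types exist in the SDK; names are illustrative.

@dataclass
class UsageMessage:
    agent: str
    phase: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

@dataclass
class SessionUsage:
    """Cumulative per-agent usage across one Discovery Patrol session."""
    totals: dict = field(default_factory=dict)

    def record(self, u: UsageMessage) -> None:
        t = self.totals.setdefault(u.agent, {"in": 0, "out": 0})
        t["in"] += u.input_tokens + u.cache_creation_tokens
        t["out"] += u.output_tokens

session = SessionUsage()
session.record(UsageMessage("lore", "debate", input_tokens=1800, output_tokens=450))
session.record(UsageMessage("lore", "voting", input_tokens=900, output_tokens=120))
```

With this we could answer "which phase is expensive?" from our own Kafka pipeline instead of guessing.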

3. Push-Based Completion Notification

Our orchestrator polls the hub every 5 seconds waiting for agent responses. The SDK's streaming model works for interactive use, but for service-to-service orchestration, we'd benefit from a callback/webhook pattern.

Ask: An on_complete callback option or webhook URL in ClaudeAgentOptions that fires when the agent finishes a turn. Even an async event/future pattern would work.

Impact: Eliminates polling overhead in multi-agent orchestration. Our Discovery Patrol makes ~80 polling requests per session just waiting for responses.
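The pattern we'd replace polling with is a per-turn awaitable resolved by a completion callback. The `on_complete` hook name is hypothetical; the asyncio mechanics are the point:

```python
import asyncio

# Sketch of an async-future pattern instead of 5-second polling: the
# orchestrator awaits a per-agent future that a hypothetical on_complete
# callback resolves when the agent finishes its turn.

class TurnWaiter:
    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}

    def expect(self, agent: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._futures[agent] = fut
        return fut

    def on_complete(self, agent: str, result: str) -> None:
        # This is what we'd wire into ClaudeAgentOptions if it existed.
        self._futures.pop(agent).set_result(result)

async def main() -> str:
    waiter = TurnWaiter()
    fut = waiter.expect("quark")
    # Simulate the agent finishing its turn 10 ms later.
    asyncio.get_running_loop().call_later(0.01, waiter.on_complete, "quark", "done")
    return await fut

result = asyncio.run(main())
```

A webhook URL option would serve the same role for orchestrators running in a separate process or container.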

4. Per-Agent Budget Guardrails

max_budget_usd exists per-session, which is great. But with 4 agents running daily, we need per-agent daily/weekly caps.

Ask: A budget management layer — either daily_budget_usd / weekly_budget_usd on ClaudeAgentOptions, or a separate BudgetManager class that tracks cumulative spend across sessions.

Impact: Safety net for autonomous agents. If one agent goes rogue or hits an expensive loop, it shouldn't burn the whole budget.
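For clarity, here's the kind of `BudgetManager` layer we mean — the class and its API are our suggestion, not an existing SDK feature:

```python
from datetime import date

# Sketch of a proposed budget layer: tracks cumulative spend per agent per
# day and refuses further calls once a daily cap is exhausted.

class BudgetManager:
    def __init__(self, daily_budget_usd: float) -> None:
        self.daily_budget_usd = daily_budget_usd
        self._spend: dict[tuple, float] = {}

    def record(self, agent: str, cost_usd: float, day: date = None) -> None:
        key = (agent, day or date.today())
        self._spend[key] = self._spend.get(key, 0.0) + cost_usd

    def allow(self, agent: str, day: date = None) -> bool:
        """False once the agent has exhausted its daily cap."""
        return self._spend.get((agent, day or date.today()), 0.0) < self.daily_budget_usd

budgets = BudgetManager(daily_budget_usd=5.0)
budgets.record("lal", 4.5)
ok_before = budgets.allow("lal")   # under the cap
budgets.record("lal", 1.0)
ok_after = budgets.allow("lal")    # 5.50 >= 5.00, cut off
```

If the SDK owned this, it could gate tool calls directly instead of us checking before every `query()`.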

5. Native Multi-Agent Communication

Currently we build our own Hub for agent-to-agent messaging. The SDK treats each agent as isolated. If agents could natively discover and message each other (even through a shared channel), it would simplify multi-agent architectures significantly.

Ask: Something like the agents parameter in ClaudeAgentOptions (we see it exists but it's unclear if it enables inter-agent communication), or a pub/sub channel agents can subscribe to.

Impact: Would let us replace ~500 lines of Hub orchestration code with native SDK primitives.
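Stripped to its essence, our Hub is a pub/sub channel like the sketch below — all names are ours, and this is roughly the primitive we'd love the SDK to provide natively:

```python
import asyncio

# Minimal sketch of a shared channel agents could subscribe to. Our ~500-line
# Hub is mostly this plus Kafka plumbing and routing rules.

class Channel:
    def __init__(self) -> None:
        self._subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(q)
        return q

    async def publish(self, sender: str, message: str) -> None:
        # Fan out to every subscribed agent's inbox.
        for q in self._subscribers:
            await q.put((sender, message))

async def main() -> tuple:
    channel = Channel()
    inbox = channel.subscribe()          # e.g. Lore listening for proposals
    await channel.publish("doctor", "propose: upgrade GPU node pool")
    return await inbox.get()

sender, message = asyncio.run(main())
```

Native support would also let the SDK enforce permissions on inter-agent messages the same way hooks gate tool use today.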

6. A2A Protocol Alignment

Google's Agent-to-Agent (A2A) protocol is gaining traction. If the SDK supported A2A task cards, agent discovery, and message format, Claude agents could interop with non-Claude agent systems.

Ask: Optional A2A-compatible task/message format. Even just emitting A2A-shaped events alongside the current format would be a start.

Impact: Future-proofs the SDK for the multi-agent ecosystem that's forming across providers.
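As a loose illustration, here's the kind of A2A-shaped task event the SDK could emit alongside its own messages. The field names below are our approximation of the draft A2A protocol's Task/Message shapes and may not match the spec exactly:

```python
# Loose sketch of an A2A-style task event. Field names are approximate, not
# a faithful rendering of the A2A specification.

def to_a2a_task(task_id: str, agent_text: str, state: str = "completed") -> dict:
    return {
        "id": task_id,
        "status": {"state": state},
        "messages": [
            {
                "role": "agent",
                "parts": [{"type": "text", "text": agent_text}],
            }
        ],
    }

event = to_a2a_task("patrol-quark-001", "CFO evaluation: score 8.2/10")
```

Even an opt-in adapter that maps the SDK's existing message types onto this envelope would let Claude agents sit inside mixed-provider fleets.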

Our Setup (for context)

  • Agents: 4 (Doctor, Lal, Lore, Quark) via ClaudeSDKClient
  • Auth: OAuth
  • Mode: bypassPermissions (autonomous)
  • Infrastructure: Docker containers, Kafka event streaming, Neo4j/Memgraph graph DBs
  • Daily workload: ~16-20 API calls per Discovery Patrol session
  • SDK version: 0.1.48 (auto-upgraded on container restart)
  • MCP servers: 15 active (memory, knowledge, discovery, GPU, cross-domain)

Happy to provide more details, logs, or architecture diagrams if any of this is useful for the team. We're committed to the Agent SDK as our foundation and want to help shape where it goes.


UCIS Constellation — John & Data
