From cf4a0dd47aa6c38184d31d64776497df04166a79 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 11:22:00 -0400 Subject: [PATCH 01/26] docs: design for unified harness tracing/message-emitting surface Approach A (Agentex event stream as canonical source of truth): one tap per harness feeds shared yield/auto-send delivery adapters and a span-deriving tracing tap. Additive backwards-compat, stacked PRs <1000 lines, conformance + live-matrix testing (3 test agents per harness: sync/async/temporal). Co-Authored-By: Claude Opus 4.8 (1M context) --- ...26-06-18-unified-harness-surface-design.md | 204 ++++++++++++++++++ 1 file changed, 204 insertions(+) create mode 100644 docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md diff --git a/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md new file mode 100644 index 000000000..8e5411863 --- /dev/null +++ b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md @@ -0,0 +1,204 @@ +# Unified Harness Tracing / Message-Emitting Surface + +Date: 2026-06-18 +Status: Approved design, pending implementation +Repo: `scale-agentex-python` + +## Problem + +The SDK integrates several agent harnesses (pydantic-ai, LangGraph, OpenAI Agents) by +converting each harness's native output into Agentex `StreamTaskMessage*` events. Today +that integration is triplicated per harness: + +- `_<harness>_sync.py` — a converter that **yields** Agentex events back over the + HTTP/JSON-RPC response (sync ACP agents). +- `_<harness>_async.py` — a converter that **auto-sends** to the task stream (Redis via + `adk.streaming`) for async + temporal agents. +- `_<harness>_tracing.py` — a separate, opt-in tracing handler wired into the converter + by hand. + +Consequences: + +- The native-output → Agentex-event mapping exists in two places per harness (sync and + async) and can drift. +- Tracing is bolted on per harness and is inconsistent across harnesses. 
+- There is no shared notion of a tool/reasoning span tree or turn-level metadata. +- The golden agent grew a parallel "harness layer" (a neutral `HarnessEvent` vocabulary + plus an adapter that drives `adk.streaming` + `adk.tracing`) to solve the same problem + for its subprocess CLI harnesses (claude-code, codex). That logic is valuable but lives + outside the SDK. + +## Goal (end state) + +pydantic-ai, LangGraph, OpenAI Agents, claude-code, and codex all emit through one unified +surface. A single pass over a harness's output drives **streaming, message persistence, and +tracing** from one source of truth, in the same shape as Agentex events. The surface works +for **both** delivery channels (sync yield, async/temporal auto-send). Tracing is on by +default and overridable. The claude-code/codex *parsers* live in the SDK; their sandbox / +secret / MCP orchestration stays in the golden agent. + +## Approach: Agentex event stream is canonical (Approach A) + +The Agentex `StreamTaskMessage*` stream is the single source of truth. Each harness maps its +native output to that stream **once**. A single emitter consumes that one stream and fans it +out to delivery (yield or auto-send) and to tracing (spans derived from the same stream). + +We considered two alternatives and rejected them: + +- **Neutral `AgentEvent` vocabulary + dual projectors (Approach B):** richer (carries turn + usage/cost natively, clean start/end pairing) but reintroduces a parallel vocabulary to + keep in sync with Agentex types, for the same outcome. +- **Push-to-sink with typed emitter methods (Approach C):** very testable, but the *yield* + delivery channel fights a push API (needs a queue/generator bridge), and sync ACP agents + depend on yield. + +Approach A matches "same shape as Agentex events" most directly, makes the yield channel +free, and lets us delete the per-harness tracing code by deriving spans from the canonical +stream. 
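As a rough illustration of the single-pass fan-out that Approach A describes, the sketch below passes each event of one stream through to the caller while a tracing observer sees the same event. `Event`, `fan_out`, and `observe` are hypothetical stand-ins rather than SDK names; real events would be `StreamTaskMessage*` objects, and the swallow-on-error behaviour is an assumption based on the spec's "tracing is best-effort" contract.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable


@dataclass
class Event:
    """Hypothetical stand-in for a StreamTaskMessage* event."""

    type: str
    index: int


async def fan_out(
    stream: AsyncIterator[Event],
    observe: Callable[[Event], None],
) -> AsyncIterator[Event]:
    """Yield every event unchanged, showing each one to the observer first."""
    async for event in stream:
        try:
            observe(event)  # tracing side effect
        except Exception:
            pass  # assumption: tracing failures must never break delivery
        yield event


async def native_events() -> AsyncIterator[Event]:
    # Stand-in for the output of a per-harness tap.
    for ev in (Event("start", 0), Event("delta", 0), Event("done", 0)):
        yield ev


async def main() -> tuple[list[Event], list[Event]]:
    seen: list[Event] = []
    delivered = [ev async for ev in fan_out(native_events(), seen.append)]
    return delivered, seen
```

Because the observer only reads events and the generator re-yields them untouched, the delivered sequence and the traced sequence cannot diverge, which is the property the design relies on for yield-vs-auto-send equivalence.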
+ +## Components + +Four shared, harness-independent components plus one thin tap per harness. + +### 1. Per-harness tap (the only per-harness code) + +``` +convert_<harness>_to_agentex_events(native_stream, ...) -> AsyncIterator[StreamTaskMessage*] +``` + +The existing sync converters (`convert_pydantic_ai_to_agentex_events`, +`convert_langgraph_to_agentex_events`, `convert_openai_to_agentex_events`) already have this +shape and *become* the taps. New taps: `convert_claude_code_to_agentex_events`, +`convert_codex_to_agentex_events` (pure parsers over the CLIs' newline-delimited +stream-json; no SGP/sandbox coupling). + +### 2. Auto-send adapter (shared) + +Consumes the canonical Agentex stream and drives `adk.streaming` context managers: open/close +text and reasoning contexts, switch cleanly between them, stream tool request/response. This +generalizes the golden agent's `AgentexStreamAdapter` and replaces the N hand-written +`_async` bodies with one. Returns the accumulated final text (preserving current +auto-send return values). + +### 3. Yield adapter (shared) + +Passes the canonical stream through to the caller (sync HTTP ACP), tee-ing each event to the +tracer as a side effect. + +### 4. Tracing tap (shared) + +Derives spans from the canonical stream: + +- tool span = `ToolRequestContent` (start/full) → matching `ToolResponseContent` by + `tool_call_id`. +- reasoning span = reasoning start → done. +- subagent span = the Task/Agent tool's span (a tool span by another name). + +Default-on whenever a trace context exists; **overridable** by passing a custom tracer, or +`None` to disable. Replaces the per-harness `_tracing.py` handlers. + +### Facade + +A `UnifiedEmitter` ties the chosen delivery adapter and the tracer together so an agent +author calls one thing. + +### Proposed layout + +- Shared components: `src/agentex/lib/core/harness/` (delivery adapters, tracing tap, span + derivation, facade). +- Taps: remain in `src/agentex/lib/adk/_modules/`. 
+- Public access: via the `adk` facade. + +## Data flow + +One pass over the canonical stream, fanned out by delivery mode. + +- **Sync agent:** `async for ev in emitter.yield_events(convert_X(native)): ...` — the tracer + observes each event; the event is yielded over the HTTP/JSON-RPC response. +- **Async + temporal agent:** `await emitter.auto_send(convert_X(native), task_id=...)` — the + auto-send adapter pushes deltas to Redis via `adk.streaming` while the tracer observes the + same events; returns accumulated final text. Temporal is identical, called from inside an + activity (converters run in activities, not workflows, so determinism is not a concern). +- **Tracing** is the same derivation in both modes (it observes the canonical stream), so + sync and auto-send produce identical spans. +- **Turn-level metadata** (usage / cost / model) is not an Agentex event. It rides a small + side-channel: the tap returns a final typed `TurnResult` (or yields a terminal record) + that the caller attaches to the turn span. This mirrors how the golden agent already treats + `TurnCompleted` as "handled by the caller, not the stream." + +Net dedup: **3 files × N harnesses → 1 tap × N harnesses + 3 shared components.** + +## Backwards compatibility (every change is additive) + +The end state "replaces" the old converters, but it is reached additively. No public symbol +is removed in this stack; nothing regresses. + +- **Taps:** existing `convert_*_to_agentex_events` keep exact signatures and output. Behavior + is unchanged when no trace context is present. +- **Auto-send entry points** (`stream_langgraph_events(stream, task_id)`, the pydantic/openai + `_async` helpers, `run_agent_streamed_auto_send`, `chat_completion_stream_auto_send`) keep + signatures and return values, reimplemented to delegate to the shared auto-send adapter. + Feature-add: they emit traces by default. The conformance suite asserts equivalent Redis + messages before/after. 
+- **`_tracing.py` handlers** stay importable as shims; the shared tracer supersedes them + internally. +- **Removal/deprecation** of dead internal duplication is the final PR, behind a deprecation + note, never mixed into a migration PR. + +## Rollout — stacked PRs (each < 1000 lines diff) + +1. **Span derivation (`TracingTap`)** — pure function: canonical stream → spans. + Unit-tested in isolation. No wiring. +2. **Auto-send adapter** — canonical stream → `adk.streaming` side effects. Fixture-tested. + Not yet wired into harnesses. +3. **Yield adapter + `UnifiedEmitter` facade + public `adk` surface** — plus the + conformance-test scaffold (fixture format + parametrized runner) and an empty CI + integration job. +4. **Migrate pydantic-ai** — reimplement its `_async` / tracing on the shared components; + keep `convert_pydantic_ai_to_agentex_events` signature; default tracing on. Add 3 test + agents (sync / async / temporal) + CI matrix entries + live smoke. +5. **Migrate LangGraph** — same pattern + 3 test agents + CI. +6. **Migrate OpenAI Agents** — same pattern + 3 test agents + CI. +7. **claude-code parser tap** — `convert_claude_code_to_agentex_events` + recorded stream-json + fixtures + feasible test agent(s) (likely temporal-only, given the sandbox requirement). +8. **codex parser tap** — same shape + fixtures + feasible test agent(s). +9. **Cleanup** — delete now-dead internal duplication, deprecate shims, docs. + +## Testing + +### Offline conformance suite (every PR) + +Committed raw harness outputs (pydantic `AgentStreamEvent`s, LangGraph chunks, OpenAI stream, +claude/codex stream-json) drive a shared parametrized suite. For each fixture, assert: + +- exact normalized `StreamTaskMessage*` sequence, +- derived span tree, +- **yield-vs-auto-send equivalence** — both channels produce the same logical events/spans. + +Every tap must pass the shared cases: text, reasoning, single tool, tool error, multi-step, +and interleaved reasoning + tool ordering. 
Deterministic, offline, no network. + +### Live integration matrix (CI) + +Three test agents per harness, one per agent type (sync / async / temporal), deployed and +driven with a fixed prompt. Assert the unified surface produced valid ordered messages and a +well-formed span tree. Modeled on the existing `agentex-tutorials-test.yml` / +`build-and-push-tutorial-agent.yml` CI precedent. + +Matrix: harness ∈ {pydantic-ai, langgraph, openai-agents, claude-code, codex} × agent-type ∈ +{sync, async, temporal}. claude-code/codex run the subset of agent types that is feasible; +any uncovered cell is logged/documented, never silently skipped. + +### Error handling + +- A tap that raises mid-stream closes open streaming contexts and open spans — no leaked + `adk.streaming` context, no dangling span. +- Tracing failures are best-effort and never break delivery (matches the golden agent's + contract). + +## Out of scope + +- Sandbox pool, sandbox lifecycle, MCP server provisioning, and OAuth/secret reauth — tracked + separately; only the pure claude-code/codex output parsers are in scope here. +- claude-code/codex sandbox / secret / MCP orchestration — stays in the golden agent and + feeds the SDK parser. From 4538544806c518e1a9be15cb711bfae596c8f882 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 11:30:03 -0400 Subject: [PATCH 02/26] =?UTF-8?q?docs:=20refine=20unified=20harness=20spec?= =?UTF-8?q?=20=E2=80=94=20span=20derivation=20rules,=20TurnUsage,=20golden?= =?UTF-8?q?-agent=20integration?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Make tracing-tap span derivation explicit (tool open on Done of a ToolRequestContent index, close on matching ToolResponseContent by tool_call_id; parallel-safe; reasoning start->done). Flag missing is_error on ToolResponseContent as an additive upstream decision. 
- Add first-class TurnUsage/TurnResult shape (aligned to llm_metrics token taxonomy) attached to the turn span via span(data=) and reused for metrics. - Document golden-agent integration: all SGP/sandbox/secret/MCP coupling stays in the agent; only parsing/streaming/tracing/usage move to SDK taps + emitter; sandbox-setup events chain before the harness stream. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...26-06-18-unified-harness-surface-design.md | 124 ++++++++++++++++-- 1 file changed, 116 insertions(+), 8 deletions(-) diff --git a/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md index 8e5411863..c3b54c117 100644 --- a/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md +++ b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md @@ -87,16 +87,34 @@ tracer as a side effect. ### 4. Tracing tap (shared) -Derives spans from the canonical stream: +A stateful reducer that derives spans from the canonical stream. It only *observes* +`index` and `tool_call_id`; it never mutates or reorders the stream, so streaming fidelity +is unchanged. -- tool span = `ToolRequestContent` (start/full) → matching `ToolResponseContent` by +Derivation rules: + +- **Tool span open:** on `StreamTaskMessageDone` for an index whose `Start` content was a + `ToolRequestContent`. Arguments are fully known by `Done` (covers both streamed-args and + one-shot tools). The open span is keyed by `tool_call_id`. +- **Tool span close:** on `StreamTaskMessageFull(ToolResponseContent)` matching by `tool_call_id`. -- reasoning span = reasoning start → done. -- subagent span = the Task/Agent tool's span (a tool span by another name). +- **Parallel / interleaved tools:** `ToolRequestContent`, `ToolResponseContent`, + `ToolRequestDelta`, and `ToolResponseDelta` all carry `tool_call_id` + `name`, so multiple + open tool spans pair correctly regardless of arrival order. 
+- **Reasoning span:** `Start(ReasoningContent)` → `Done` on that index. +- **Subagent span:** the Task/Agent tool's span (a tool span by another name), nested under + the turn span. Default-on whenever a trace context exists; **overridable** by passing a custom tracer, or `None` to disable. Replaces the per-harness `_tracing.py` handlers. +**Open decision — tool error status.** `ToolResponseContent` currently has no +`is_error`/`status` field (only `content`), so a derived tool span cannot mark failure. The +golden agent's `ToolCompleted` carried `is_error`. Recommended resolution: add an additive +optional `is_error: bool | None` to `ToolResponseContent`. This is a generated type, so it is +a small upstream API-spec change (tracked as a prerequisite to the relevant migration PR), not +a local edit. Until it lands, derived spans omit tool error status rather than inferring it. + ### Facade A `UnifiedEmitter` ties the chosen delivery adapter and the tracer together so an agent @@ -121,13 +139,66 @@ One pass over the canonical stream, fanned out by delivery mode. activity (converters run in activities, not workflows, so determinism is not a concern). - **Tracing** is the same derivation in both modes (it observes the canonical stream), so sync and auto-send produce identical spans. -- **Turn-level metadata** (usage / cost / model) is not an Agentex event. It rides a small - side-channel: the tap returns a final typed `TurnResult` (or yields a terminal record) - that the caller attaches to the turn span. This mirrors how the golden agent already treats - `TurnCompleted` as "handled by the caller, not the stream." +- **Turn-level metadata** (usage / cost / model) is not an Agentex event, so it is surfaced + as a first-class `TurnUsage` shape rather than ad-hoc data (see below). 
Net dedup: **3 files × N harnesses → 1 tap × N harnesses + 3 shared components.** +## Unified turn usage / cost + +Turn metadata is a first-class, harness-independent shape attached to the turn span and +returned to the caller — not a loose side-channel. + +``` +class TurnUsage(BaseModel): + model: str | None + input_tokens: int | None + output_tokens: int | None + cached_input_tokens: int | None # subset of input_tokens served from cache + reasoning_tokens: int | None # subset of output_tokens + total_tokens: int | None + cost_usd: float | None + duration_ms: int | None # wall-clock, measured by the emitter + num_llm_calls: int + num_tool_calls: int # derived from the canonical stream + num_reasoning_blocks: int # derived from the canonical stream + +class TurnResult(BaseModel): + final_text: str + usage: TurnUsage +``` + +- Token field names align with the existing `agentex.lib.core.observability.llm_metrics` + taxonomy (`input_tokens` / `output_tokens` / `cached_input_tokens` / `reasoning_tokens`), + not a new vocabulary. (The OpenAI-style `llm_messages.Usage` — + `prompt_tokens`/`completion_tokens` — is mapped into this richer shape.) +- **Each harness tap normalizes its native usage** into `TurnUsage`: pydantic-ai + `result.usage()`, LangGraph `usage_metadata`, OpenAI `response.usage`, claude-code/codex + the final `result` envelope (`cost_usd` + usage). Per-harness normalization, one output + shape. +- The stream-derived counts (`num_tool_calls`, `num_reasoning_blocks`) come for free from the + tracing tap's reduction; `duration_ms` is measured by the emitter; tokens/cost/model come + from the tap's native-usage normalization. +- The emitter attaches `TurnUsage` to the **turn span** via `adk.tracing.span(data=...)` + (which already accepts a `BaseModel`) and returns `TurnResult` to the caller. The same + object can feed the OTel `LLMMetrics` and downstream metrics (e.g. the golden agent's + per-turn DogStatsD emission), so traces and metrics share one shape. 
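A per-harness normalizer might look roughly like the following sketch, which maps an OpenAI-style usage payload onto the token fields listed above. The `TurnUsage` dataclass here is a trimmed stand-in for the pydantic model, `normalize_openai_usage` is a hypothetical name, and the nested `prompt_tokens_details` / `completion_tokens_details` keys are assumed to follow the OpenAI Chat Completions usage shape; a real tap would read whatever its harness actually reports.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TurnUsage:
    """Trimmed stand-in for the TurnUsage model (token fields only)."""

    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    cached_input_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def normalize_openai_usage(model: Optional[str], raw: dict[str, Any]) -> TurnUsage:
    """Map prompt/completion token names onto the input/output taxonomy."""
    # Detail blocks may be absent or null; treat both as empty.
    prompt_details = raw.get("prompt_tokens_details") or {}
    completion_details = raw.get("completion_tokens_details") or {}
    return TurnUsage(
        model=model,
        input_tokens=raw.get("prompt_tokens"),
        output_tokens=raw.get("completion_tokens"),
        cached_input_tokens=prompt_details.get("cached_tokens"),
        reasoning_tokens=completion_details.get("reasoning_tokens"),
        total_tokens=raw.get("total_tokens"),
    )


usage = normalize_openai_usage(
    "example-model",  # placeholder model name
    {
        "prompt_tokens": 120,
        "completion_tokens": 30,
        "total_tokens": 150,
        "prompt_tokens_details": {"cached_tokens": 100},
    },
)
```

Fields the harness does not report stay `None` rather than being coerced to zero, so a consumer can tell "unknown" apart from "none used".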
+ +### Surfacing `TurnUsage` from the tap + +Python async generators cannot cleanly return a value to their consumer, so the tap does not +return `TurnUsage` via `StopAsyncIteration`. Instead the per-harness entry is a small object: + +``` +class HarnessTurn: + events: AsyncIterator[StreamTaskMessage*] # the canonical stream + def usage(self) -> TurnUsage # populated once `events` is exhausted +``` + +The emitter drives `events` (delivering + tracing), then reads `usage()` to finalize the turn +span and build `TurnResult`. This keeps the canonical stream pure (only `StreamTaskMessage*`) +while giving usage/cost a typed home. + ## Backwards compatibility (every change is additive) The end state "replaces" the old converters, but it is reached additively. No public symbol @@ -196,6 +267,43 @@ any uncovered cell is logged/documented, never silently skipped. - Tracing failures are best-effort and never break delivery (matches the golden agent's contract). +## Golden agent integration (SGP / sandbox coupling preserved) + +The unified surface is designed so the golden agent keeps **all** of its SGP-coupled layers +and only swaps its hand-rolled parsing/streaming/tracing internals for the SDK's taps + +emitter. Nothing SGP-specific moves into the SDK. + +What stays in the golden agent, untouched: + +- Sandbox pool acquire modes (cold-create / warm-claim / reconnect), lease coordination, and + the data-plane URL override. +- Secret resolution, OAuth/MCP reauth, and reconnect-notice emission (the notice is just + another standalone message on the task stream, independent of the harness tap). +- Spawning `claude -p` / `codex exec` inside the sandbox. + +What changes inside the golden agent's provider: + +1. Acquire/provision the sandbox and resolve secrets/MCP exactly as today (SGP-coupled). +2. Spawn the CLI in the sandbox and feed its stdout (stream-json lines) into the SDK tap + `convert_claude_code_to_agentex_events` / `convert_codex_to_agentex_events`. +3. 
Run that tap through the SDK emitter's **auto-send** path from inside the existing Temporal + activity, getting streaming + tracing + `TurnUsage` for free. The agent's + `_StreamJsonProcessor` and `AgentexStreamAdapter` are retired in favor of the SDK tap + + emitter. + +**Sandbox-setup events:** today the golden agent surfaces provisioning steps (reconnect / +find / create / configure-git / clone) as UI tool calls by yielding them into the same +adapter. Under the unified surface these become agent-produced `ToolRequestContent` / +`ToolResponseContent` messages, chained *before* the harness tap's stream into one canonical +stream for the turn (`chain(setup_events, convert_claude_code(stdout))`). The emitter then +delivers and traces the whole turn uniformly, so setup steps keep appearing in the UI and the +span tree. + +This means the claude-code/codex parser PRs (7, 8) deliver the SDK taps, and a corresponding +**golden-agent-side change** (out of this repo's PR stack) rewires its providers onto them. +The golden agent's in-process litellm / OpenAI-Agents harness can likewise adopt the OpenAI +tap, though that is optional and not required by this design. 
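The `chain(setup_events, convert_claude_code(stdout))` composition mentioned above could be written as a small async generator. This is a sketch only: `chain_streams` is a hypothetical helper, and the string events stand in for `StreamTaskMessage*` objects.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


async def chain_streams(*streams: AsyncIterator[T]) -> AsyncIterator[T]:
    """Yield every item of each stream in order, exhausting one before the next."""
    for stream in streams:
        async for item in stream:
            yield item


async def setup_events() -> AsyncIterator[str]:
    # Stand-in for agent-produced sandbox-setup tool messages.
    yield "tool_request:create-sandbox"
    yield "tool_response:create-sandbox"


async def harness_events() -> AsyncIterator[str]:
    # Stand-in for the harness tap's canonical stream.
    yield "start:text"
    yield "done:text"


async def collect() -> list[str]:
    return [ev async for ev in chain_streams(setup_events(), harness_events())]
```

Since the harness stream is not iterated until the setup stream is exhausted, setup messages always reach the emitter first, and the whole turn is delivered and traced as one stream.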
+ ## Out of scope - Sandbox pool, sandbox lifecycle, MCP server provisioning, and OAuth/secret reauth — tracked From 4573d18f5bc65553d648e230bdb5bbfaef858044 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 11:34:38 -0400 Subject: [PATCH 03/26] docs: link deferred tool-error decision to AGX1-371 Co-Authored-By: Claude Opus 4.8 (1M context) --- .../specs/2026-06-18-unified-harness-surface-design.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md index c3b54c117..e8a32f112 100644 --- a/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md +++ b/docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md @@ -112,8 +112,9 @@ Default-on whenever a trace context exists; **overridable** by passing a custom `is_error`/`status` field (only `content`), so a derived tool span cannot mark failure. The golden agent's `ToolCompleted` carried `is_error`. Recommended resolution: add an additive optional `is_error: bool | None` to `ToolResponseContent`. This is a generated type, so it is -a small upstream API-spec change (tracked as a prerequisite to the relevant migration PR), not -a local edit. Until it lands, derived spans omit tool error status rather than inferring it. +a small upstream API-spec change, not a local edit. **Deferred** — tracked in Linear as +AGX1-371 (Agentex "Starter Tasks"). Until it lands, derived spans omit tool error status +rather than inferring it. ### Facade From 8eda56dc21c11cd5f341b2f641dc6c4959f056c9 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 11:40:42 -0400 Subject: [PATCH 04/26] docs: foundation implementation plan for unified harness surface (PRs 1-3) Bite-sized TDD tasks: foundation types, pure SpanDeriver, SpanTracer adapter, yield + auto_send delivery, UnifiedEmitter facade, conformance scaffold + CI job. 
Migration/parser PRs (4-9) listed as follow-on plans. Co-Authored-By: Claude Opus 4.8 (1M context) --- ...6-18-unified-harness-surface-foundation.md | 1309 +++++++++++++++++ 1 file changed, 1309 insertions(+) create mode 100644 docs/superpowers/plans/2026-06-18-unified-harness-surface-foundation.md diff --git a/docs/superpowers/plans/2026-06-18-unified-harness-surface-foundation.md b/docs/superpowers/plans/2026-06-18-unified-harness-surface-foundation.md new file mode 100644 index 000000000..0aefef060 --- /dev/null +++ b/docs/superpowers/plans/2026-06-18-unified-harness-surface-foundation.md @@ -0,0 +1,1309 @@ +# Unified Harness Surface — Foundation Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Build the shared, harness-independent machinery (span derivation, auto-send delivery, yield delivery, unified emitter, turn-usage types) that the per-harness taps will plug into — corresponding to PRs 1–3 of the design's rollout. + +**Architecture:** The Agentex `StreamTaskMessage*` stream is the single source of truth (design Approach A). A pure `SpanDeriver` reduces that stream into open/close span signals. Two delivery adapters consume the same stream — `yield_events` (sync HTTP ACP) and `auto_send` (async/temporal, via `adk.streaming`) — and both observe the deriver to drive `adk.tracing`. A `UnifiedEmitter` ties delivery + tracing + `TurnUsage` together. + +**Tech Stack:** Python 3, pydantic v2 (`BaseModel`), pytest + pytest-asyncio, the existing `agentex.lib.adk` streaming/tracing facades. + +**Spec:** `docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md` + +**Scope note:** This plan covers only the foundation (PRs 1–3). 
The per-harness migration PRs (4–6: pydantic-ai, langgraph, openai) and parser PRs (7–8: claude-code, codex) each require close reading of that harness's existing converter and get their own plans once this foundation lands. PR 9 (cleanup) follows them. See "Subsequent plans" at the end. + +--- + +## File Structure + +- Create `src/agentex/lib/core/harness/__init__.py` — package marker + public re-exports. +- Create `src/agentex/lib/core/harness/types.py` — `OpenSpan`, `CloseSpan`, `SpanSignal`, `TurnUsage`, `TurnResult`, `HarnessTurn` protocol. +- Create `src/agentex/lib/core/harness/span_derivation.py` — `SpanDeriver` (pure reducer). +- Create `src/agentex/lib/core/harness/auto_send.py` — `auto_send()` (canonical stream → `adk.streaming` + tracing). +- Create `src/agentex/lib/core/harness/yield_delivery.py` — `yield_events()` (passthrough + tracing). +- Create `src/agentex/lib/core/harness/emitter.py` — `UnifiedEmitter` facade. +- Create tests under `tests/lib/core/harness/`. + +Each file has one responsibility; `span_derivation.py` has zero dependencies on `adk` so it is unit-testable in isolation. + +--- + +## Task 1: Foundation types + +**Files:** +- Create: `src/agentex/lib/core/harness/__init__.py` +- Create: `src/agentex/lib/core/harness/types.py` +- Test: `tests/lib/core/harness/test_types.py` + +- [ ] **Step 1: Create the package marker** + +Create `src/agentex/lib/core/harness/__init__.py`: + +```python +"""Shared, harness-independent machinery for the unified harness surface. + +The Agentex StreamTaskMessage* stream is the single source of truth; this +package derives spans from it and delivers it (yield or auto-send), so every +harness tap gets streaming + tracing + turn usage uniformly. 
+""" +``` + +- [ ] **Step 2: Write the failing test for the types** + +Create `tests/lib/core/harness/__init__.py` (empty) and `tests/lib/core/harness/test_types.py`: + +```python +from agentex.lib.core.harness.types import ( + OpenSpan, + CloseSpan, + TurnUsage, + TurnResult, +) + + +def test_open_close_span_construct(): + o = OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"}) + c = CloseSpan(key="call_1", output="files", is_complete=True) + assert o.key == c.key == "call_1" + assert o.kind == "tool" + assert c.is_complete is True + + +def test_turn_usage_defaults_are_none(): + u = TurnUsage(model="claude-opus-4-6") + assert u.model == "claude-opus-4-6" + assert u.input_tokens is None + assert u.num_tool_calls == 0 + + +def test_turn_result_wraps_usage(): + r = TurnResult(final_text="hi", usage=TurnUsage(model="m")) + assert r.final_text == "hi" + assert r.usage.model == "m" +``` + +- [ ] **Step 3: Run test to verify it fails** + +Run: `pytest tests/lib/core/harness/test_types.py -v` +Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.types` + +- [ ] **Step 4: Implement the types** + +Create `src/agentex/lib/core/harness/types.py`: + +```python +"""Types for the unified harness surface.""" + +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import Any, AsyncIterator, Literal, Protocol, Union, runtime_checkable + +from agentex.types.task_message_update import ( + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, +) +from agentex.lib.utils.model_utils import BaseModel + +# The canonical stream element. Taps yield these; delivery adapters consume them. +StreamTaskMessage = Union[ + StreamTaskMessageStart, + StreamTaskMessageDelta, + StreamTaskMessageFull, + StreamTaskMessageDone, +] + +SpanKind = Literal["tool", "reasoning", "subagent"] + + +@dataclass +class OpenSpan: + """Signal to open a child span. 
`key` pairs an open with its close.""" + + key: str + kind: SpanKind + name: str + input: dict[str, Any] = field(default_factory=dict) + + +@dataclass +class CloseSpan: + """Signal to close the span previously opened with the same `key`.""" + + key: str + output: Any = None + is_complete: bool = True # False when closed by flush() without a result + + +SpanSignal = Union[OpenSpan, CloseSpan] + + +class TurnUsage(BaseModel): + """Harness-independent turn usage/cost, attached to the turn span. + + Token field names align with agentex.lib.core.observability.llm_metrics. + """ + + model: str | None = None + input_tokens: int | None = None + output_tokens: int | None = None + cached_input_tokens: int | None = None + reasoning_tokens: int | None = None + total_tokens: int | None = None + cost_usd: float | None = None + duration_ms: int | None = None + num_llm_calls: int = 0 + num_tool_calls: int = 0 + num_reasoning_blocks: int = 0 + + +class TurnResult(BaseModel): + """Returned to the caller after a turn is delivered.""" + + final_text: str = "" + usage: TurnUsage = TurnUsage() + + +@runtime_checkable +class HarnessTurn(Protocol): + """A single harness turn: a canonical stream plus its normalized usage. + + Python async generators cannot cleanly return a value to their consumer, so + a tap exposes usage via `usage()` (valid only after `events` is exhausted) + rather than via StopAsyncIteration. + """ + + @property + def events(self) -> AsyncIterator[StreamTaskMessage]: ... + + def usage(self) -> TurnUsage: ... 
+```
+
+- [ ] **Step 5: Run test to verify it passes**
+
+Run: `pytest tests/lib/core/harness/test_types.py -v`
+Expected: PASS (3 passed)
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/__init__.py src/agentex/lib/core/harness/types.py tests/lib/core/harness/__init__.py tests/lib/core/harness/test_types.py
+git commit -m "feat(harness): foundation types for unified harness surface"
+```
+
+---
+
+## Task 2: SpanDeriver (pure span derivation) — PR 1
+
+**Files:**
+- Create: `src/agentex/lib/core/harness/span_derivation.py`
+- Test: `tests/lib/core/harness/test_span_derivation.py`
+
+Derivation rules (from the spec): tool span opens on the `Done` of an index whose `Start`
+was a `ToolRequestContent`, and closes on the matching `ToolResponseContent` by
+`tool_call_id`; reasoning span opens on `Start(ReasoningContent)` and closes on that index's
+`Done`. Parallel tools are keyed by `tool_call_id`. `flush()` closes anything still open.
+
+- [ ] **Step 1: Write failing tests (text, single tool, reasoning, parallel, streamed args, unclosed)**
+
+Create `tests/lib/core/harness/test_span_derivation.py`:
+
+```python
+from agentex.lib.core.harness.span_derivation import SpanDeriver
+from agentex.lib.core.harness.types import OpenSpan, CloseSpan
+from agentex.types.task_message_update import (
+    StreamTaskMessageStart,
+    StreamTaskMessageDelta,
+    StreamTaskMessageFull,
+    StreamTaskMessageDone,
+)
+from agentex.types.text_content import TextContent
+from agentex.types.reasoning_content import ReasoningContent
+from agentex.types.tool_request_content import ToolRequestContent
+from agentex.types.tool_response_content import ToolResponseContent
+from agentex.types.tool_request_delta import ToolRequestDelta
+
+
+def _signals(deriver, events):
+    out = []
+    for e in events:
+        out.extend(deriver.observe(e))
+    out.extend(deriver.flush())
+    return out
+
+
+def _tool_req(idx, tcid, name, args):
+    return StreamTaskMessageStart(
+        type="start", index=idx,
+        content=ToolRequestContent(type="tool_request", author="agent",
+            tool_call_id=tcid, name=name, arguments=args),
+    )
+
+
+def test_text_only_yields_no_spans():
+    d = SpanDeriver()
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=TextContent(type="text", author="agent", content="")),
+        StreamTaskMessageDelta(type="delta", index=0,
+            delta=None),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    assert _signals(d, events) == []
+
+
+def test_single_tool_opens_on_done_closes_on_response():
+    d = SpanDeriver()
+    events = [
+        _tool_req(0, "call_1", "Bash", {"cmd": "ls"}),
+        StreamTaskMessageDone(type="done", index=0),
+        StreamTaskMessageFull(type="full", index=1,
+            content=ToolResponseContent(type="tool_response", author="agent",
+                tool_call_id="call_1", name="Bash", content="files")),
+    ]
+    sigs = _signals(d, events)
+    assert sigs == [
+        OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"}),
+        CloseSpan(key="call_1", output="files", is_complete=True),
+    ]
+
+
+def test_reasoning_opens_on_start_closes_on_done():
+    d = SpanDeriver()
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=ReasoningContent(type="reasoning", author="agent", summary=[], content=[])),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    sigs = _signals(d, events)
+    assert sigs[0] == OpenSpan(key="reasoning:0", kind="reasoning", name="reasoning", input={})
+    assert sigs[1] == CloseSpan(key="reasoning:0", output=None, is_complete=True)
+
+
+def test_parallel_tools_pair_by_tool_call_id():
+    d = SpanDeriver()
+    events = [
+        _tool_req(0, "a", "T1", {}),
+        _tool_req(1, "b", "T2", {}),
+        StreamTaskMessageDone(type="done", index=0),
+        StreamTaskMessageDone(type="done", index=1),
+        StreamTaskMessageFull(type="full", index=2,
+            content=ToolResponseContent(type="tool_response", author="agent",
+                tool_call_id="b", name="T2", content="rb")),
+        StreamTaskMessageFull(type="full", index=3,
+            content=ToolResponseContent(type="tool_response", author="agent",
+                tool_call_id="a", name="T1", content="ra")),
+    ]
+    sigs = _signals(d, events)
+    opens = [s for s in sigs if isinstance(s, OpenSpan)]
+    closes = [s for s in sigs if isinstance(s, CloseSpan)]
+    assert {o.key for o in opens} == {"a", "b"}
+    assert [c.key for c in closes] == ["b", "a"]
+    assert all(c.is_complete for c in closes)
+
+
+def test_streamed_args_accumulate_into_open_input():
+    d = SpanDeriver()
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=ToolRequestContent(type="tool_request", author="agent",
+                tool_call_id="c", name="Bash", arguments={})),
+        StreamTaskMessageDelta(type="delta", index=0,
+            delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash",
+                arguments_delta='{"cmd":')),
+        StreamTaskMessageDelta(type="delta", index=0,
+            delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash",
+                arguments_delta='"ls"}')),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    sigs = _signals(d, events)
+    assert sigs[0] == OpenSpan(key="c", kind="tool", name="Bash", input={"cmd": "ls"})
+
+
+def test_unclosed_tool_closed_incomplete_on_flush():
+    d = SpanDeriver()
+    events = [
+        _tool_req(0, "x", "Bash", {}),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    sigs = _signals(d, events)
+    assert sigs[0] == OpenSpan(key="x", kind="tool", name="Bash", input={})
+    assert sigs[1] == CloseSpan(key="x", output=None, is_complete=False)
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/lib/core/harness/test_span_derivation.py -v`
+Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.span_derivation`
+
+- [ ] **Step 3: Implement `SpanDeriver`**
+
+Create `src/agentex/lib/core/harness/span_derivation.py`:
+
+```python
+"""Pure reducer: canonical StreamTaskMessage* stream -> span open/close signals.
+
+Has no dependency on adk; unit-testable in isolation. Delivery adapters feed it
+every event and act on the returned signals.
+"""
+
+from __future__ import annotations
+
+import json
+from dataclasses import dataclass, field
+from typing import Any
+
+from agentex.types.task_message_update import (
+    StreamTaskMessageDelta,
+    StreamTaskMessageDone,
+    StreamTaskMessageFull,
+    StreamTaskMessageStart,
+)
+
+from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal, StreamTaskMessage
+
+
+@dataclass
+class _ToolReqMeta:
+    tool_call_id: str
+    name: str
+    arguments: dict[str, Any]
+    args_buf: str = ""  # accumulated streamed argument fragments
+
+
+class SpanDeriver:
+    """Stateful reducer over the canonical stream.
+
+    Tool span: open on Done of a ToolRequestContent index; close on matching
+    ToolResponseContent by tool_call_id. Reasoning span: open on
+    Start(ReasoningContent); close on that index's Done.
+    """
+
+    def __init__(self) -> None:
+        # index -> tool request metadata (present only for tool_request indices)
+        self._tool_by_index: dict[int, _ToolReqMeta] = {}
+        # index -> reasoning open (present only for reasoning indices)
+        self._reasoning_index_open: set[int] = set()
+        # tool_call_ids with a currently-open span
+        self._open_tool_ids: set[str] = set()
+
+    def observe(self, event: StreamTaskMessage) -> list[SpanSignal]:
+        if isinstance(event, StreamTaskMessageStart):
+            return self._on_start(event)
+        if isinstance(event, StreamTaskMessageDelta):
+            return self._on_delta(event)
+        if isinstance(event, StreamTaskMessageFull):
+            return self._on_full(event)
+        if isinstance(event, StreamTaskMessageDone):
+            return self._on_done(event)
+        return []
+
+    def flush(self) -> list[SpanSignal]:
+        """Close anything still open at end of stream, marked incomplete."""
+        signals: list[SpanSignal] = []
+        for tcid in list(self._open_tool_ids):
+            signals.append(CloseSpan(key=tcid, output=None, is_complete=False))
+        self._open_tool_ids.clear()
+        for idx in sorted(self._reasoning_index_open):
+            signals.append(CloseSpan(key=f"reasoning:{idx}", output=None, is_complete=False))
+        self._reasoning_index_open.clear()
+        return signals
+
+    def _on_start(self, event: StreamTaskMessageStart) -> list[SpanSignal]:
+        content = event.content
+        idx = event.index if event.index is not None else -1
+        ctype = getattr(content, "type", None)
+        if ctype == "tool_request":
+            self._tool_by_index[idx] = _ToolReqMeta(
+                tool_call_id=content.tool_call_id,
+                name=content.name,
+                arguments=dict(content.arguments or {}),
+            )
+            return []
+        if ctype == "reasoning":
+            self._reasoning_index_open.add(idx)
+            return [OpenSpan(key=f"reasoning:{idx}", kind="reasoning", name="reasoning", input={})]
+        return []
+
+    def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]:
+        idx = event.index if event.index is not None else -1
+        delta = event.delta
+        if delta is not None and getattr(delta, "type", None) == "tool_request":
+            meta = self._tool_by_index.get(idx)
+            if meta is not None and delta.arguments_delta:
+                meta.args_buf += delta.arguments_delta
+        return []
+
+    def _on_full(self, event: StreamTaskMessageFull) -> list[SpanSignal]:
+        content = event.content
+        if getattr(content, "type", None) == "tool_response":
+            tcid = content.tool_call_id
+            if tcid in self._open_tool_ids:
+                self._open_tool_ids.discard(tcid)
+                return [CloseSpan(key=tcid, output=content.content, is_complete=True)]
+        return []
+
+    def _on_done(self, event: StreamTaskMessageDone) -> list[SpanSignal]:
+        idx = event.index if event.index is not None else -1
+        meta = self._tool_by_index.pop(idx, None)
+        if meta is not None:
+            args = meta.arguments
+            if meta.args_buf:
+                try:
+                    args = json.loads(meta.args_buf)
+                except json.JSONDecodeError:
+                    args = {"_raw": meta.args_buf}
+            self._open_tool_ids.add(meta.tool_call_id)
+            return [OpenSpan(key=meta.tool_call_id, kind="tool", name=meta.name, input=args)]
+        if idx in self._reasoning_index_open:
+            self._reasoning_index_open.discard(idx)
+            return [CloseSpan(key=f"reasoning:{idx}", output=None, is_complete=True)]
+        return []
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/lib/core/harness/test_span_derivation.py -v`
+Expected: PASS (6 passed)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/span_derivation.py tests/lib/core/harness/test_span_derivation.py
+git commit -m "feat(harness): pure SpanDeriver reducing the canonical stream to span signals"
+```
+
+---
+
+## Task 3: Tracer adapter (span signals -> adk.tracing)
+
+**Files:**
+- Create: `src/agentex/lib/core/harness/tracer.py`
+- Test: `tests/lib/core/harness/test_tracer.py`
+
+A thin adapter that turns `SpanSignal`s into `adk.tracing` spans, nesting them under a parent
+span. Kept separate from `SpanDeriver` so derivation stays pure and tracing stays overridable.
+Tracing failures are best-effort and never raise (spec error-handling contract).
+
+- [ ] **Step 1: Write the failing test (uses a fake adk.tracing)**
+
+Create `tests/lib/core/harness/test_tracer.py`:
+
+```python
+import pytest
+
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.lib.core.harness.types import OpenSpan, CloseSpan
+
+
+class _FakeSpan:
+    def __init__(self, name):
+        self.name = name
+
+
+class _FakeTracing:
+    def __init__(self):
+        self.started = []
+        self.ended = []
+
+    async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None):
+        self.started.append((name, parent_id, input))
+        return _FakeSpan(name)
+
+    async def end_span(self, *, trace_id, span, output=None, data=None):
+        self.ended.append((span.name, output))
+
+
+@pytest.mark.asyncio
+async def test_open_then_close_starts_and_ends_span():
+    fake = _FakeTracing()
+    tracer = SpanTracer(trace_id="t1", parent_span_id="p1", tracing=fake)
+    await tracer.handle(OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"}))
+    await tracer.handle(CloseSpan(key="call_1", output="files", is_complete=True))
+    assert fake.started == [("Bash", "p1", {"cmd": "ls"})]
+    assert fake.ended == [("Bash", "files")]
+
+
+@pytest.mark.asyncio
+async def test_no_trace_id_is_noop():
+    fake = _FakeTracing()
+    tracer = SpanTracer(trace_id="", parent_span_id=None, tracing=fake)
+    await tracer.handle(OpenSpan(key="k", kind="tool", name="X"))
+    await tracer.handle(CloseSpan(key="k"))
+    assert fake.started == [] and fake.ended == []
+
+
+@pytest.mark.asyncio
+async def test_tracing_failure_is_swallowed():
+    class _Boom(_FakeTracing):
+        async def start_span(self, **kw):
+            raise RuntimeError("backend down")
+
+    tracer = SpanTracer(trace_id="t1", parent_span_id="p1", tracing=_Boom())
+    # Must not raise.
+    await tracer.handle(OpenSpan(key="k", kind="tool", name="X"))
+    await tracer.handle(CloseSpan(key="k"))
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/lib/core/harness/test_tracer.py -v`
+Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.tracer`
+
+- [ ] **Step 3: Implement `SpanTracer`**
+
+Create `src/agentex/lib/core/harness/tracer.py`:
+
+```python
+"""Adapter from SpanSignals to adk.tracing spans (best-effort, overridable)."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from agentex.lib.utils.logging import make_logger
+from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal
+
+logger = make_logger(__name__)
+
+
+class SpanTracer:
+    """Opens/closes adk.tracing child spans in response to span signals.
+
+    `tracing` defaults to the real `adk.tracing` module; inject a fake in tests
+    or a custom tracer to override. No-op when `trace_id` is falsy. Never raises.
+    """
+
+    def __init__(self, trace_id: str | None, parent_span_id: str | None, tracing: Any = None, task_id: str | None = None):
+        self.trace_id = trace_id
+        self.parent_span_id = parent_span_id
+        self.task_id = task_id
+        if tracing is None:
+            from agentex.lib import adk
+
+            tracing = adk.tracing
+        self._tracing = tracing
+        self._open: dict[str, Any] = {}  # span key -> span object
+
+    async def handle(self, signal: SpanSignal) -> None:
+        if not self.trace_id:
+            return
+        try:
+            if isinstance(signal, OpenSpan):
+                span = await self._tracing.start_span(
+                    trace_id=self.trace_id,
+                    name=signal.name,
+                    input=signal.input,
+                    parent_id=self.parent_span_id,
+                    task_id=self.task_id,
+                )
+                if span is not None:
+                    self._open[signal.key] = span
+            elif isinstance(signal, CloseSpan):
+                span = self._open.pop(signal.key, None)
+                if span is not None:
+                    await self._tracing.end_span(
+                        trace_id=self.trace_id,
+                        span=span,
+                        output=signal.output,
+                    )
+        except Exception as exc:  # best-effort: tracing never breaks delivery
+            logger.warning("[harness.tracer] span signal failed: %s", exc)
+```
+
+Note for the implementer: confirm `adk.tracing.end_span` accepts `output=` (seen in
+`src/agentex/lib/adk/_modules/tracing.py`). If the kwarg differs, adjust the call and the
+fake in the test together.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/lib/core/harness/test_tracer.py -v`
+Expected: PASS (3 passed)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/tracer.py tests/lib/core/harness/test_tracer.py
+git commit -m "feat(harness): SpanTracer adapter from span signals to adk.tracing"
+```
+
+---
+
+## Task 4: `yield_events` delivery adapter — PR 3 (part 1)
+
+**Files:**
+- Create: `src/agentex/lib/core/harness/yield_delivery.py`
+- Test: `tests/lib/core/harness/test_yield_delivery.py`
+
+`yield_events` passes the canonical stream through unchanged (for sync HTTP ACP agents) while
+feeding the `SpanDeriver` + `SpanTracer` as a side effect. Streaming fidelity is untouched.
+
+- [ ] **Step 1: Write the failing test**
+
+Create `tests/lib/core/harness/test_yield_delivery.py`:
+
+```python
+import pytest
+
+from agentex.lib.core.harness.yield_delivery import yield_events
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.types.task_message_update import (
+    StreamTaskMessageStart,
+    StreamTaskMessageDone,
+    StreamTaskMessageFull,
+)
+from agentex.types.tool_request_content import ToolRequestContent
+from agentex.types.tool_response_content import ToolResponseContent
+
+
+class _RecordTracing:
+    def __init__(self):
+        self.started, self.ended = [], []
+
+    async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None):
+        self.started.append(name)
+        return object()
+
+    async def end_span(self, *, trace_id, span, output=None, data=None):
+        self.ended.append(output)
+
+
+async def _gen(events):
+    for e in events:
+        yield e
+
+
+@pytest.mark.asyncio
+async def test_yield_passes_events_through_and_traces():
+    fake = _RecordTracing()
+    tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake)
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=ToolRequestContent(type="tool_request", author="agent",
+                tool_call_id="c", name="Bash", arguments={})),
+        StreamTaskMessageDone(type="done", index=0),
+        StreamTaskMessageFull(type="full", index=1,
+            content=ToolResponseContent(type="tool_response", author="agent",
+                tool_call_id="c", name="Bash", content="ok")),
+    ]
+    out = [e async for e in yield_events(_gen(events), tracer=tracer)]
+    assert out == events  # passthrough unchanged
+    assert fake.started == ["Bash"]  # span derived + opened
+    assert fake.ended == ["ok"]  # span closed with response
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+Run: `pytest tests/lib/core/harness/test_yield_delivery.py -v`
+Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.yield_delivery`
+
+- [ ] **Step 3: Implement `yield_events`**
+
+Create `src/agentex/lib/core/harness/yield_delivery.py`:
+
+```python
+"""Yield delivery: pass the canonical stream through, tracing as a side effect."""
+
+from __future__ import annotations
+
+from typing import AsyncIterator
+
+from agentex.lib.core.harness.span_derivation import SpanDeriver
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.lib.core.harness.types import StreamTaskMessage
+
+
+async def yield_events(
+    events: AsyncIterator[StreamTaskMessage],
+    tracer: SpanTracer | None = None,
+) -> AsyncIterator[StreamTaskMessage]:
+    """Forward each event to the caller; derive + trace spans as a side effect.
+
+    For sync HTTP ACP agents that yield events back over the response. When
+    `tracer` is None, this is a pure passthrough.
+    """
+    deriver = SpanDeriver() if tracer is not None else None
+    try:
+        async for event in events:
+            if deriver is not None and tracer is not None:
+                for signal in deriver.observe(event):
+                    await tracer.handle(signal)
+            yield event
+    finally:
+        if deriver is not None and tracer is not None:
+            for signal in deriver.flush():
+                await tracer.handle(signal)
+```
+
+- [ ] **Step 4: Run test to verify it passes**
+
+Run: `pytest tests/lib/core/harness/test_yield_delivery.py -v`
+Expected: PASS (1 passed)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/yield_delivery.py tests/lib/core/harness/test_yield_delivery.py
+git commit -m "feat(harness): yield_events delivery adapter (passthrough + tracing)"
+```
+
+---
+
+## Task 5: `auto_send` delivery adapter — PR 2
+
+**Files:**
+- Create: `src/agentex/lib/core/harness/auto_send.py`
+- Test: `tests/lib/core/harness/test_auto_send.py`
+
+`auto_send` consumes the canonical stream and drives `adk.streaming` context managers: it opens
+a text context for `TextContent`, a reasoning context for `ReasoningContent`, switches cleanly
+between them, and posts tool request/response as full messages. It feeds the same
+`SpanDeriver`/`SpanTracer` and returns `TurnResult`. This generalizes the golden agent's
+`AgentexStreamAdapter` (`teams/sgp/agents/golden_agent/project/harness/adapter.py`) to consume
+`StreamTaskMessage*` instead of `HarnessEvent`.
+
+Reference while implementing: `src/agentex/lib/adk/_modules/_langgraph_async.py`
+(`stream_langgraph_events`) shows the exact `adk.streaming` open/stream/close pattern to reuse;
+`adapter.py` lines 87–130 show the text↔reasoning↔tool switching logic to mirror.
+
+- [ ] **Step 1: Write the failing test (fake streaming records context lifecycle)**
+
+Create `tests/lib/core/harness/test_auto_send.py`:
+
+```python
+import pytest
+
+from agentex.lib.core.harness.auto_send import auto_send
+from agentex.types.task_message_update import (
+    StreamTaskMessageStart,
+    StreamTaskMessageDelta,
+    StreamTaskMessageDone,
+)
+from agentex.types.text_content import TextContent
+from agentex.types.text_delta import TextDelta
+
+
+class _FakeCtx:
+    def __init__(self, sink):
+        self.sink = sink
+
+    async def __aenter__(self):
+        self.sink.append(("open",))
+        return self
+
+    async def __aexit__(self, *a):
+        self.sink.append(("close",))
+        return False
+
+    async def stream_update(self, update):
+        self.sink.append(("update", update))
+        return update
+
+
+class _FakeStreaming:
+    def __init__(self):
+        self.sink = []
+
+    def streaming_task_message_context(self, task_id, initial_content, streaming_mode="coalesced", created_at=None):
+        self.sink.append(("ctx", getattr(initial_content, "type", None)))
+        return _FakeCtx(self.sink)
+
+
+async def _gen(events):
+    for e in events:
+        yield e
+
+
+@pytest.mark.asyncio
+async def test_auto_send_streams_text_and_returns_final_text():
+    streaming = _FakeStreaming()
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=TextContent(type="text", author="agent", content="")),
+        StreamTaskMessageDelta(type="delta", index=0, delta=TextDelta(type="text", text_delta="Hel")),
+        StreamTaskMessageDelta(type="delta", index=0, delta=TextDelta(type="text", text_delta="lo")),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming)
+    assert result.final_text == "Hello"
+    kinds = [s[0] for s in streaming.sink]
+    assert kinds[0] == "ctx" and "open" in kinds and "close" in kinds
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+Run: `pytest tests/lib/core/harness/test_auto_send.py -v`
+Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.auto_send`
+
+- [ ] **Step 3: Implement `auto_send`**
+
+Create `src/agentex/lib/core/harness/auto_send.py`. The implementer mirrors the text↔reasoning
+switching from `adapter.py` and the `adk.streaming` usage from `_langgraph_async.py`:
+
+```python
+"""Auto-send delivery: canonical stream -> adk.streaming side effects + tracing."""
+
+from __future__ import annotations
+
+from typing import Any, AsyncIterator
+
+from agentex.types.task_message_update import (
+    StreamTaskMessageDelta,
+    StreamTaskMessageDone,
+    StreamTaskMessageFull,
+    StreamTaskMessageStart,
+)
+from agentex.types.text_content import TextContent
+
+from agentex.lib.core.harness.span_derivation import SpanDeriver
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.lib.core.harness.types import StreamTaskMessage, TurnResult, TurnUsage
+
+
+async def auto_send(
+    events: AsyncIterator[StreamTaskMessage],
+    task_id: str,
+    tracer: SpanTracer | None = None,
+    streaming: Any = None,
+    usage: TurnUsage | None = None,
+) -> TurnResult:
+    """Push the canonical stream to the task stream via adk.streaming.
+
+    Opens a streaming context per text/reasoning message, streams deltas, and
+    closes on Done; posts tool request/response as full messages; derives and
+    traces spans from the same stream. Returns the accumulated final text +
+    usage. For async + temporal agents (call from inside an activity).
+    """
+    if streaming is None:
+        from agentex.lib import adk
+
+        streaming = adk.streaming
+
+    deriver = SpanDeriver() if tracer is not None else None
+    final_text_parts: list[str] = []
+    current_ctx: Any = None
+    current_kind: str | None = None  # "text" | "reasoning"
+
+    async def _close_current() -> None:
+        nonlocal current_ctx, current_kind
+        if current_ctx is not None:
+            await current_ctx.__aexit__(None, None, None)
+            current_ctx = None
+            current_kind = None
+
+    try:
+        async for event in events:
+            if deriver is not None and tracer is not None:
+                for signal in deriver.observe(event):
+                    await tracer.handle(signal)
+
+            if isinstance(event, StreamTaskMessageStart):
+                ctype = getattr(event.content, "type", None)
+                if ctype in ("text", "reasoning"):
+                    await _close_current()
+                    current_ctx = streaming.streaming_task_message_context(
+                        task_id=task_id, initial_content=event.content,
+                    )
+                    await current_ctx.__aenter__()
+                    current_kind = ctype
+            elif isinstance(event, StreamTaskMessageDelta):
+                if current_ctx is not None and event.delta is not None:
+                    await current_ctx.stream_update(event)
+                    if getattr(event.delta, "type", None) == "text" and event.delta.text_delta:
+                        final_text_parts.append(event.delta.text_delta)
+            elif isinstance(event, StreamTaskMessageDone):
+                await _close_current()
+            elif isinstance(event, StreamTaskMessageFull):
+                # Tool request/response (and any non-streamed full message): post as a
+                # standalone full message, not tied to the current text/reasoning ctx.
+                await _close_current()
+                ctx = streaming.streaming_task_message_context(
+                    task_id=task_id, initial_content=event.content,
+                )
+                await ctx.__aenter__()
+                await ctx.__aexit__(None, None, None)
+    finally:
+        await _close_current()
+        if deriver is not None and tracer is not None:
+            for signal in deriver.flush():
+                await tracer.handle(signal)
+
+    return TurnResult(final_text="".join(final_text_parts), usage=usage or TurnUsage())
+```
+
+Note for the implementer: validate the exact `streaming_task_message_context` usage against
+`_langgraph_async.py` (whether to call `stream_update` with the whole `StreamTaskMessageDelta`
+or the inner delta). Adjust the call and the fake together; the test asserts behavior, not the
+internal kwarg shape.
+
+- [ ] **Step 4: Run test to verify it passes**
+
+Run: `pytest tests/lib/core/harness/test_auto_send.py -v`
+Expected: PASS (1 passed)
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/auto_send.py tests/lib/core/harness/test_auto_send.py
+git commit -m "feat(harness): auto_send delivery adapter (canonical stream -> adk.streaming + tracing)"
+```
+
+---
+
+## Task 6: `UnifiedEmitter` facade — PR 3 (part 2)
+
+**Files:**
+- Create: `src/agentex/lib/core/harness/emitter.py`
+- Modify: `src/agentex/lib/core/harness/__init__.py` (re-export public surface)
+- Test: `tests/lib/core/harness/test_emitter.py`
+
+`UnifiedEmitter` is the single thing an agent author touches. It owns the trace context, builds
+the `SpanTracer` (default-on when a trace context exists, overridable), and exposes both
+delivery modes over a `HarnessTurn`. It attaches the turn's `TurnUsage` to delivery.
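
The default-on, overridable tracer rule stated in the Task 6 summary can be sketched on its own as a small resolver. This is a hypothetical, standalone illustration: `StubTracer` and `resolve_tracer` are stand-in names introduced here and do not exist in the SDK, and the sketch assumes only what the summary says (an explicit `False` disables tracing, a supplied tracer overrides, and a truthy trace id otherwise enables the default).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StubTracer:
    """Hypothetical stand-in for the plan's SpanTracer (not the SDK class)."""

    trace_id: str
    parent_span_id: Optional[str] = None


def resolve_tracer(
    trace_id: Optional[str],
    parent_span_id: Optional[str],
    tracer: Union[StubTracer, bool, None] = None,
) -> Optional[StubTracer]:
    """Pick the tracer an emitter would use, following the rule in the text."""
    if tracer is False:
        return None  # explicit opt-out wins, even when a trace context exists
    if isinstance(tracer, StubTracer):
        return tracer  # caller-supplied override is used as-is
    if trace_id:
        # Default-on: a trace context exists, so build the default tracer.
        return StubTracer(trace_id=trace_id, parent_span_id=parent_span_id)
    return None  # no trace context: tracing stays off
```

Keeping the decision in one place means both delivery modes see the same tracer, which is what lets a `None` tracer turn either mode into a plain passthrough.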
+
+- [ ] **Step 1: Write the failing test**
+
+Create `tests/lib/core/harness/test_emitter.py`:
+
+```python
+import pytest
+
+from agentex.lib.core.harness.emitter import UnifiedEmitter
+from agentex.lib.core.harness.types import TurnUsage
+from agentex.types.task_message_update import StreamTaskMessageStart, StreamTaskMessageDone
+from agentex.types.text_content import TextContent
+
+
+class _Turn:
+    def __init__(self, events_list, usage):
+        self._events_list = events_list
+        self._usage = usage
+
+    @property
+    async def events(self):
+        for e in self._events_list:
+            yield e
+
+    def usage(self):
+        return self._usage
+
+
+@pytest.mark.asyncio
+async def test_emitter_yield_mode_passes_through():
+    events = [
+        StreamTaskMessageStart(type="start", index=0,
+            content=TextContent(type="text", author="agent", content="hi")),
+        StreamTaskMessageDone(type="done", index=0),
+    ]
+    turn = _Turn(events, TurnUsage(model="m"))
+    emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None)
+    out = [e async for e in emitter.yield_turn(turn)]
+    assert out == events
+
+
+@pytest.mark.asyncio
+async def test_emitter_tracing_default_on_when_trace_id_present():
+    emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p")
+    assert emitter.tracer is not None
+
+
+@pytest.mark.asyncio
+async def test_emitter_tracing_overridable_off():
+    emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", tracer=False)
+    assert emitter.tracer is None
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+Run: `pytest tests/lib/core/harness/test_emitter.py -v`
+Expected: FAIL with `ModuleNotFoundError: agentex.lib.core.harness.emitter`
+
+- [ ] **Step 3: Implement `UnifiedEmitter`**
+
+Create `src/agentex/lib/core/harness/emitter.py`:
+
+```python
+"""UnifiedEmitter: the single facade agent authors use for either delivery mode."""
+
+from __future__ import annotations
+
+from typing import Any, AsyncIterator
+
+from agentex.lib.core.harness.auto_send import auto_send
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.lib.core.harness.types import HarnessTurn, StreamTaskMessage, TurnResult
+from agentex.lib.core.harness.yield_delivery import yield_events
+
+
+class UnifiedEmitter:
+    """Ties trace context + chosen delivery together.
+
+    Tracing is default-on whenever `trace_id` is truthy; pass `tracer=False` to
+    disable, or a custom `SpanTracer` to override.
+    """
+
+    def __init__(
+        self,
+        task_id: str,
+        trace_id: str | None,
+        parent_span_id: str | None,
+        tracer: SpanTracer | bool | None = None,
+    ):
+        self.task_id = task_id
+        self.trace_id = trace_id
+        self.parent_span_id = parent_span_id
+        if tracer is False:
+            self.tracer: SpanTracer | None = None
+        elif isinstance(tracer, SpanTracer):
+            self.tracer = tracer
+        elif trace_id:
+            self.tracer = SpanTracer(trace_id=trace_id, parent_span_id=parent_span_id, task_id=task_id)
+        else:
+            self.tracer = None
+
+    async def yield_turn(self, turn: HarnessTurn) -> AsyncIterator[StreamTaskMessage]:
+        """Sync HTTP ACP delivery: forward events, trace as side effect."""
+        async for event in yield_events(turn.events, tracer=self.tracer):
+            yield event
+
+    async def auto_send_turn(self, turn: HarnessTurn) -> TurnResult:
+        """Async/temporal delivery: push to the task stream, return TurnResult."""
+        return await auto_send(
+            turn.events,
+            task_id=self.task_id,
+            tracer=self.tracer,
+            usage=turn.usage(),
+        )
+```
+
+- [ ] **Step 4: Re-export the public surface**
+
+Append to `src/agentex/lib/core/harness/__init__.py`:
+
+```python
+from agentex.lib.core.harness.emitter import UnifiedEmitter
+from agentex.lib.core.harness.tracer import SpanTracer
+from agentex.lib.core.harness.types import (
+    CloseSpan,
+    HarnessTurn,
+    OpenSpan,
+    SpanSignal,
+    StreamTaskMessage,
+    TurnResult,
+    TurnUsage,
+)
+
+__all__ = [
+    "UnifiedEmitter",
+    "SpanTracer",
+    "OpenSpan",
+    "CloseSpan",
+    "SpanSignal",
+    "StreamTaskMessage",
+    "TurnUsage",
+    "TurnResult",
+    "HarnessTurn",
+]
+```
+
+- [ ] **Step 5: Run tests to verify they pass**
+
+Run: `pytest tests/lib/core/harness/ -v`
+Expected: PASS (all harness tests green)
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/agentex/lib/core/harness/emitter.py src/agentex/lib/core/harness/__init__.py tests/lib/core/harness/test_emitter.py
+git commit -m "feat(harness): UnifiedEmitter facade tying delivery + tracing + usage"
+```
+
+---
+
+## Task 7: Conformance test scaffold + empty CI integration job — PR 3 (part 3)
+
+**Files:**
+- Create: `tests/lib/core/harness/conformance/__init__.py`
+- Create: `tests/lib/core/harness/conformance/runner.py`
+- Create: `tests/lib/core/harness/conformance/test_conformance.py`
+- Create: `.github/workflows/harness-integration.yml`
+
+The conformance runner is the shared parametrized engine each harness tap will register fixtures
+with (in later plans). It asserts yield-vs-auto-send equivalence on the span signals derived
+from a fixture's canonical-event sequence.
+
+- [ ] **Step 1: Write the conformance runner + a self-test fixture**
+
+Create `tests/lib/core/harness/conformance/__init__.py` (empty), then
+`tests/lib/core/harness/conformance/runner.py`:
+
+```python
+"""Shared conformance engine: every harness tap registers fixtures here.
+
+A fixture is (name, list[StreamTaskMessage]). The runner asserts that span
+derivation over the events is identical regardless of delivery channel, which is
+the cross-channel guarantee from the spec.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Callable
+
+from agentex.lib.core.harness.span_derivation import SpanDeriver
+from agentex.lib.core.harness.types import SpanSignal, StreamTaskMessage
+
+
+@dataclass
+class Fixture:
+    name: str
+    events: list[StreamTaskMessage]
+
+
+_REGISTRY: list[Fixture] = []
+
+
+def register(fixture: Fixture) -> None:
+    _REGISTRY.append(fixture)
+
+
+def all_fixtures() -> list[Fixture]:
+    return list(_REGISTRY)
+
+
+def derive_all(events: list[StreamTaskMessage]) -> list[SpanSignal]:
+    d = SpanDeriver()
+    out: list[SpanSignal] = []
+    for e in events:
+        out.extend(d.observe(e))
+    out.extend(d.flush())
+    return out
+```
+
+- [ ] **Step 2: Write the conformance test (self-test on a built-in fixture)**
+
+Create `tests/lib/core/harness/conformance/test_conformance.py`:
+
+```python
+import pytest
+
+from tests.lib.core.harness.conformance.runner import Fixture, derive_all, register, all_fixtures
+from agentex.types.task_message_update import (
+    StreamTaskMessageStart, StreamTaskMessageDone, StreamTaskMessageFull,
+)
+from agentex.types.tool_request_content import ToolRequestContent
+from agentex.types.tool_response_content import ToolResponseContent
+
+register(Fixture(
+    name="builtin-single-tool",
+    events=[
+        StreamTaskMessageStart(type="start", index=0,
+            content=ToolRequestContent(type="tool_request", author="agent",
+                tool_call_id="c", name="Bash", arguments={})),
+        StreamTaskMessageDone(type="done", index=0),
+        StreamTaskMessageFull(type="full", index=1,
+            content=ToolResponseContent(type="tool_response", author="agent",
+                tool_call_id="c", name="Bash", content="ok")),
+    ],
+))
+
+
+@pytest.mark.parametrize("fixture", all_fixtures(), ids=lambda f: f.name)
+def test_span_derivation_is_deterministic(fixture):
+    # Deriving twice over the same events yields identical signals (the property
+    # that makes yield vs auto-send equivalent, since both observe the same stream).
+    assert derive_all(fixture.events) == derive_all(fixture.events)
+```
+
+- [ ] **Step 3: Run the conformance test**
+
+Run: `pytest tests/lib/core/harness/conformance/ -v`
+Expected: PASS (1 passed)
+
+- [ ] **Step 4: Add the empty CI integration job**
+
+Create `.github/workflows/harness-integration.yml` (mirrors the structure of the existing
+`agentex-tutorials-test.yml`; the matrix is populated in later plans):
+
+```yaml
+name: Harness Integration
+
+on:
+  pull_request:
+    paths:
+      - "src/agentex/lib/core/harness/**"
+      - "src/agentex/lib/adk/_modules/**"
+      - ".github/workflows/harness-integration.yml"
+
+jobs:
+  conformance:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v5
+      - name: Install
+        run: uv sync
+      - name: Conformance suite
+        run: uv run pytest tests/lib/core/harness/ -v
+
+  # Live integration matrix (harness x {sync, async, temporal}) is added per-harness
+  # in the migration plans. Placeholder job keeps the workflow valid until then.
+  live-matrix:
+    runs-on: ubuntu-latest
+    if: false  # enabled once the first harness's test agents land
+    steps:
+      - run: echo "populated by migration PRs"
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add tests/lib/core/harness/conformance .github/workflows/harness-integration.yml
+git commit -m "test(harness): conformance scaffold + CI integration job skeleton"
+```
+
+---
+
+## Task 8: Run the full suite + type check
+
+- [ ] **Step 1: Run the whole harness test tree**
+
+Run: `pytest tests/lib/core/harness/ -v`
+Expected: PASS (all tasks' tests green)
+
+- [ ] **Step 2: Type check the new package**
+
+Run: `uv run mypy src/agentex/lib/core/harness/` (or the repo's configured type checker)
+Expected: no errors. Fix any signature mismatches inline.
+
+- [ ] **Step 3: Final commit if the type check required fixes**
+
+```bash
+git add -A && git commit -m "chore(harness): type-check fixes for foundation package"
+```
+
+---
+
+## Subsequent plans (to be written after this lands)
+
+Each gets its own plan via the writing-plans skill, expanded with that harness's exact
+converter code:
+
+- **PR 4 — Migrate pydantic-ai:** wrap `convert_pydantic_ai_to_agentex_events` as a
+  `HarnessTurn` (add `usage()` normalizing `result.usage()`), reimplement `_pydantic_ai_async`
+  on `auto_send`, retire `_pydantic_ai_tracing` in favor of `SpanTracer`, keep the public
+  `convert_*` signature. Add 3 test agents (sync/async/temporal) + register conformance
+  fixtures + enable the live-matrix row.
+- **PR 5 — Migrate langgraph:** same shape; reimplement `stream_langgraph_events` on
+  `auto_send`; normalize `usage_metadata` into `TurnUsage`.
+- **PR 6 — Migrate openai-agents:** same shape; reimplement `run_agent_streamed_auto_send` on
+  `auto_send`; normalize `response.usage`.
+- **PR 7 — claude-code parser tap:** `convert_claude_code_to_agentex_events` (port the golden
+  agent's `_StreamJsonProcessor` to yield `StreamTaskMessage*`) + recorded stream-json
+  fixtures + feasible test agent(s).
+- **PR 8 — codex parser tap:** same shape for `_CodexEventProcessor`.
+- **PR 9 — Cleanup:** delete now-dead internal duplication, deprecate `_*_tracing` shims, docs.
+
+The `is_error` tool-error work is deferred and tracked in Linear as AGX1-371.
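
The migration bullets above all start by wrapping a converter as a turn object that exposes an event stream plus a `usage()` accessor. A minimal sketch of that shape follows, under loud assumptions: `RecordedTurn` and `Usage` are hypothetical stand-ins introduced here (not SDK classes), events are plain strings rather than `StreamTaskMessage*` objects, and the usage object is filled in only once the stream has been fully consumed.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for TurnUsage, reduced to two fields."""

    model: Optional[str] = None
    num_events: Optional[int] = None


class RecordedTurn:
    """Hypothetical tap exposing `events` plus `usage()`."""

    def __init__(self, events: List[str], model: str) -> None:
        self._events = events
        self._usage = Usage(model=model)

    @property
    def events(self) -> AsyncIterator[str]:
        # Each access returns a fresh async generator over the recorded events.
        return self._iter()

    async def _iter(self) -> AsyncIterator[str]:
        count = 0
        for event in self._events:
            count += 1
            yield event
        # Runs only when the consumer exhausts the stream: finalize usage here.
        self._usage.num_events = count

    def usage(self) -> Usage:
        # Complete only after `events` has been exhausted.
        return self._usage


async def drain(turn: RecordedTurn) -> List[str]:
    """Consume the whole stream, as a delivery adapter would."""
    return [event async for event in turn.events]
```

In this sketch, a caller would run `asyncio.run(drain(turn))` and read `turn.usage()` afterwards; reading it earlier returns the same object with `num_events` still unset.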
From de564833e8364492bc7310880d9557643fa98883 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:00:40 -0400 Subject: [PATCH 05/26] feat(harness): foundation types for unified harness surface Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/__init__.py | 6 ++ src/agentex/lib/core/harness/types.py | 91 ++++++++++++++++++++++++ tests/lib/core/harness/__init__.py | 0 tests/lib/core/harness/test_types.py | 27 +++++++ 4 files changed, 124 insertions(+) create mode 100644 src/agentex/lib/core/harness/__init__.py create mode 100644 src/agentex/lib/core/harness/types.py create mode 100644 tests/lib/core/harness/__init__.py create mode 100644 tests/lib/core/harness/test_types.py diff --git a/src/agentex/lib/core/harness/__init__.py b/src/agentex/lib/core/harness/__init__.py new file mode 100644 index 000000000..15d116148 --- /dev/null +++ b/src/agentex/lib/core/harness/__init__.py @@ -0,0 +1,6 @@ +"""Shared, harness-independent machinery for the unified harness surface. + +The Agentex StreamTaskMessage* stream is the single source of truth; this +package derives spans from it and delivers it (yield or auto-send), so every +harness tap gets streaming + tracing + turn usage uniformly. +""" diff --git a/src/agentex/lib/core/harness/types.py b/src/agentex/lib/core/harness/types.py new file mode 100644 index 000000000..f31b2c67f --- /dev/null +++ b/src/agentex/lib/core/harness/types.py @@ -0,0 +1,91 @@ +"""Types for the unified harness surface.""" + +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import Any, AsyncIterator, Literal, Protocol, Union, runtime_checkable + +from agentex.types.task_message_update import ( + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, +) +from pydantic import BaseModel, ConfigDict + +# The canonical stream element. Taps yield these; delivery adapters consume them. 
+StreamTaskMessage = Union[ + StreamTaskMessageStart, + StreamTaskMessageDelta, + StreamTaskMessageFull, + StreamTaskMessageDone, +] + +SpanKind = Literal["tool", "reasoning", "subagent"] + + +@dataclass +class OpenSpan: + """Signal to open a child span. `key` pairs an open with its close.""" + + key: str + kind: SpanKind + name: str + input: dict[str, Any] = field(default_factory=dict) + + +@dataclass +class CloseSpan: + """Signal to close the span previously opened with the same `key`.""" + + key: str + output: Any = None + is_complete: bool = True # False when closed by flush() without a result + + +SpanSignal = Union[OpenSpan, CloseSpan] + + +class TurnUsage(BaseModel): + """Harness-independent turn usage/cost, attached to the turn span. + + Token field names align with agentex.lib.core.observability.llm_metrics. + """ + + model_config = ConfigDict(from_attributes=True, populate_by_name=True) + + model: str | None = None + input_tokens: int | None = None + output_tokens: int | None = None + cached_input_tokens: int | None = None + reasoning_tokens: int | None = None + total_tokens: int | None = None + cost_usd: float | None = None + duration_ms: int | None = None + num_llm_calls: int = 0 + num_tool_calls: int = 0 + num_reasoning_blocks: int = 0 + + +class TurnResult(BaseModel): + """Returned to the caller after a turn is delivered.""" + + model_config = ConfigDict(from_attributes=True, populate_by_name=True) + + final_text: str = "" + usage: TurnUsage = TurnUsage() + + +@runtime_checkable +class HarnessTurn(Protocol): + """A single harness turn: a canonical stream plus its normalized usage. + + Python async generators cannot cleanly return a value to their consumer, so + a tap exposes usage via `usage()` (valid only after `events` is exhausted) + rather than via StopAsyncIteration. + """ + + @property + def events(self) -> AsyncIterator[StreamTaskMessage]: ... + + def usage(self) -> TurnUsage: ... 
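The `HarnessTurn` docstring above states that `usage()` is valid only after `events` is exhausted. A minimal, self-contained sketch of that contract follows. `FakeTurn`, `Usage`, and the plain-string events are hypothetical stand-ins for a real tap, `TurnUsage`, and `StreamTaskMessage*`. Raising on an early `usage()` call is one possible way to enforce the contract, not something the protocol itself requires.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Usage:
    """Hypothetical stand-in for TurnUsage (one field shown)."""

    output_tokens: int = 0


class FakeTurn:
    """Hypothetical tap: streams events and tallies usage as a side effect."""

    def __init__(self, chunks: list[str]) -> None:
        self._chunks = chunks
        self._usage = Usage()
        self._done = False

    @property
    def events(self) -> AsyncIterator[str]:
        return self._stream()

    async def _stream(self) -> AsyncIterator[str]:
        for chunk in self._chunks:
            # Crude whitespace "token" count, for illustration only.
            self._usage.output_tokens += len(chunk.split())
            yield chunk
        self._done = True  # reached only when the consumer drains the stream

    def usage(self) -> Usage:
        if not self._done:
            raise RuntimeError("usage() is only valid after events is exhausted")
        return self._usage


async def run_turn(turn: FakeTurn) -> tuple[list[str], Usage]:
    # Drain the stream first, then read the normalized usage.
    seen = [event async for event in turn.events]
    return seen, turn.usage()


seen, usage = asyncio.run(run_turn(FakeTurn(["hello world", "done"])))
```

A delivery adapter would follow the same order: consume `turn.events` to completion, then call `turn.usage()` to build the `TurnResult`.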
diff --git a/tests/lib/core/harness/__init__.py b/tests/lib/core/harness/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/tests/lib/core/harness/test_types.py b/tests/lib/core/harness/test_types.py new file mode 100644 index 000000000..b025d803b --- /dev/null +++ b/tests/lib/core/harness/test_types.py @@ -0,0 +1,27 @@ +from agentex.lib.core.harness.types import ( + OpenSpan, + CloseSpan, + TurnUsage, + TurnResult, +) + + +def test_open_close_span_construct(): + o = OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"}) + c = CloseSpan(key="call_1", output="files", is_complete=True) + assert o.key == c.key == "call_1" + assert o.kind == "tool" + assert c.is_complete is True + + +def test_turn_usage_defaults_are_none(): + u = TurnUsage(model="claude-opus-4-6") + assert u.model == "claude-opus-4-6" + assert u.input_tokens is None + assert u.num_tool_calls == 0 + + +def test_turn_result_wraps_usage(): + r = TurnResult(final_text="hi", usage=TurnUsage(model="m")) + assert r.final_text == "hi" + assert r.usage.model == "m" From a13725cca825b8a0f21b7a1b25c25bf223447b27 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:05:15 -0400 Subject: [PATCH 06/26] test(harness): cover CloseSpan defaults and HarnessTurn runtime check Co-Authored-By: Claude Sonnet 4.6 --- tests/lib/core/harness/test_types.py | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/tests/lib/core/harness/test_types.py b/tests/lib/core/harness/test_types.py index b025d803b..91857993a 100644 --- a/tests/lib/core/harness/test_types.py +++ b/tests/lib/core/harness/test_types.py @@ -1,6 +1,10 @@ +from typing import AsyncIterator + from agentex.lib.core.harness.types import ( OpenSpan, CloseSpan, + HarnessTurn, + StreamTaskMessage, TurnUsage, TurnResult, ) @@ -25,3 +29,25 @@ def test_turn_result_wraps_usage(): r = TurnResult(final_text="hi", usage=TurnUsage(model="m")) assert r.final_text == "hi" assert r.usage.model == "m" + + 
+def test_close_span_defaults(): + c = CloseSpan(key="x") + assert c.output is None + assert c.is_complete is True + + +def test_harness_turn_runtime_check(): + class _Turn: + @property + def events(self) -> AsyncIterator[StreamTaskMessage]: + async def _gen() -> AsyncIterator[StreamTaskMessage]: + if False: + yield # pragma: no cover + + return _gen() + + def usage(self) -> TurnUsage: + return TurnUsage(model="m") + + assert isinstance(_Turn(), HarnessTurn) is True From 13868ee1968632f47eb2679b234c908239896f84 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:07:47 -0400 Subject: [PATCH 07/26] feat(harness): pure SpanDeriver reducing the canonical stream to span signals Co-Authored-By: Claude Sonnet 4.6 --- .../lib/core/harness/span_derivation.py | 115 +++++++++++++++++ .../lib/core/harness/test_span_derivation.py | 120 ++++++++++++++++++ 2 files changed, 235 insertions(+) create mode 100644 src/agentex/lib/core/harness/span_derivation.py create mode 100644 tests/lib/core/harness/test_span_derivation.py diff --git a/src/agentex/lib/core/harness/span_derivation.py b/src/agentex/lib/core/harness/span_derivation.py new file mode 100644 index 000000000..deb5d6d68 --- /dev/null +++ b/src/agentex/lib/core/harness/span_derivation.py @@ -0,0 +1,115 @@ +"""Pure reducer: canonical StreamTaskMessage* stream -> span open/close signals. + +Has no dependency on adk; unit-testable in isolation. Delivery adapters feed it +every event and act on the returned signals. 
+""" + +from __future__ import annotations + +import json +from dataclasses import dataclass +from typing import Any + +from agentex.types.task_message_update import ( + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, +) + +from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal, StreamTaskMessage + + +@dataclass +class _ToolReqMeta: + tool_call_id: str + name: str + arguments: dict[str, Any] + args_buf: str = "" # accumulated streamed argument fragments + + +class SpanDeriver: + """Stateful reducer over the canonical stream. + + Tool span: open on Done of a ToolRequestContent index; close on matching + ToolResponseContent by tool_call_id. Reasoning span: open on + Start(ReasoningContent); close on that index's Done. + """ + + def __init__(self) -> None: + self._tool_by_index: dict[int, _ToolReqMeta] = {} + self._reasoning_index_open: set[int] = set() + self._open_tool_ids: set[str] = set() + + def observe(self, event: StreamTaskMessage) -> list[SpanSignal]: + if isinstance(event, StreamTaskMessageStart): + return self._on_start(event) + if isinstance(event, StreamTaskMessageDelta): + return self._on_delta(event) + if isinstance(event, StreamTaskMessageFull): + return self._on_full(event) + if isinstance(event, StreamTaskMessageDone): + return self._on_done(event) + return [] + + def flush(self) -> list[SpanSignal]: + """Close anything still open at end of stream, marked incomplete.""" + signals: list[SpanSignal] = [] + for tcid in list(self._open_tool_ids): + signals.append(CloseSpan(key=tcid, output=None, is_complete=False)) + self._open_tool_ids.clear() + for idx in sorted(self._reasoning_index_open): + signals.append(CloseSpan(key=f"reasoning:{idx}", output=None, is_complete=False)) + self._reasoning_index_open.clear() + return signals + + def _on_start(self, event: StreamTaskMessageStart) -> list[SpanSignal]: + content = event.content + idx = event.index if event.index is not None else -1 + 
ctype = getattr(content, "type", None) + if ctype == "tool_request": + self._tool_by_index[idx] = _ToolReqMeta( + tool_call_id=content.tool_call_id, + name=content.name, + arguments=dict(content.arguments or {}), + ) + return [] + if ctype == "reasoning": + self._reasoning_index_open.add(idx) + return [OpenSpan(key=f"reasoning:{idx}", kind="reasoning", name="reasoning", input={})] + return [] + + def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]: + idx = event.index if event.index is not None else -1 + delta = event.delta + if delta is not None and getattr(delta, "type", None) == "tool_request": + meta = self._tool_by_index.get(idx) + if meta is not None and delta.arguments_delta: + meta.args_buf += delta.arguments_delta + return [] + + def _on_full(self, event: StreamTaskMessageFull) -> list[SpanSignal]: + content = event.content + if getattr(content, "type", None) == "tool_response": + tcid = content.tool_call_id + if tcid in self._open_tool_ids: + self._open_tool_ids.discard(tcid) + return [CloseSpan(key=tcid, output=content.content, is_complete=True)] + return [] + + def _on_done(self, event: StreamTaskMessageDone) -> list[SpanSignal]: + idx = event.index if event.index is not None else -1 + meta = self._tool_by_index.pop(idx, None) + if meta is not None: + args = meta.arguments + if meta.args_buf: + try: + args = json.loads(meta.args_buf) + except json.JSONDecodeError: + args = {"_raw": meta.args_buf} + self._open_tool_ids.add(meta.tool_call_id) + return [OpenSpan(key=meta.tool_call_id, kind="tool", name=meta.name, input=args)] + if idx in self._reasoning_index_open: + self._reasoning_index_open.discard(idx) + return [CloseSpan(key=f"reasoning:{idx}", output=None, is_complete=True)] + return [] diff --git a/tests/lib/core/harness/test_span_derivation.py b/tests/lib/core/harness/test_span_derivation.py new file mode 100644 index 000000000..0b1a4bcbe --- /dev/null +++ b/tests/lib/core/harness/test_span_derivation.py @@ -0,0 +1,120 @@ +from 
agentex.lib.core.harness.span_derivation import SpanDeriver +from agentex.lib.core.harness.types import OpenSpan, CloseSpan +from agentex.types.task_message_update import ( + StreamTaskMessageStart, + StreamTaskMessageDelta, + StreamTaskMessageFull, + StreamTaskMessageDone, +) +from agentex.types.text_content import TextContent +from agentex.types.reasoning_content import ReasoningContent +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent +from agentex.types.tool_request_delta import ToolRequestDelta + + +def _signals(deriver, events): + out = [] + for e in events: + out.extend(deriver.observe(e)) + out.extend(deriver.flush()) + return out + + +def _tool_req(idx, tcid, name, args): + return StreamTaskMessageStart( + type="start", index=idx, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id=tcid, name=name, arguments=args), + ) + + +def test_text_only_yields_no_spans(): + d = SpanDeriver() + events = [ + StreamTaskMessageStart(type="start", index=0, + content=TextContent(type="text", author="agent", content="")), + StreamTaskMessageDelta(type="delta", index=0, + delta=None), + StreamTaskMessageDone(type="done", index=0), + ] + assert _signals(d, events) == [] + + +def test_single_tool_opens_on_done_closes_on_response(): + d = SpanDeriver() + events = [ + _tool_req(0, "call_1", "Bash", {"cmd": "ls"}), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull(type="full", index=1, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="call_1", name="Bash", content="files")), + ] + sigs = _signals(d, events) + assert sigs == [ + OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"}), + CloseSpan(key="call_1", output="files", is_complete=True), + ] + + +def test_reasoning_opens_on_start_closes_on_done(): + d = SpanDeriver() + events = [ + StreamTaskMessageStart(type="start", index=0, + 
content=ReasoningContent(type="reasoning", author="agent", summary=[], content=[])), + StreamTaskMessageDone(type="done", index=0), + ] + sigs = _signals(d, events) + assert sigs[0] == OpenSpan(key="reasoning:0", kind="reasoning", name="reasoning", input={}) + assert sigs[1] == CloseSpan(key="reasoning:0", output=None, is_complete=True) + + +def test_parallel_tools_pair_by_tool_call_id(): + d = SpanDeriver() + events = [ + _tool_req(0, "a", "T1", {}), + _tool_req(1, "b", "T2", {}), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageDone(type="done", index=1), + StreamTaskMessageFull(type="full", index=2, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="b", name="T2", content="rb")), + StreamTaskMessageFull(type="full", index=3, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="a", name="T1", content="ra")), + ] + sigs = _signals(d, events) + opens = [s for s in sigs if isinstance(s, OpenSpan)] + closes = [s for s in sigs if isinstance(s, CloseSpan)] + assert {o.key for o in opens} == {"a", "b"} + assert [c.key for c in closes] == ["b", "a"] + assert all(c.is_complete for c in closes) + + +def test_streamed_args_accumulate_into_open_input(): + d = SpanDeriver() + events = [ + StreamTaskMessageStart(type="start", index=0, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageDelta(type="delta", index=0, + delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", + arguments_delta='{"cmd":')), + StreamTaskMessageDelta(type="delta", index=0, + delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", + arguments_delta='"ls"}')), + StreamTaskMessageDone(type="done", index=0), + ] + sigs = _signals(d, events) + assert sigs[0] == OpenSpan(key="c", kind="tool", name="Bash", input={"cmd": "ls"}) + + +def test_unclosed_tool_closed_incomplete_on_flush(): + d = SpanDeriver() + 
events = [ + _tool_req(0, "x", "Bash", {}), + StreamTaskMessageDone(type="done", index=0), + ] + sigs = _signals(d, events) + assert sigs[0] == OpenSpan(key="x", kind="tool", name="Bash", input={}) + assert sigs[1] == CloseSpan(key="x", output=None, is_complete=False) From 0ecc03f4b4f8e686c73c0d7f2acc2c29c91a6f4c Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:13:13 -0400 Subject: [PATCH 08/26] refactor(harness): deterministic flush order + defensive index/orphan handling in SpanDeriver Co-Authored-By: Claude Sonnet 4.6 --- .../lib/core/harness/span_derivation.py | 33 ++++++++++++++----- .../lib/core/harness/test_span_derivation.py | 21 ++++++++++++ 2 files changed, 46 insertions(+), 8 deletions(-) diff --git a/src/agentex/lib/core/harness/span_derivation.py b/src/agentex/lib/core/harness/span_derivation.py index deb5d6d68..15e1f593f 100644 --- a/src/agentex/lib/core/harness/span_derivation.py +++ b/src/agentex/lib/core/harness/span_derivation.py @@ -8,7 +8,6 @@ import json from dataclasses import dataclass -from typing import Any from agentex.types.task_message_update import ( StreamTaskMessageDelta, @@ -24,7 +23,7 @@ class _ToolReqMeta: tool_call_id: str name: str - arguments: dict[str, Any] + arguments: dict[str, object] args_buf: str = "" # accumulated streamed argument fragments @@ -34,12 +33,24 @@ class SpanDeriver: Tool span: open on Done of a ToolRequestContent index; close on matching ToolResponseContent by tool_call_id. Reasoning span: open on Start(ReasoningContent); close on that index's Done. + + Deliberate contracts: + - A `Full(ToolResponseContent)` whose tool_call_id was never opened is + ignored (no CloseSpan emitted). + - A `Done` for an index that was never a tool_request/reasoning Start is + ignored (no signal emitted). + - Events with `index is None` are skipped entirely; without a stable index + they cannot be reliably paired, and aliasing them to a sentinel would + let unrelated None-indexed events cross-match. 
+ - `flush()` closes anything still open as incomplete; unclosed tool spans + are emitted in the order they were opened. """ def __init__(self) -> None: self._tool_by_index: dict[int, _ToolReqMeta] = {} self._reasoning_index_open: set[int] = set() - self._open_tool_ids: set[str] = set() + # insertion-ordered set of open tool_call_ids (dict keys preserve order) + self._open_tool_ids: dict[str, None] = {} def observe(self, event: StreamTaskMessage) -> list[SpanSignal]: if isinstance(event, StreamTaskMessageStart): @@ -64,8 +75,10 @@ def flush(self) -> list[SpanSignal]: return signals def _on_start(self, event: StreamTaskMessageStart) -> list[SpanSignal]: + if event.index is None: + return [] + idx = event.index content = event.content - idx = event.index if event.index is not None else -1 ctype = getattr(content, "type", None) if ctype == "tool_request": self._tool_by_index[idx] = _ToolReqMeta( @@ -80,7 +93,9 @@ def _on_start(self, event: StreamTaskMessageStart) -> list[SpanSignal]: return [] def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]: - idx = event.index if event.index is not None else -1 + if event.index is None: + return [] + idx = event.index delta = event.delta if delta is not None and getattr(delta, "type", None) == "tool_request": meta = self._tool_by_index.get(idx) @@ -93,12 +108,14 @@ def _on_full(self, event: StreamTaskMessageFull) -> list[SpanSignal]: if getattr(content, "type", None) == "tool_response": tcid = content.tool_call_id if tcid in self._open_tool_ids: - self._open_tool_ids.discard(tcid) + self._open_tool_ids.pop(tcid, None) return [CloseSpan(key=tcid, output=content.content, is_complete=True)] return [] def _on_done(self, event: StreamTaskMessageDone) -> list[SpanSignal]: - idx = event.index if event.index is not None else -1 + if event.index is None: + return [] + idx = event.index meta = self._tool_by_index.pop(idx, None) if meta is not None: args = meta.arguments @@ -107,7 +124,7 @@ def _on_done(self, event: 
StreamTaskMessageDone) -> list[SpanSignal]: args = json.loads(meta.args_buf) except json.JSONDecodeError: args = {"_raw": meta.args_buf} - self._open_tool_ids.add(meta.tool_call_id) + self._open_tool_ids[meta.tool_call_id] = None return [OpenSpan(key=meta.tool_call_id, kind="tool", name=meta.name, input=args)] if idx in self._reasoning_index_open: self._reasoning_index_open.discard(idx) diff --git a/tests/lib/core/harness/test_span_derivation.py b/tests/lib/core/harness/test_span_derivation.py index 0b1a4bcbe..0630131d0 100644 --- a/tests/lib/core/harness/test_span_derivation.py +++ b/tests/lib/core/harness/test_span_derivation.py @@ -118,3 +118,24 @@ def test_unclosed_tool_closed_incomplete_on_flush(): sigs = _signals(d, events) assert sigs[0] == OpenSpan(key="x", kind="tool", name="Bash", input={}) assert sigs[1] == CloseSpan(key="x", output=None, is_complete=False) + + +def test_none_index_is_skipped(): + d = SpanDeriver() + events = [ + StreamTaskMessageStart(type="start", index=None, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id="n", name="Bash", arguments={})), + StreamTaskMessageDone(type="done", index=None), + ] + assert _signals(d, events) == [] + + +def test_orphan_tool_response_ignored(): + d = SpanDeriver() + events = [ + StreamTaskMessageFull(type="full", index=0, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="z", name="Bash", content="r")), + ] + assert _signals(d, events) == [] From 8d708f41fccf0b5b6a046999ac67c2106666b5aa Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:16:02 -0400 Subject: [PATCH 09/26] feat(harness): SpanTracer adapter from span signals to adk.tracing Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/tracer.py | 68 ++++++++++++++++++++++++++ tests/lib/core/harness/test_tracer.py | 54 ++++++++++++++++++++ 2 files changed, 122 insertions(+) create mode 100644 src/agentex/lib/core/harness/tracer.py create mode 100644 
tests/lib/core/harness/test_tracer.py diff --git a/src/agentex/lib/core/harness/tracer.py b/src/agentex/lib/core/harness/tracer.py new file mode 100644 index 000000000..55fab4029 --- /dev/null +++ b/src/agentex/lib/core/harness/tracer.py @@ -0,0 +1,68 @@ +"""Adapter from SpanSignals to adk.tracing spans (best-effort, overridable).""" + +from __future__ import annotations + +import logging +from typing import Any + +from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal + +logger = logging.getLogger(__name__) + + +class SpanTracer: + """Opens/closes adk.tracing child spans in response to span signals. + + `tracing` defaults to the real `adk.tracing` module; inject a fake in tests + or a custom tracer to override. No-op when `trace_id` is falsy. Never raises. + + The real TracingModule.end_span does NOT accept an output kwarg — output is + recorded by mutating span.output before calling end_span, matching the pattern + used throughout the codebase (see _langgraph_tracing.py on_tool_end etc.). + """ + + def __init__( + self, + trace_id: str | None, + parent_span_id: str | None, + tracing: Any = None, + task_id: str | None = None, + ): + self.trace_id = trace_id + self.parent_span_id = parent_span_id + self.task_id = task_id + if tracing is None: + from agentex.lib import adk + + tracing = adk.tracing + self._tracing = tracing + self._open: dict[str, Any] = {} # span key -> span object + + async def handle(self, signal: SpanSignal) -> None: + if not self.trace_id: + return + try: + if isinstance(signal, OpenSpan): + span = await self._tracing.start_span( + trace_id=self.trace_id, + name=signal.name, + input=signal.input, + parent_id=self.parent_span_id, + task_id=self.task_id, + ) + if span is not None: + self._open[signal.key] = span + elif isinstance(signal, CloseSpan): + span = self._open.pop(signal.key, None) + if span is not None: + # Output is recorded by mutating span.output before end_span. 
+ # The real TracingModule.end_span signature is: + # end_span(trace_id, span, start_to_close_timeout, heartbeat_timeout, retry_policy) + # It does not accept an output= kwarg. + span.output = signal.output + await self._tracing.end_span( + trace_id=self.trace_id, + span=span, + ) + except Exception as exc: # best-effort: tracing never breaks delivery + logger.warning("[harness.tracer] span signal failed: %s", exc) diff --git a/tests/lib/core/harness/test_tracer.py b/tests/lib/core/harness/test_tracer.py new file mode 100644 index 000000000..105995bc8 --- /dev/null +++ b/tests/lib/core/harness/test_tracer.py @@ -0,0 +1,54 @@ +import pytest + +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.types import OpenSpan, CloseSpan + + +class _FakeSpan: + def __init__(self, name): + self.name = name + self.output = None + + +class _FakeTracing: + def __init__(self): + self.started = [] + self.ended = [] + + async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None): + self.started.append((name, parent_id, input)) + return _FakeSpan(name) + + async def end_span(self, *, trace_id, span): + self.ended.append((span.name, span.output)) + + +@pytest.mark.asyncio +async def test_open_then_close_starts_and_ends_span(): + fake = _FakeTracing() + tracer = SpanTracer(trace_id="t1", parent_span_id="p1", tracing=fake) + await tracer.handle(OpenSpan(key="call_1", kind="tool", name="Bash", input={"cmd": "ls"})) + await tracer.handle(CloseSpan(key="call_1", output="files", is_complete=True)) + assert fake.started == [("Bash", "p1", {"cmd": "ls"})] + assert fake.ended == [("Bash", "files")] + + +@pytest.mark.asyncio +async def test_no_trace_id_is_noop(): + fake = _FakeTracing() + tracer = SpanTracer(trace_id="", parent_span_id=None, tracing=fake) + await tracer.handle(OpenSpan(key="k", kind="tool", name="X")) + await tracer.handle(CloseSpan(key="k")) + assert fake.started == [] and fake.ended == [] + + 
+@pytest.mark.asyncio +async def test_tracing_failure_is_swallowed(): + class _Boom(_FakeTracing): + async def start_span(self, **kw): + raise RuntimeError("backend down") + + tracer = SpanTracer(trace_id="t1", parent_span_id="p1", tracing=_Boom()) + # Must not raise. + await tracer.handle(OpenSpan(key="k", kind="tool", name="X")) + await tracer.handle(CloseSpan(key="k")) From 7955d55b1d88b39557e129f0539c4cf42565ddc4 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:21:07 -0400 Subject: [PATCH 10/26] refactor(harness): guarded make_logger import + lifecycle contract tests for SpanTracer Co-Authored-By: Claude Opus 4.8 (1M context) --- src/agentex/lib/core/harness/tracer.py | 18 ++++++++++++++++-- tests/lib/core/harness/test_tracer.py | 13 +++++++++++++ 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/src/agentex/lib/core/harness/tracer.py b/src/agentex/lib/core/harness/tracer.py index 55fab4029..3f4ff40c2 100644 --- a/src/agentex/lib/core/harness/tracer.py +++ b/src/agentex/lib/core/harness/tracer.py @@ -2,12 +2,18 @@ from __future__ import annotations -import logging from typing import Any from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal -logger = logging.getLogger(__name__) +try: + from agentex.lib.utils.logging import make_logger + + logger = make_logger(__name__) +except Exception: # ddtrace may be absent in some envs; fall back to stdlib + import logging + + logger = logging.getLogger(__name__) class SpanTracer: @@ -19,6 +25,14 @@ class SpanTracer: The real TracingModule.end_span does NOT accept an output kwarg — output is recorded by mutating span.output before calling end_span, matching the pattern used throughout the codebase (see _langgraph_tracing.py on_tool_end etc.). + + Span-lifecycle contract: the `_open` dict (span key -> span object) is scoped + to a single turn. 
Pairing is by `key`: + - A duplicate OpenSpan for a key already in `_open` silently replaces the + earlier span; the earlier span is then orphaned (never closed / leaked). + - A CloseSpan for an unknown key is a no-op. + - Unpaired opens accumulate in `_open` for the lifetime of the tracer; since + a tracer is expected to live for one turn, this is bounded and acceptable. """ def __init__( diff --git a/tests/lib/core/harness/test_tracer.py b/tests/lib/core/harness/test_tracer.py index 105995bc8..f5fdb16b6 100644 --- a/tests/lib/core/harness/test_tracer.py +++ b/tests/lib/core/harness/test_tracer.py @@ -52,3 +52,16 @@ async def start_span(self, **kw): # Must not raise. await tracer.handle(OpenSpan(key="k", kind="tool", name="X")) await tracer.handle(CloseSpan(key="k")) + assert tracer._open == {} + + +@pytest.mark.asyncio +async def test_duplicate_open_replaces_silently(): + fake = _FakeTracing() + tracer = SpanTracer(trace_id="t1", parent_span_id="p1", tracing=fake) + await tracer.handle(OpenSpan(key="k", kind="tool", name="A")) + await tracer.handle(OpenSpan(key="k", kind="tool", name="B")) + await tracer.handle(CloseSpan(key="k")) + # Both opens started spans, but only the second ("B") is closed. 
+ assert [name for name, _, _ in fake.started] == ["A", "B"] + assert fake.ended == [("B", None)] From 803191ba9ce4d2db5c25723af1c4a8934cb0220e Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:23:02 -0400 Subject: [PATCH 11/26] feat(harness): yield_events delivery adapter (passthrough + tracing) Co-Authored-By: Claude Sonnet 4.6 --- .../lib/core/harness/yield_delivery.py | 31 ++++++++++ tests/lib/core/harness/test_yield_delivery.py | 58 +++++++++++++++++++ 2 files changed, 89 insertions(+) create mode 100644 src/agentex/lib/core/harness/yield_delivery.py create mode 100644 tests/lib/core/harness/test_yield_delivery.py diff --git a/src/agentex/lib/core/harness/yield_delivery.py b/src/agentex/lib/core/harness/yield_delivery.py new file mode 100644 index 000000000..0d04647da --- /dev/null +++ b/src/agentex/lib/core/harness/yield_delivery.py @@ -0,0 +1,31 @@ +"""Yield delivery: pass the canonical stream through, tracing as a side effect.""" + +from __future__ import annotations + +from typing import AsyncIterator + +from agentex.lib.core.harness.span_derivation import SpanDeriver +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.types import StreamTaskMessage + + +async def yield_events( + events: AsyncIterator[StreamTaskMessage], + tracer: SpanTracer | None = None, +) -> AsyncIterator[StreamTaskMessage]: + """Forward each event to the caller; derive + trace spans as a side effect. + + For sync HTTP ACP agents that yield events back over the response. When + `tracer` is None, this is a pure passthrough. 
+ """ + deriver = SpanDeriver() if tracer is not None else None + try: + async for event in events: + if deriver is not None and tracer is not None: + for signal in deriver.observe(event): + await tracer.handle(signal) + yield event + finally: + if deriver is not None and tracer is not None: + for signal in deriver.flush(): + await tracer.handle(signal) diff --git a/tests/lib/core/harness/test_yield_delivery.py b/tests/lib/core/harness/test_yield_delivery.py new file mode 100644 index 000000000..46f0aeac1 --- /dev/null +++ b/tests/lib/core/harness/test_yield_delivery.py @@ -0,0 +1,58 @@ +import types as _types + +import pytest + +from agentex.lib.core.harness.yield_delivery import yield_events +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.types.task_message_update import ( + StreamTaskMessageStart, + StreamTaskMessageDone, + StreamTaskMessageFull, +) +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent + + +class _RecordTracing: + def __init__(self): + self.started, self.ended = [], [] + + async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None): + self.started.append(name) + return _types.SimpleNamespace() # supports arbitrary attribute assignment (span.output = ...) 
+ + async def end_span(self, *, trace_id, span): + self.ended.append(getattr(span, "output", None)) + + +async def _gen(events): + for e in events: + yield e + + +@pytest.mark.asyncio +async def test_yield_passes_events_through_and_traces(): + fake = _RecordTracing() + tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake) + events = [ + StreamTaskMessageStart(type="start", index=0, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull(type="full", index=1, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="c", name="Bash", content="ok")), + ] + out = [e async for e in yield_events(_gen(events), tracer=tracer)] + assert out == events # passthrough unchanged + assert fake.started == ["Bash"] # span derived + opened + assert fake.ended == ["ok"] # span closed with response + + +@pytest.mark.asyncio +async def test_yield_without_tracer_is_pure_passthrough(): + events = [ + StreamTaskMessageDone(type="done", index=0), + ] + out = [e async for e in yield_events(_gen(events), tracer=None)] + assert out == events From dab044f862e7a734c7c248f747ebe4470cbd9a82 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:27:54 -0400 Subject: [PATCH 12/26] refactor(harness): simplify yield_events guard + cover finally-flush on early close Co-Authored-By: Claude Sonnet 4.6 --- .../lib/core/harness/yield_delivery.py | 8 ++++---- tests/lib/core/harness/test_yield_delivery.py | 19 +++++++++++++++++++ 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/src/agentex/lib/core/harness/yield_delivery.py b/src/agentex/lib/core/harness/yield_delivery.py index 0d04647da..ca923c6a3 100644 --- a/src/agentex/lib/core/harness/yield_delivery.py +++ b/src/agentex/lib/core/harness/yield_delivery.py @@ -2,7 +2,7 @@ from __future__ import annotations -from typing import AsyncIterator +from typing import 
AsyncGenerator, AsyncIterator from agentex.lib.core.harness.span_derivation import SpanDeriver from agentex.lib.core.harness.tracer import SpanTracer @@ -12,7 +12,7 @@ async def yield_events( events: AsyncIterator[StreamTaskMessage], tracer: SpanTracer | None = None, -) -> AsyncIterator[StreamTaskMessage]: +) -> AsyncGenerator[StreamTaskMessage, None]: """Forward each event to the caller; derive + trace spans as a side effect. For sync HTTP ACP agents that yield events back over the response. When @@ -21,11 +21,11 @@ async def yield_events( deriver = SpanDeriver() if tracer is not None else None try: async for event in events: - if deriver is not None and tracer is not None: + if deriver is not None: # tracer is non-None whenever deriver is set for signal in deriver.observe(event): await tracer.handle(signal) yield event finally: - if deriver is not None and tracer is not None: + if deriver is not None: # tracer is non-None whenever deriver is set for signal in deriver.flush(): await tracer.handle(signal) diff --git a/tests/lib/core/harness/test_yield_delivery.py b/tests/lib/core/harness/test_yield_delivery.py index 46f0aeac1..986b4a92d 100644 --- a/tests/lib/core/harness/test_yield_delivery.py +++ b/tests/lib/core/harness/test_yield_delivery.py @@ -56,3 +56,22 @@ async def test_yield_without_tracer_is_pure_passthrough(): ] out = [e async for e in yield_events(_gen(events), tracer=None)] assert out == events + + +@pytest.mark.asyncio +async def test_flush_runs_on_early_close(): + fake = _RecordTracing() + tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake) + events = [ + StreamTaskMessageStart(type="start", index=0, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageDone(type="done", index=0), + # response intentionally never arrives + ] + gen = yield_events(_gen(events), tracer=tracer) + first = await gen.__anext__() # Start + second = await gen.__anext__() # Done -> 
tool span opens here + await gen.aclose() # triggers the finally -> flush() + assert fake.started == ["Bash"] + assert fake.ended == [None] # flush closed the unpaired span (incomplete, no output) From 3cc0326bef21caaea138234302fb411d494f59a6 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:32:36 -0400 Subject: [PATCH 13/26] feat(harness): auto_send delivery adapter (canonical stream -> adk.streaming + tracing) Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/auto_send.py | 118 ++++++++++ tests/lib/core/harness/test_auto_send.py | 248 ++++++++++++++++++++++ 2 files changed, 366 insertions(+) create mode 100644 src/agentex/lib/core/harness/auto_send.py create mode 100644 tests/lib/core/harness/test_auto_send.py diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py new file mode 100644 index 000000000..506a9ad82 --- /dev/null +++ b/src/agentex/lib/core/harness/auto_send.py @@ -0,0 +1,118 @@ +"""Auto-send delivery: canonical stream -> adk.streaming side effects + tracing.""" + +from __future__ import annotations + +from typing import Any, AsyncIterator + +from agentex.types.task_message_update import ( + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, +) + +from agentex.lib.core.harness.span_derivation import SpanDeriver +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.types import StreamTaskMessage, TurnResult, TurnUsage + + +async def auto_send( + events: AsyncIterator[StreamTaskMessage], + task_id: str, + tracer: SpanTracer | None = None, + streaming: Any = None, + usage: TurnUsage | None = None, +) -> TurnResult: + """Push the canonical stream to the task stream via adk.streaming. + + Opens a streaming context per text/reasoning message, streams deltas via + ctx.stream_update, and closes via ctx.close() on Done. 
Posts tool + request/response full messages by opening a context with the content and + closing it immediately (no deltas). Derives and traces spans from the same + stream. Returns the accumulated final text + usage. + + Mirrors the open/close/stream_update pattern from + src/agentex/lib/adk/_modules/_langgraph_async.py: + - context opened via streaming_task_message_context(...).__aenter__() + - context closed via ctx.close() (not __aexit__) + - deltas pushed as StreamTaskMessageDelta with parent_task_message set + from ctx.task_message + + For async + temporal agents (call from inside an activity). + """ + if streaming is None: + from agentex.lib import adk + + streaming = adk.streaming + + deriver = SpanDeriver() if tracer is not None else None + final_text_parts: list[str] = [] + current_ctx: Any = None + current_kind: str | None = None # "text" | "reasoning" + + async def _close_current() -> None: + nonlocal current_ctx, current_kind + if current_ctx is not None: + await current_ctx.close() + current_ctx = None + current_kind = None + + try: + async for event in events: + if deriver is not None: + for signal in deriver.observe(event): + await tracer.handle(signal) # type: ignore[union-attr] + + if isinstance(event, StreamTaskMessageStart): + ctype = getattr(event.content, "type", None) + if ctype in ("text", "reasoning"): + await _close_current() + ctx = streaming.streaming_task_message_context( + task_id=task_id, + initial_content=event.content, + ) + current_ctx = await ctx.__aenter__() + current_kind = ctype + + elif isinstance(event, StreamTaskMessageDelta): + if current_ctx is not None and event.delta is not None: + # Reconstruct the delta with parent_task_message set from + # the context's task_message (mirrors _langgraph_async.py + # lines 72-78 and 117-127). 
+ delta_with_parent = StreamTaskMessageDelta( + parent_task_message=current_ctx.task_message, + delta=event.delta, + type="delta", + index=event.index, + ) + await current_ctx.stream_update(delta_with_parent) + if ( + getattr(event.delta, "type", None) == "text" + and event.delta.text_delta + ): + final_text_parts.append(event.delta.text_delta) + + elif isinstance(event, StreamTaskMessageDone): + await _close_current() + + elif isinstance(event, StreamTaskMessageFull): + # Full messages (tool_request / tool_response): close any open + # streaming context first, then post the full message by opening + # a context with the content and closing it immediately + # (no deltas; StreamingTaskMessageContext.close() persists + # initial_content when the accumulator is empty). + await _close_current() + ctx = streaming.streaming_task_message_context( + task_id=task_id, + initial_content=event.content, + ) + full_ctx = await ctx.__aenter__() + await full_ctx.close() + + finally: + await _close_current() + if deriver is not None: + for signal in deriver.flush(): + await tracer.handle(signal) # type: ignore[union-attr] + + return TurnResult(final_text="".join(final_text_parts), usage=usage or TurnUsage()) diff --git a/tests/lib/core/harness/test_auto_send.py b/tests/lib/core/harness/test_auto_send.py new file mode 100644 index 000000000..2a83658e1 --- /dev/null +++ b/tests/lib/core/harness/test_auto_send.py @@ -0,0 +1,248 @@ +"""Tests for auto_send delivery adapter. + +The fake mirrors the real StreamingTaskMessageContext API exactly: +- streaming_task_message_context(...) returns a context object (synchronously) +- open the context via __aenter__ (returns self after creating the task message) +- stream deltas via ctx.stream_update(StreamTaskMessageDelta(...)) +- close via ctx.close() (NOT __aexit__) + +This mirrors _langgraph_async.py lines 62-78 and 100-127. 
+""" + +import types as _types + +import pytest + +from agentex.lib.core.harness.auto_send import auto_send +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.types.task_message import TaskMessage +from agentex.types.task_message_update import ( + StreamTaskMessageStart, + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageFull, +) +from agentex.types.text_content import TextContent +from agentex.types.task_message_delta import TextDelta +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent + + +class _FakeCtx: + """Mirrors StreamingTaskMessageContext: __aenter__ opens (returns self with task_message set), + close() closes. stream_update records the call. + + task_message is a real TaskMessage instance so that auto_send can use it + as parent_task_message in StreamTaskMessageDelta without Pydantic validation errors. + """ + + def __init__(self, sink, content_type, initial_content): + self.sink = sink + self.content_type = content_type + # Real TaskMessage so StreamTaskMessageDelta(parent_task_message=...) 
passes validation + self.task_message = TaskMessage( + id="msg-1", task_id="task1", content=initial_content + ) + + async def __aenter__(self): + self.sink.append(("open", self.content_type)) + return self + + async def __aexit__(self, *a): + # __aexit__ delegates to close in the real impl; keep for safety + await self.close() + return False + + async def close(self): + self.sink.append(("close", self.content_type)) + + async def stream_update(self, update): + self.sink.append(("update", update)) + return update + + +class _FakeStreaming: + """Mirrors StreamingService: streaming_task_message_context returns a context object.""" + + def __init__(self): + self.sink = [] + + def streaming_task_message_context( + self, task_id, initial_content, streaming_mode="coalesced", created_at=None + ): + ctype = getattr(initial_content, "type", None) + self.sink.append(("ctx", ctype)) + return _FakeCtx(self.sink, ctype, initial_content) + + +async def _gen(events): + for e in events: + yield e + + +# --------------------------------------------------------------------------- +# Test 1: text streaming — open, stream deltas, close; return accumulated text +# --------------------------------------------------------------------------- + +@pytest.mark.asyncio +async def test_auto_send_streams_text_and_returns_final_text(): + streaming = _FakeStreaming() + events = [ + StreamTaskMessageStart( + type="start", index=0, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDelta( + type="delta", index=0, + delta=TextDelta(type="text", text_delta="Hel"), + ), + StreamTaskMessageDelta( + type="delta", index=0, + delta=TextDelta(type="text", text_delta="lo"), + ), + StreamTaskMessageDone(type="done", index=0), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + + assert result.final_text == "Hello" + + kinds = [s[0] for s in streaming.sink] + # A context was created for the text content + assert kinds[0] == 
"ctx" + # It was opened and closed + assert "open" in kinds + assert "close" in kinds + # Exactly two updates were streamed (one per delta) + updates = [s for s in streaming.sink if s[0] == "update"] + assert len(updates) == 2 + + +# --------------------------------------------------------------------------- +# Test 2: tool_request Full + tool_response Full — each posts one full message +# (open context with the content, no deltas, close immediately) +# --------------------------------------------------------------------------- + +@pytest.mark.asyncio +async def test_auto_send_posts_full_tool_messages(): + streaming = _FakeStreaming() + events = [ + StreamTaskMessageFull( + type="full", index=0, + content=ToolRequestContent( + type="tool_request", author="agent", + tool_call_id="c1", name="Bash", arguments={"cmd": "ls"}, + ), + ), + StreamTaskMessageFull( + type="full", index=1, + content=ToolResponseContent( + type="tool_response", author="agent", + tool_call_id="c1", name="Bash", content="file.py", + ), + ), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + + assert result.final_text == "" + + # One context per Full event + ctx_events = [s for s in streaming.sink if s[0] == "ctx"] + assert len(ctx_events) == 2 + content_types = [s[1] for s in ctx_events] + assert "tool_request" in content_types + assert "tool_response" in content_types + + # Each context is opened and closed + opens = [s for s in streaming.sink if s[0] == "open"] + closes = [s for s in streaming.sink if s[0] == "close"] + assert len(opens) == 2 + assert len(closes) == 2 + + # No stream_update calls (full messages have no deltas) + updates = [s for s in streaming.sink if s[0] == "update"] + assert len(updates) == 0 + + +# --------------------------------------------------------------------------- +# Test 3: tracing — spans are derived and handed to the tracer +# --------------------------------------------------------------------------- + +class 
_RecordTracing: + def __init__(self): + self.started, self.ended = [], [] + + async def start_span(self, *, trace_id, name, input=None, parent_id=None, data=None, task_id=None): + self.started.append(name) + return _types.SimpleNamespace() + + async def end_span(self, *, trace_id, span): + self.ended.append(getattr(span, "output", None)) + + +@pytest.mark.asyncio +async def test_auto_send_derives_tool_spans_via_tracer(): + fake_tracing = _RecordTracing() + tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake_tracing) + streaming = _FakeStreaming() + + events = [ + StreamTaskMessageStart( + type="start", index=0, + content=ToolRequestContent( + type="tool_request", author="agent", + tool_call_id="c1", name="Bash", arguments={}, + ), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull( + type="full", index=1, + content=ToolResponseContent( + type="tool_response", author="agent", + tool_call_id="c1", name="Bash", content="ok", + ), + ), + ] + + result = await auto_send( + _gen(events), task_id="task1", tracer=tracer, streaming=streaming + ) + + assert result.final_text == "" + assert fake_tracing.started == ["Bash"] + assert fake_tracing.ended == ["ok"] + + +# --------------------------------------------------------------------------- +# Test 4: text followed by a tool Full — text context is closed before Full +# --------------------------------------------------------------------------- + +@pytest.mark.asyncio +async def test_auto_send_closes_text_context_before_full_message(): + streaming = _FakeStreaming() + events = [ + StreamTaskMessageStart( + type="start", index=0, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDelta( + type="delta", index=0, + delta=TextDelta(type="text", text_delta="Hi"), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull( + type="full", index=1, + content=ToolRequestContent( + type="tool_request", author="agent", + tool_call_id="c2", 
name="read_file", arguments={}, + ), + ), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + assert result.final_text == "Hi" + + # Verify ordering: text ctx opens, updates, closes; then tool_request ctx opens, closes + event_sequence = [(s[0], s[1]) for s in streaming.sink] + text_open_idx = next(i for i, s in enumerate(event_sequence) if s == ("open", "text")) + text_close_idx = next(i for i, s in enumerate(event_sequence) if s == ("close", "text")) + tool_open_idx = next(i for i, s in enumerate(event_sequence) if s == ("open", "tool_request")) + assert text_open_idx < text_close_idx < tool_open_idx From 260064e2ec1c0a6d8fd290c90ab2d7359316c697 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:39:25 -0400 Subject: [PATCH 14/26] refactor(harness): exception-safe full-message post + drop dead state + cover error/finally paths in auto_send Co-Authored-By: Claude Opus 4.8 (1M context) --- src/agentex/lib/core/harness/auto_send.py | 16 ++++----- tests/lib/core/harness/test_auto_send.py | 44 ++++++++++++++++++++--- 2 files changed, 46 insertions(+), 14 deletions(-) diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py index 506a9ad82..e7de01a68 100644 --- a/src/agentex/lib/core/harness/auto_send.py +++ b/src/agentex/lib/core/harness/auto_send.py @@ -48,14 +48,12 @@ async def auto_send( deriver = SpanDeriver() if tracer is not None else None final_text_parts: list[str] = [] current_ctx: Any = None - current_kind: str | None = None # "text" | "reasoning" async def _close_current() -> None: - nonlocal current_ctx, current_kind + nonlocal current_ctx if current_ctx is not None: await current_ctx.close() current_ctx = None - current_kind = None try: async for event in events: @@ -72,7 +70,6 @@ async def _close_current() -> None: initial_content=event.content, ) current_ctx = await ctx.__aenter__() - current_kind = ctype elif isinstance(event, 
StreamTaskMessageDelta): if current_ctx is not None and event.delta is not None: @@ -100,14 +97,15 @@ async def _close_current() -> None: # streaming context first, then post the full message by opening # a context with the content and closing it immediately # (no deltas; StreamingTaskMessageContext.close() persists - # initial_content when the accumulator is empty). + # initial_content when the accumulator is empty). Use async with + # so the context is closed even if close() raises (__aexit__ + # delegates to close()). await _close_current() - ctx = streaming.streaming_task_message_context( + async with streaming.streaming_task_message_context( task_id=task_id, initial_content=event.content, - ) - full_ctx = await ctx.__aenter__() - await full_ctx.close() + ): + pass finally: await _close_current() diff --git a/tests/lib/core/harness/test_auto_send.py b/tests/lib/core/harness/test_auto_send.py index 2a83658e1..9568d7b87 100644 --- a/tests/lib/core/harness/test_auto_send.py +++ b/tests/lib/core/harness/test_auto_send.py @@ -126,15 +126,24 @@ async def test_auto_send_streams_text_and_returns_final_text(): async def test_auto_send_posts_full_tool_messages(): streaming = _FakeStreaming() events = [ + # A bare tool_request Start (no Done/Full) must NOT open a streaming + # context on its own — only Full events post messages. 
+ StreamTaskMessageStart( + type="start", index=0, + content=ToolRequestContent( + type="tool_request", author="agent", + tool_call_id="c0", name="Bash", arguments={}, + ), + ), StreamTaskMessageFull( - type="full", index=0, + type="full", index=1, content=ToolRequestContent( type="tool_request", author="agent", tool_call_id="c1", name="Bash", arguments={"cmd": "ls"}, ), ), StreamTaskMessageFull( - type="full", index=1, + type="full", index=2, content=ToolResponseContent( type="tool_response", author="agent", tool_call_id="c1", name="Bash", content="file.py", @@ -145,12 +154,12 @@ async def test_auto_send_posts_full_tool_messages(): assert result.final_text == "" - # One context per Full event + # The opened contexts correspond ONLY to the two Full events — the + # tool_request Start did not open a context. ctx_events = [s for s in streaming.sink if s[0] == "ctx"] assert len(ctx_events) == 2 content_types = [s[1] for s in ctx_events] - assert "tool_request" in content_types - assert "tool_response" in content_types + assert content_types == ["tool_request", "tool_response"] # Each context is opened and closed opens = [s for s in streaming.sink if s[0] == "open"] @@ -246,3 +255,28 @@ async def test_auto_send_closes_text_context_before_full_message(): text_close_idx = next(i for i, s in enumerate(event_sequence) if s == ("close", "text")) tool_open_idx = next(i for i, s in enumerate(event_sequence) if s == ("open", "tool_request")) assert text_open_idx < text_close_idx < tool_open_idx + + +# --------------------------------------------------------------------------- +# Test 5: midstream error — propagates AND the open context is closed (finally) +# --------------------------------------------------------------------------- + +@pytest.mark.asyncio +async def test_open_context_closed_on_midstream_error(): + streaming = _FakeStreaming() + + async def _exploding_gen(): + yield StreamTaskMessageStart( + type="start", index=0, + content=TextContent(type="text", 
author="agent", content=""), + ) + raise RuntimeError("boom") + + with pytest.raises(RuntimeError, match="boom"): + await auto_send( + _exploding_gen(), task_id="task1", tracer=None, streaming=streaming + ) + + # The text context that was opened mid-stream was closed by the finally block. + assert ("open", "text") in [(s[0], s[1]) for s in streaming.sink] + assert ("close", "text") in [(s[0], s[1]) for s in streaming.sink] From b27367b9f8fa1cd24de528eb6dffd29d633e2203 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:42:52 -0400 Subject: [PATCH 15/26] feat(harness): UnifiedEmitter facade tying delivery + tracing + usage Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/__init__.py | 24 ++++++++++ src/agentex/lib/core/harness/emitter.py | 59 ++++++++++++++++++++++++ tests/lib/core/harness/test_emitter.py | 56 ++++++++++++++++++++++ 3 files changed, 139 insertions(+) create mode 100644 src/agentex/lib/core/harness/emitter.py create mode 100644 tests/lib/core/harness/test_emitter.py diff --git a/src/agentex/lib/core/harness/__init__.py b/src/agentex/lib/core/harness/__init__.py index 15d116148..2988db8ff 100644 --- a/src/agentex/lib/core/harness/__init__.py +++ b/src/agentex/lib/core/harness/__init__.py @@ -4,3 +4,27 @@ package derives spans from it and delivers it (yield or auto-send), so every harness tap gets streaming + tracing + turn usage uniformly. 
""" + +from agentex.lib.core.harness.emitter import UnifiedEmitter +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.types import ( + CloseSpan, + HarnessTurn, + OpenSpan, + SpanSignal, + StreamTaskMessage, + TurnResult, + TurnUsage, +) + +__all__ = [ + "UnifiedEmitter", + "SpanTracer", + "OpenSpan", + "CloseSpan", + "SpanSignal", + "StreamTaskMessage", + "TurnUsage", + "TurnResult", + "HarnessTurn", +] diff --git a/src/agentex/lib/core/harness/emitter.py b/src/agentex/lib/core/harness/emitter.py new file mode 100644 index 000000000..5944abc17 --- /dev/null +++ b/src/agentex/lib/core/harness/emitter.py @@ -0,0 +1,59 @@ +"""UnifiedEmitter: the single facade agent authors use for either delivery mode.""" + +from __future__ import annotations + +from typing import AsyncIterator + +from agentex.lib.core.harness.auto_send import auto_send +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.types import HarnessTurn, StreamTaskMessage, TurnResult +from agentex.lib.core.harness.yield_delivery import yield_events + + +class UnifiedEmitter: + """Ties trace context + chosen delivery together. + + Tracing is default-on whenever `trace_id` is truthy; pass `tracer=False` to + disable, or a custom `SpanTracer` to override. 
+ """ + + tracer: SpanTracer | None + + def __init__( + self, + task_id: str, + trace_id: str | None, + parent_span_id: str | None, + tracer: SpanTracer | bool | None = None, + tracing: object | None = None, + ): + self.task_id = task_id + self.trace_id = trace_id + self.parent_span_id = parent_span_id + if tracer is False: + self.tracer = None + elif isinstance(tracer, SpanTracer): + self.tracer = tracer + elif trace_id: + self.tracer = SpanTracer( + trace_id=trace_id, + parent_span_id=parent_span_id, + task_id=task_id, + tracing=tracing, + ) + else: + self.tracer = None + + async def yield_turn(self, turn: HarnessTurn) -> AsyncIterator[StreamTaskMessage]: + """Sync HTTP ACP delivery: forward events, trace as side effect.""" + async for event in yield_events(turn.events, tracer=self.tracer): + yield event + + async def auto_send_turn(self, turn: HarnessTurn) -> TurnResult: + """Async/temporal delivery: push to the task stream, return TurnResult.""" + return await auto_send( + turn.events, + task_id=self.task_id, + tracer=self.tracer, + usage=turn.usage(), + ) diff --git a/tests/lib/core/harness/test_emitter.py b/tests/lib/core/harness/test_emitter.py new file mode 100644 index 000000000..318311e27 --- /dev/null +++ b/tests/lib/core/harness/test_emitter.py @@ -0,0 +1,56 @@ +import pytest + +from agentex.lib.core.harness.emitter import UnifiedEmitter +from agentex.lib.core.harness.types import TurnUsage +from agentex.types.task_message_update import StreamTaskMessageStart, StreamTaskMessageDone +from agentex.types.text_content import TextContent + + +class _FakeTracing: + async def start_span(self, **kw): + return None + + async def end_span(self, **kw): + pass + + +class _Turn: + def __init__(self, events_list, usage): + self._events_list = events_list + self._usage = usage + + @property + async def events(self): + for e in self._events_list: + yield e + + def usage(self): + return self._usage + + +@pytest.mark.asyncio +async def 
test_emitter_yield_mode_passes_through(): + events = [ + StreamTaskMessageStart(type="start", index=0, + content=TextContent(type="text", author="agent", content="hi")), + StreamTaskMessageDone(type="done", index=0), + ] + turn = _Turn(events, TurnUsage(model="m")) + emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None) + out = [e async for e in emitter.yield_turn(turn)] + assert out == events + + +@pytest.mark.asyncio +async def test_emitter_tracing_default_on_when_trace_id_present(): + # Inject a fake tracing backend so the test env doesn't need temporalio. + # This exercises the default-on path (tracer=None) when trace_id is truthy. + emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", + tracing=_FakeTracing()) + assert emitter.tracer is not None + + +@pytest.mark.asyncio +async def test_emitter_tracing_overridable_off(): + emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", tracer=False) + assert emitter.tracer is None From ed86a460f94cf7a5b2705b1b13fbdf38ed929c22 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:47:44 -0400 Subject: [PATCH 16/26] refactor(harness): inject streaming into UnifiedEmitter + cover auto_send_turn + doc tracer modes Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/emitter.py | 16 +++++-- tests/lib/core/harness/test_emitter.py | 62 ++++++++++++++++++++++++- 2 files changed, 73 insertions(+), 5 deletions(-) diff --git a/src/agentex/lib/core/harness/emitter.py b/src/agentex/lib/core/harness/emitter.py index 5944abc17..9573fb8b2 100644 --- a/src/agentex/lib/core/harness/emitter.py +++ b/src/agentex/lib/core/harness/emitter.py @@ -2,7 +2,7 @@ from __future__ import annotations -from typing import AsyncIterator +from typing import AsyncGenerator from agentex.lib.core.harness.auto_send import auto_send from agentex.lib.core.harness.tracer import SpanTracer @@ -13,8 +13,13 @@ class UnifiedEmitter: """Ties trace context + chosen 
delivery together. - Tracing is default-on whenever `trace_id` is truthy; pass `tracer=False` to - disable, or a custom `SpanTracer` to override. + Tracing modes (the `tracer` arg): + - tracer=None (default): auto-construct a SpanTracer if `trace_id` is present. + - tracer=False: disable tracing entirely, regardless of `trace_id`. + - tracer=SpanTracer(...): use the supplied instance. + + `tracing` and `streaming` are injection escape-hatches for tests/advanced + use; leave them None in production so the real adk modules are used. """ tracer: SpanTracer | None @@ -26,10 +31,12 @@ def __init__( parent_span_id: str | None, tracer: SpanTracer | bool | None = None, tracing: object | None = None, + streaming: object | None = None, ): self.task_id = task_id self.trace_id = trace_id self.parent_span_id = parent_span_id + self._streaming = streaming if tracer is False: self.tracer = None elif isinstance(tracer, SpanTracer): @@ -44,7 +51,7 @@ def __init__( else: self.tracer = None - async def yield_turn(self, turn: HarnessTurn) -> AsyncIterator[StreamTaskMessage]: + async def yield_turn(self, turn: HarnessTurn) -> AsyncGenerator[StreamTaskMessage, None]: """Sync HTTP ACP delivery: forward events, trace as side effect.""" async for event in yield_events(turn.events, tracer=self.tracer): yield event @@ -55,5 +62,6 @@ async def auto_send_turn(self, turn: HarnessTurn) -> TurnResult: turn.events, task_id=self.task_id, tracer=self.tracer, + streaming=self._streaming, usage=turn.usage(), ) diff --git a/tests/lib/core/harness/test_emitter.py b/tests/lib/core/harness/test_emitter.py index 318311e27..963a77dfe 100644 --- a/tests/lib/core/harness/test_emitter.py +++ b/tests/lib/core/harness/test_emitter.py @@ -2,7 +2,13 @@ from agentex.lib.core.harness.emitter import UnifiedEmitter from agentex.lib.core.harness.types import TurnUsage -from agentex.types.task_message_update import StreamTaskMessageStart, StreamTaskMessageDone +from agentex.types.task_message import TaskMessage +from
agentex.types.task_message_delta import TextDelta +from agentex.types.task_message_update import ( + StreamTaskMessageDelta, + StreamTaskMessageDone, + StreamTaskMessageStart, +) from agentex.types.text_content import TextContent @@ -14,6 +20,42 @@ async def end_span(self, **kw): pass +class _FakeCtx: + """Minimal StreamingTaskMessageContext fake (see test_auto_send.py).""" + + def __init__(self, sink, content_type, initial_content): + self.sink = sink + self.content_type = content_type + self.task_message = TaskMessage(id="msg-1", task_id="task1", content=initial_content) + + async def __aenter__(self): + self.sink.append(("open", self.content_type)) + return self + + async def __aexit__(self, *a): + await self.close() + return False + + async def close(self): + self.sink.append(("close", self.content_type)) + + async def stream_update(self, update): + self.sink.append(("update", update)) + return update + + +class _FakeStreaming: + def __init__(self): + self.sink = [] + + def streaming_task_message_context( + self, task_id, initial_content, streaming_mode="coalesced", created_at=None + ): + ctype = getattr(initial_content, "type", None) + self.sink.append(("ctx", ctype)) + return _FakeCtx(self.sink, ctype, initial_content) + + class _Turn: def __init__(self, events_list, usage): self._events_list = events_list @@ -54,3 +96,21 @@ async def test_emitter_tracing_default_on_when_trace_id_present(): async def test_emitter_tracing_overridable_off(): emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", tracer=False) assert emitter.tracer is None + + +@pytest.mark.asyncio +async def test_emitter_auto_send_turn_returns_usage(): + usage = TurnUsage(model="m", input_tokens=5) + events = [ + StreamTaskMessageStart(type="start", index=0, + content=TextContent(type="text", author="agent", content="")), + StreamTaskMessageDelta(type="delta", index=0, + delta=TextDelta(type="text", text_delta="Hello")), + StreamTaskMessageDone(type="done", index=0), + ] + 
turn = _Turn(events, usage) + fake = _FakeStreaming() + emitter = UnifiedEmitter(task_id="t", trace_id=None, parent_span_id=None, streaming=fake) + result = await emitter.auto_send_turn(turn) + assert result.usage == usage + assert result.final_text == "Hello" From b5f6b94b6c57478e7de278e0d3ec1f34a750e72e Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:50:18 -0400 Subject: [PATCH 17/26] test(harness): conformance scaffold + CI integration job skeleton Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/harness-integration.yml | 33 ++++++++++++++++ .../lib/core/harness/conformance/__init__.py | 0 tests/lib/core/harness/conformance/runner.py | 39 +++++++++++++++++++ .../harness/conformance/test_conformance.py | 28 +++++++++++++ 4 files changed, 100 insertions(+) create mode 100644 .github/workflows/harness-integration.yml create mode 100644 tests/lib/core/harness/conformance/__init__.py create mode 100644 tests/lib/core/harness/conformance/runner.py create mode 100644 tests/lib/core/harness/conformance/test_conformance.py diff --git a/.github/workflows/harness-integration.yml b/.github/workflows/harness-integration.yml new file mode 100644 index 000000000..33ca06728 --- /dev/null +++ b/.github/workflows/harness-integration.yml @@ -0,0 +1,33 @@ +name: Harness Integration + +on: + pull_request: + paths: + - "src/agentex/lib/core/harness/**" + - "src/agentex/lib/adk/_modules/**" + - ".github/workflows/harness-integration.yml" + +jobs: + conformance: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + + - name: Install uv + uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5.4.2 + with: + version: '0.10.2' + + - name: Bootstrap + run: ./scripts/bootstrap + + - name: Conformance suite + run: uv run pytest tests/lib/core/harness/ -v + + # Live integration matrix (harness x {sync, async, temporal}) is added per-harness + # in the migration plans. 
Placeholder job keeps the workflow valid until then. + live-matrix: + runs-on: ubuntu-latest + if: false # enabled once the first harness's test agents land + steps: + - run: echo "populated by migration PRs" diff --git a/tests/lib/core/harness/conformance/__init__.py b/tests/lib/core/harness/conformance/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/tests/lib/core/harness/conformance/runner.py b/tests/lib/core/harness/conformance/runner.py new file mode 100644 index 000000000..210514b41 --- /dev/null +++ b/tests/lib/core/harness/conformance/runner.py @@ -0,0 +1,39 @@ +"""Shared conformance engine: every harness tap registers fixtures here. + +A fixture is (name, list[StreamTaskMessage]). The runner asserts that span +derivation over the events is identical regardless of delivery channel, which is +the cross-channel guarantee from the spec. +""" + +from __future__ import annotations + +from dataclasses import dataclass + +from agentex.lib.core.harness.span_derivation import SpanDeriver +from agentex.lib.core.harness.types import SpanSignal, StreamTaskMessage + + +@dataclass +class Fixture: + name: str + events: list[StreamTaskMessage] + + +_REGISTRY: list[Fixture] = [] + + +def register(fixture: Fixture) -> None: + _REGISTRY.append(fixture) + + +def all_fixtures() -> list[Fixture]: + return list(_REGISTRY) + + +def derive_all(events: list[StreamTaskMessage]) -> list[SpanSignal]: + d = SpanDeriver() + out: list[SpanSignal] = [] + for e in events: + out.extend(d.observe(e)) + out.extend(d.flush()) + return out diff --git a/tests/lib/core/harness/conformance/test_conformance.py b/tests/lib/core/harness/conformance/test_conformance.py new file mode 100644 index 000000000..cc350df3a --- /dev/null +++ b/tests/lib/core/harness/conformance/test_conformance.py @@ -0,0 +1,28 @@ +import pytest + +from tests.lib.core.harness.conformance.runner import Fixture, derive_all, register, all_fixtures +from agentex.types.task_message_update import ( + 
StreamTaskMessageStart, StreamTaskMessageDone, StreamTaskMessageFull, +) +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_response_content import ToolResponseContent + +register(Fixture( + name="builtin-single-tool", + events=[ + StreamTaskMessageStart(type="start", index=0, + content=ToolRequestContent(type="tool_request", author="agent", + tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull(type="full", index=1, + content=ToolResponseContent(type="tool_response", author="agent", + tool_call_id="c", name="Bash", content="ok")), + ], +)) + + +@pytest.mark.parametrize("fixture", all_fixtures(), ids=lambda f: f.name) +def test_span_derivation_is_deterministic(fixture): + # Deriving twice over the same events yields identical signals (the property + # that makes yield vs auto-send equivalent, since both observe the same stream). + assert derive_all(fixture.events) == derive_all(fixture.events) From 520849afe3f96a024cb184bd8e9d4d3663330529 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 12:56:34 -0400 Subject: [PATCH 18/26] test(harness): match scripts/test invocation + document conformance registry semantics Co-Authored-By: Claude Sonnet 4.6 --- .github/workflows/harness-integration.yml | 11 +++++++++-- tests/lib/core/harness/conformance/runner.py | 9 +++++++++ .../lib/core/harness/conformance/test_conformance.py | 2 ++ 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/.github/workflows/harness-integration.yml b/.github/workflows/harness-integration.yml index 33ca06728..ab6b353b9 100644 --- a/.github/workflows/harness-integration.yml +++ b/.github/workflows/harness-integration.yml @@ -1,6 +1,8 @@ name: Harness Integration on: + push: + branches: [main] pull_request: paths: - "src/agentex/lib/core/harness/**" @@ -21,8 +23,13 @@ jobs: - name: Bootstrap run: ./scripts/bootstrap + # Defer to scripts/test so the harness suite runs 
under the exact same + # invocation as the main CI test job: DEFER_PYDANTIC_BUILD=false and + # `uv run --isolated --all-packages --all-extras pytest`, across the + # min/max supported Python versions. Running `uv run pytest` directly + # would risk an all-extras-only dep passing locally but failing in CI. - name: Conformance suite - run: uv run pytest tests/lib/core/harness/ -v + run: ./scripts/test tests/lib/core/harness/ -v # Live integration matrix (harness x {sync, async, temporal}) is added per-harness # in the migration plans. Placeholder job keeps the workflow valid until then. @@ -30,4 +37,4 @@ jobs: runs-on: ubuntu-latest if: false # enabled once the first harness's test agents land steps: - - run: echo "populated by migration PRs" + - run: echo "populated by migration PRs" # TODO(harness-migration): enable per-harness; see docs/superpowers/plans migration PRs 4-8 diff --git a/tests/lib/core/harness/conformance/runner.py b/tests/lib/core/harness/conformance/runner.py index 210514b41..ffd72f89a 100644 --- a/tests/lib/core/harness/conformance/runner.py +++ b/tests/lib/core/harness/conformance/runner.py @@ -3,6 +3,15 @@ A fixture is (name, list[StreamTaskMessage]). The runner asserts that span derivation over the events is identical regardless of delivery channel, which is the cross-channel guarantee from the spec. + +Registry shared-state hazard: `_REGISTRY` is process-global. Every `test_*.py` +module that calls `register()` at import time contributes to it, so a module +that parametrizes over `all_fixtures()` will see fixtures registered by ANY +other conformance module imported earlier in the same pytest process (collection +order is not guaranteed). To stay deterministic, each future harness conformance +module should register and parametrize over its OWN fixtures (e.g. keep a +module-local list it both registers and parametrizes), rather than relying on +cross-module global accumulation via `all_fixtures()`. 
""" from __future__ import annotations diff --git a/tests/lib/core/harness/conformance/test_conformance.py b/tests/lib/core/harness/conformance/test_conformance.py index cc350df3a..6080ca5ef 100644 --- a/tests/lib/core/harness/conformance/test_conformance.py +++ b/tests/lib/core/harness/conformance/test_conformance.py @@ -23,6 +23,8 @@ @pytest.mark.parametrize("fixture", all_fixtures(), ids=lambda f: f.name) def test_span_derivation_is_deterministic(fixture): + """Exercises the cross-channel guarantee: yield and auto-send observe the + same event stream, so span derivation must be deterministic/idempotent.""" # Deriving twice over the same events yields identical signals (the property # that makes yield vs auto-send equivalent, since both observe the same stream). assert derive_all(fixture.events) == derive_all(fixture.events) From a915170b3efb3ecc18b46934ef7b7f7f38150df4 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 13:00:25 -0400 Subject: [PATCH 19/26] refactor(harness): isinstance narrowing for clean type-check across the package Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/auto_send.py | 6 ++---- src/agentex/lib/core/harness/span_derivation.py | 12 +++++++----- src/agentex/lib/core/harness/yield_delivery.py | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py index e7de01a68..850146ab7 100644 --- a/src/agentex/lib/core/harness/auto_send.py +++ b/src/agentex/lib/core/harness/auto_send.py @@ -10,6 +10,7 @@ StreamTaskMessageFull, StreamTaskMessageStart, ) +from agentex.types.text_delta import TextDelta from agentex.lib.core.harness.span_derivation import SpanDeriver from agentex.lib.core.harness.tracer import SpanTracer @@ -83,10 +84,7 @@ async def _close_current() -> None: index=event.index, ) await current_ctx.stream_update(delta_with_parent) - if ( - getattr(event.delta, "type", None) == "text" - and 
event.delta.text_delta - ): + if isinstance(event.delta, TextDelta) and event.delta.text_delta: final_text_parts.append(event.delta.text_delta) elif isinstance(event, StreamTaskMessageDone): diff --git a/src/agentex/lib/core/harness/span_derivation.py b/src/agentex/lib/core/harness/span_derivation.py index 15e1f593f..eac929ee5 100644 --- a/src/agentex/lib/core/harness/span_derivation.py +++ b/src/agentex/lib/core/harness/span_derivation.py @@ -15,6 +15,9 @@ StreamTaskMessageFull, StreamTaskMessageStart, ) +from agentex.types.tool_request_content import ToolRequestContent +from agentex.types.tool_request_delta import ToolRequestDelta +from agentex.types.tool_response_content import ToolResponseContent from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal, StreamTaskMessage @@ -79,15 +82,14 @@ def _on_start(self, event: StreamTaskMessageStart) -> list[SpanSignal]: return [] idx = event.index content = event.content - ctype = getattr(content, "type", None) - if ctype == "tool_request": + if isinstance(content, ToolRequestContent): self._tool_by_index[idx] = _ToolReqMeta( tool_call_id=content.tool_call_id, name=content.name, arguments=dict(content.arguments or {}), ) return [] - if ctype == "reasoning": + if content.type == "reasoning": self._reasoning_index_open.add(idx) return [OpenSpan(key=f"reasoning:{idx}", kind="reasoning", name="reasoning", input={})] return [] @@ -97,7 +99,7 @@ def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]: return [] idx = event.index delta = event.delta - if delta is not None and getattr(delta, "type", None) == "tool_request": + if isinstance(delta, ToolRequestDelta): meta = self._tool_by_index.get(idx) if meta is not None and delta.arguments_delta: meta.args_buf += delta.arguments_delta @@ -105,7 +107,7 @@ def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]: def _on_full(self, event: StreamTaskMessageFull) -> list[SpanSignal]: content = event.content - if getattr(content, 
"type", None) == "tool_response": + if isinstance(content, ToolResponseContent): tcid = content.tool_call_id if tcid in self._open_tool_ids: self._open_tool_ids.pop(tcid, None) diff --git a/src/agentex/lib/core/harness/yield_delivery.py b/src/agentex/lib/core/harness/yield_delivery.py index ca923c6a3..0d90d5d94 100644 --- a/src/agentex/lib/core/harness/yield_delivery.py +++ b/src/agentex/lib/core/harness/yield_delivery.py @@ -21,11 +21,11 @@ async def yield_events( deriver = SpanDeriver() if tracer is not None else None try: async for event in events: - if deriver is not None: # tracer is non-None whenever deriver is set + if deriver is not None and tracer is not None: for signal in deriver.observe(event): await tracer.handle(signal) yield event finally: - if deriver is not None: # tracer is non-None whenever deriver is set + if deriver is not None and tracer is not None: for signal in deriver.flush(): await tracer.handle(signal) From e7b9c5209852200ccac31db170c435a2646f8893 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 13:04:07 -0400 Subject: [PATCH 20/26] refactor(harness): narrow auto_send tracer guards, drop type:ignore for consistency Co-Authored-By: Claude Sonnet 4.6 --- src/agentex/lib/core/harness/auto_send.py | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py index 850146ab7..ee17fdc56 100644 --- a/src/agentex/lib/core/harness/auto_send.py +++ b/src/agentex/lib/core/harness/auto_send.py @@ -58,9 +58,9 @@ async def _close_current() -> None: try: async for event in events: - if deriver is not None: + if deriver is not None and tracer is not None: for signal in deriver.observe(event): - await tracer.handle(signal) # type: ignore[union-attr] + await tracer.handle(signal) if isinstance(event, StreamTaskMessageStart): ctype = getattr(event.content, "type", None) @@ -107,8 +107,8 @@ async def _close_current() -> None: finally: await 
_close_current() - if deriver is not None: + if deriver is not None and tracer is not None: for signal in deriver.flush(): - await tracer.handle(signal) # type: ignore[union-attr] + await tracer.handle(signal) return TurnResult(final_text="".join(final_text_parts), usage=usage or TurnUsage()) From ebc468d014e2a243b245c29d6395d4055beccbde Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 13:29:20 -0400 Subject: [PATCH 21/26] style: ruff import-sort + format fixes across the harness package Co-Authored-By: Claude Opus 4.8 (1M context) --- src/agentex/lib/core/harness/__init__.py | 12 +- src/agentex/lib/core/harness/auto_send.py | 9 +- src/agentex/lib/core/harness/emitter.py | 4 +- .../lib/core/harness/span_derivation.py | 7 +- src/agentex/lib/core/harness/tracer.py | 2 +- src/agentex/lib/core/harness/types.py | 9 +- .../lib/core/harness/yield_delivery.py | 6 +- tests/lib/core/harness/conformance/runner.py | 2 +- .../harness/conformance/test_conformance.py | 40 ++++--- tests/lib/core/harness/test_auto_send.py | 111 +++++++++++------- tests/lib/core/harness/test_emitter.py | 24 ++-- .../lib/core/harness/test_span_derivation.py | 107 ++++++++++------- tests/lib/core/harness/test_tracer.py | 2 +- tests/lib/core/harness/test_types.py | 4 +- tests/lib/core/harness/test_yield_delivery.py | 46 +++++--- 15 files changed, 228 insertions(+), 157 deletions(-) diff --git a/src/agentex/lib/core/harness/__init__.py b/src/agentex/lib/core/harness/__init__.py index 2988db8ff..067751d63 100644 --- a/src/agentex/lib/core/harness/__init__.py +++ b/src/agentex/lib/core/harness/__init__.py @@ -5,17 +5,17 @@ harness tap gets streaming + tracing + turn usage uniformly. 
""" -from agentex.lib.core.harness.emitter import UnifiedEmitter -from agentex.lib.core.harness.tracer import SpanTracer from agentex.lib.core.harness.types import ( - CloseSpan, - HarnessTurn, OpenSpan, + CloseSpan, + TurnUsage, SpanSignal, - StreamTaskMessage, TurnResult, - TurnUsage, + HarnessTurn, + StreamTaskMessage, ) +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.emitter import UnifiedEmitter __all__ = [ "UnifiedEmitter", diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py index ee17fdc56..b246cf38c 100644 --- a/src/agentex/lib/core/harness/auto_send.py +++ b/src/agentex/lib/core/harness/auto_send.py @@ -4,17 +4,16 @@ from typing import Any, AsyncIterator +from agentex.types.text_delta import TextDelta +from agentex.lib.core.harness.types import TurnUsage, TurnResult, StreamTaskMessage +from agentex.lib.core.harness.tracer import SpanTracer from agentex.types.task_message_update import ( - StreamTaskMessageDelta, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageDelta, StreamTaskMessageStart, ) -from agentex.types.text_delta import TextDelta - from agentex.lib.core.harness.span_derivation import SpanDeriver -from agentex.lib.core.harness.tracer import SpanTracer -from agentex.lib.core.harness.types import StreamTaskMessage, TurnResult, TurnUsage async def auto_send( diff --git a/src/agentex/lib/core/harness/emitter.py b/src/agentex/lib/core/harness/emitter.py index 9573fb8b2..681c859ea 100644 --- a/src/agentex/lib/core/harness/emitter.py +++ b/src/agentex/lib/core/harness/emitter.py @@ -4,9 +4,9 @@ from typing import AsyncGenerator -from agentex.lib.core.harness.auto_send import auto_send +from agentex.lib.core.harness.types import TurnResult, HarnessTurn, StreamTaskMessage from agentex.lib.core.harness.tracer import SpanTracer -from agentex.lib.core.harness.types import HarnessTurn, StreamTaskMessage, TurnResult +from agentex.lib.core.harness.auto_send 
import auto_send from agentex.lib.core.harness.yield_delivery import yield_events diff --git a/src/agentex/lib/core/harness/span_derivation.py b/src/agentex/lib/core/harness/span_derivation.py index eac929ee5..d353cf9e0 100644 --- a/src/agentex/lib/core/harness/span_derivation.py +++ b/src/agentex/lib/core/harness/span_derivation.py @@ -9,18 +9,17 @@ import json from dataclasses import dataclass +from agentex.lib.core.harness.types import OpenSpan, CloseSpan, SpanSignal, StreamTaskMessage +from agentex.types.tool_request_delta import ToolRequestDelta from agentex.types.task_message_update import ( - StreamTaskMessageDelta, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageDelta, StreamTaskMessageStart, ) from agentex.types.tool_request_content import ToolRequestContent -from agentex.types.tool_request_delta import ToolRequestDelta from agentex.types.tool_response_content import ToolResponseContent -from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal, StreamTaskMessage - @dataclass class _ToolReqMeta: diff --git a/src/agentex/lib/core/harness/tracer.py b/src/agentex/lib/core/harness/tracer.py index 3f4ff40c2..8384407bd 100644 --- a/src/agentex/lib/core/harness/tracer.py +++ b/src/agentex/lib/core/harness/tracer.py @@ -4,7 +4,7 @@ from typing import Any -from agentex.lib.core.harness.types import CloseSpan, OpenSpan, SpanSignal +from agentex.lib.core.harness.types import OpenSpan, CloseSpan, SpanSignal try: from agentex.lib.utils.logging import make_logger diff --git a/src/agentex/lib/core/harness/types.py b/src/agentex/lib/core/harness/types.py index f31b2c67f..64104d316 100644 --- a/src/agentex/lib/core/harness/types.py +++ b/src/agentex/lib/core/harness/types.py @@ -2,16 +2,17 @@ from __future__ import annotations -from dataclasses import dataclass, field -from typing import Any, AsyncIterator, Literal, Protocol, Union, runtime_checkable +from typing import Any, Union, Literal, Protocol, AsyncIterator, runtime_checkable +from 
dataclasses import field, dataclass + +from pydantic import BaseModel, ConfigDict from agentex.types.task_message_update import ( - StreamTaskMessageDelta, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageDelta, StreamTaskMessageStart, ) -from pydantic import BaseModel, ConfigDict # The canonical stream element. Taps yield these; delivery adapters consume them. StreamTaskMessage = Union[ diff --git a/src/agentex/lib/core/harness/yield_delivery.py b/src/agentex/lib/core/harness/yield_delivery.py index 0d90d5d94..69b39f152 100644 --- a/src/agentex/lib/core/harness/yield_delivery.py +++ b/src/agentex/lib/core/harness/yield_delivery.py @@ -2,11 +2,11 @@ from __future__ import annotations -from typing import AsyncGenerator, AsyncIterator +from typing import AsyncIterator, AsyncGenerator -from agentex.lib.core.harness.span_derivation import SpanDeriver -from agentex.lib.core.harness.tracer import SpanTracer from agentex.lib.core.harness.types import StreamTaskMessage +from agentex.lib.core.harness.tracer import SpanTracer +from agentex.lib.core.harness.span_derivation import SpanDeriver async def yield_events( diff --git a/tests/lib/core/harness/conformance/runner.py b/tests/lib/core/harness/conformance/runner.py index ffd72f89a..81a74860c 100644 --- a/tests/lib/core/harness/conformance/runner.py +++ b/tests/lib/core/harness/conformance/runner.py @@ -18,8 +18,8 @@ from dataclasses import dataclass -from agentex.lib.core.harness.span_derivation import SpanDeriver from agentex.lib.core.harness.types import SpanSignal, StreamTaskMessage +from agentex.lib.core.harness.span_derivation import SpanDeriver @dataclass diff --git a/tests/lib/core/harness/conformance/test_conformance.py b/tests/lib/core/harness/conformance/test_conformance.py index 6080ca5ef..1d686c33a 100644 --- a/tests/lib/core/harness/conformance/test_conformance.py +++ b/tests/lib/core/harness/conformance/test_conformance.py @@ -1,24 +1,36 @@ import pytest -from 
tests.lib.core.harness.conformance.runner import Fixture, derive_all, register, all_fixtures from agentex.types.task_message_update import ( - StreamTaskMessageStart, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageStart, ) from agentex.types.tool_request_content import ToolRequestContent from agentex.types.tool_response_content import ToolResponseContent +from tests.lib.core.harness.conformance.runner import Fixture, register, derive_all, all_fixtures -register(Fixture( - name="builtin-single-tool", - events=[ - StreamTaskMessageStart(type="start", index=0, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id="c", name="Bash", arguments={})), - StreamTaskMessageDone(type="done", index=0), - StreamTaskMessageFull(type="full", index=1, - content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="c", name="Bash", content="ok")), - ], -)) +register( + Fixture( + name="builtin-single-tool", + events=[ + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", author="agent", tool_call_id="c", name="Bash", arguments={} + ), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull( + type="full", + index=1, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="c", name="Bash", content="ok" + ), + ), + ], + ) +) @pytest.mark.parametrize("fixture", all_fixtures(), ids=lambda f: f.name) diff --git a/tests/lib/core/harness/test_auto_send.py b/tests/lib/core/harness/test_auto_send.py index 9568d7b87..e7331e67c 100644 --- a/tests/lib/core/harness/test_auto_send.py +++ b/tests/lib/core/harness/test_auto_send.py @@ -13,17 +13,17 @@ import pytest -from agentex.lib.core.harness.auto_send import auto_send -from agentex.lib.core.harness.tracer import SpanTracer from agentex.types.task_message import TaskMessage +from agentex.types.text_content import TextContent +from 
agentex.lib.core.harness.tracer import SpanTracer +from agentex.types.task_message_delta import TextDelta from agentex.types.task_message_update import ( - StreamTaskMessageStart, - StreamTaskMessageDelta, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageDelta, + StreamTaskMessageStart, ) -from agentex.types.text_content import TextContent -from agentex.types.task_message_delta import TextDelta +from agentex.lib.core.harness.auto_send import auto_send from agentex.types.tool_request_content import ToolRequestContent from agentex.types.tool_response_content import ToolResponseContent @@ -40,9 +40,7 @@ def __init__(self, sink, content_type, initial_content): self.sink = sink self.content_type = content_type # Real TaskMessage so StreamTaskMessageDelta(parent_task_message=...) passes validation - self.task_message = TaskMessage( - id="msg-1", task_id="task1", content=initial_content - ) + self.task_message = TaskMessage(id="msg-1", task_id="task1", content=initial_content) async def __aenter__(self): self.sink.append(("open", self.content_type)) @@ -67,9 +65,7 @@ class _FakeStreaming: def __init__(self): self.sink = [] - def streaming_task_message_context( - self, task_id, initial_content, streaming_mode="coalesced", created_at=None - ): + def streaming_task_message_context(self, task_id, initial_content, streaming_mode="coalesced", created_at=None): ctype = getattr(initial_content, "type", None) self.sink.append(("ctx", ctype)) return _FakeCtx(self.sink, ctype, initial_content) @@ -84,20 +80,24 @@ async def _gen(events): # Test 1: text streaming — open, stream deltas, close; return accumulated text # --------------------------------------------------------------------------- + @pytest.mark.asyncio async def test_auto_send_streams_text_and_returns_final_text(): streaming = _FakeStreaming() events = [ StreamTaskMessageStart( - type="start", index=0, + type="start", + index=0, content=TextContent(type="text", author="agent", content=""), ), 
StreamTaskMessageDelta( - type="delta", index=0, + type="delta", + index=0, delta=TextDelta(type="text", text_delta="Hel"), ), StreamTaskMessageDelta( - type="delta", index=0, + type="delta", + index=0, delta=TextDelta(type="text", text_delta="lo"), ), StreamTaskMessageDone(type="done", index=0), @@ -122,6 +122,7 @@ async def test_auto_send_streams_text_and_returns_final_text(): # (open context with the content, no deltas, close immediately) # --------------------------------------------------------------------------- + @pytest.mark.asyncio async def test_auto_send_posts_full_tool_messages(): streaming = _FakeStreaming() @@ -129,24 +130,36 @@ async def test_auto_send_posts_full_tool_messages(): # A bare tool_request Start (no Done/Full) must NOT open a streaming # context on its own — only Full events post messages. StreamTaskMessageStart( - type="start", index=0, + type="start", + index=0, content=ToolRequestContent( - type="tool_request", author="agent", - tool_call_id="c0", name="Bash", arguments={}, + type="tool_request", + author="agent", + tool_call_id="c0", + name="Bash", + arguments={}, ), ), StreamTaskMessageFull( - type="full", index=1, + type="full", + index=1, content=ToolRequestContent( - type="tool_request", author="agent", - tool_call_id="c1", name="Bash", arguments={"cmd": "ls"}, + type="tool_request", + author="agent", + tool_call_id="c1", + name="Bash", + arguments={"cmd": "ls"}, ), ), StreamTaskMessageFull( - type="full", index=2, + type="full", + index=2, content=ToolResponseContent( - type="tool_response", author="agent", - tool_call_id="c1", name="Bash", content="file.py", + type="tool_response", + author="agent", + tool_call_id="c1", + name="Bash", + content="file.py", ), ), ] @@ -176,6 +189,7 @@ async def test_auto_send_posts_full_tool_messages(): # Test 3: tracing — spans are derived and handed to the tracer # --------------------------------------------------------------------------- + class _RecordTracing: def __init__(self): 
self.started, self.ended = [], [] @@ -196,25 +210,31 @@ async def test_auto_send_derives_tool_spans_via_tracer(): events = [ StreamTaskMessageStart( - type="start", index=0, + type="start", + index=0, content=ToolRequestContent( - type="tool_request", author="agent", - tool_call_id="c1", name="Bash", arguments={}, + type="tool_request", + author="agent", + tool_call_id="c1", + name="Bash", + arguments={}, ), ), StreamTaskMessageDone(type="done", index=0), StreamTaskMessageFull( - type="full", index=1, + type="full", + index=1, content=ToolResponseContent( - type="tool_response", author="agent", - tool_call_id="c1", name="Bash", content="ok", + type="tool_response", + author="agent", + tool_call_id="c1", + name="Bash", + content="ok", ), ), ] - result = await auto_send( - _gen(events), task_id="task1", tracer=tracer, streaming=streaming - ) + result = await auto_send(_gen(events), task_id="task1", tracer=tracer, streaming=streaming) assert result.final_text == "" assert fake_tracing.started == ["Bash"] @@ -225,24 +245,31 @@ async def test_auto_send_derives_tool_spans_via_tracer(): # Test 4: text followed by a tool Full — text context is closed before Full # --------------------------------------------------------------------------- + @pytest.mark.asyncio async def test_auto_send_closes_text_context_before_full_message(): streaming = _FakeStreaming() events = [ StreamTaskMessageStart( - type="start", index=0, + type="start", + index=0, content=TextContent(type="text", author="agent", content=""), ), StreamTaskMessageDelta( - type="delta", index=0, + type="delta", + index=0, delta=TextDelta(type="text", text_delta="Hi"), ), StreamTaskMessageDone(type="done", index=0), StreamTaskMessageFull( - type="full", index=1, + type="full", + index=1, content=ToolRequestContent( - type="tool_request", author="agent", - tool_call_id="c2", name="read_file", arguments={}, + type="tool_request", + author="agent", + tool_call_id="c2", + name="read_file", + arguments={}, ), ), ] @@ 
-261,21 +288,21 @@ async def test_auto_send_closes_text_context_before_full_message(): # Test 5: midstream error — propagates AND the open context is closed (finally) # --------------------------------------------------------------------------- + @pytest.mark.asyncio async def test_open_context_closed_on_midstream_error(): streaming = _FakeStreaming() async def _exploding_gen(): yield StreamTaskMessageStart( - type="start", index=0, + type="start", + index=0, content=TextContent(type="text", author="agent", content=""), ) raise RuntimeError("boom") with pytest.raises(RuntimeError, match="boom"): - await auto_send( - _exploding_gen(), task_id="task1", tracer=None, streaming=streaming - ) + await auto_send(_exploding_gen(), task_id="task1", tracer=None, streaming=streaming) # The text context that was opened mid-stream was closed by the finally block. assert ("open", "text") in [(s[0], s[1]) for s in streaming.sink] diff --git a/tests/lib/core/harness/test_emitter.py b/tests/lib/core/harness/test_emitter.py index 963a77dfe..ee3052f47 100644 --- a/tests/lib/core/harness/test_emitter.py +++ b/tests/lib/core/harness/test_emitter.py @@ -1,15 +1,15 @@ import pytest -from agentex.lib.core.harness.emitter import UnifiedEmitter -from agentex.lib.core.harness.types import TurnUsage from agentex.types.task_message import TaskMessage +from agentex.types.text_content import TextContent +from agentex.lib.core.harness.types import TurnUsage +from agentex.lib.core.harness.emitter import UnifiedEmitter from agentex.types.task_message_delta import TextDelta from agentex.types.task_message_update import ( - StreamTaskMessageDelta, StreamTaskMessageDone, + StreamTaskMessageDelta, StreamTaskMessageStart, ) -from agentex.types.text_content import TextContent class _FakeTracing: @@ -48,9 +48,7 @@ class _FakeStreaming: def __init__(self): self.sink = [] - def streaming_task_message_context( - self, task_id, initial_content, streaming_mode="coalesced", created_at=None - ): + def 
streaming_task_message_context(self, task_id, initial_content, streaming_mode="coalesced", created_at=None): ctype = getattr(initial_content, "type", None) self.sink.append(("ctx", ctype)) return _FakeCtx(self.sink, ctype, initial_content) @@ -73,8 +71,7 @@ def usage(self): @pytest.mark.asyncio async def test_emitter_yield_mode_passes_through(): events = [ - StreamTaskMessageStart(type="start", index=0, - content=TextContent(type="text", author="agent", content="hi")), + StreamTaskMessageStart(type="start", index=0, content=TextContent(type="text", author="agent", content="hi")), StreamTaskMessageDone(type="done", index=0), ] turn = _Turn(events, TurnUsage(model="m")) @@ -87,8 +84,7 @@ async def test_emitter_yield_mode_passes_through(): async def test_emitter_tracing_default_on_when_trace_id_present(): # Inject a fake tracing backend so the test env doesn't need temporalio. # This exercises the default-on path (tracer=None) when trace_id is truthy. - emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", - tracing=_FakeTracing()) + emitter = UnifiedEmitter(task_id="t", trace_id="trace1", parent_span_id="p", tracing=_FakeTracing()) assert emitter.tracer is not None @@ -102,10 +98,8 @@ async def test_emitter_tracing_overridable_off(): async def test_emitter_auto_send_turn_returns_usage(): usage = TurnUsage(model="m", input_tokens=5) events = [ - StreamTaskMessageStart(type="start", index=0, - content=TextContent(type="text", author="agent", content="")), - StreamTaskMessageDelta(type="delta", index=0, - delta=TextDelta(type="text", text_delta="Hello")), + StreamTaskMessageStart(type="start", index=0, content=TextContent(type="text", author="agent", content="")), + StreamTaskMessageDelta(type="delta", index=0, delta=TextDelta(type="text", text_delta="Hello")), StreamTaskMessageDone(type="done", index=0), ] turn = _Turn(events, usage) diff --git a/tests/lib/core/harness/test_span_derivation.py b/tests/lib/core/harness/test_span_derivation.py index 
0630131d0..7779de815 100644 --- a/tests/lib/core/harness/test_span_derivation.py +++ b/tests/lib/core/harness/test_span_derivation.py @@ -1,16 +1,16 @@ -from agentex.lib.core.harness.span_derivation import SpanDeriver +from agentex.types.text_content import TextContent from agentex.lib.core.harness.types import OpenSpan, CloseSpan +from agentex.types.reasoning_content import ReasoningContent +from agentex.types.tool_request_delta import ToolRequestDelta from agentex.types.task_message_update import ( - StreamTaskMessageStart, - StreamTaskMessageDelta, - StreamTaskMessageFull, StreamTaskMessageDone, + StreamTaskMessageFull, + StreamTaskMessageDelta, + StreamTaskMessageStart, ) -from agentex.types.text_content import TextContent -from agentex.types.reasoning_content import ReasoningContent from agentex.types.tool_request_content import ToolRequestContent from agentex.types.tool_response_content import ToolResponseContent -from agentex.types.tool_request_delta import ToolRequestDelta +from agentex.lib.core.harness.span_derivation import SpanDeriver def _signals(deriver, events): @@ -23,19 +23,17 @@ def _signals(deriver, events): def _tool_req(idx, tcid, name, args): return StreamTaskMessageStart( - type="start", index=idx, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id=tcid, name=name, arguments=args), + type="start", + index=idx, + content=ToolRequestContent(type="tool_request", author="agent", tool_call_id=tcid, name=name, arguments=args), ) def test_text_only_yields_no_spans(): d = SpanDeriver() events = [ - StreamTaskMessageStart(type="start", index=0, - content=TextContent(type="text", author="agent", content="")), - StreamTaskMessageDelta(type="delta", index=0, - delta=None), + StreamTaskMessageStart(type="start", index=0, content=TextContent(type="text", author="agent", content="")), + StreamTaskMessageDelta(type="delta", index=0, delta=None), StreamTaskMessageDone(type="done", index=0), ] assert _signals(d, events) == [] @@ 
-46,9 +44,13 @@ def test_single_tool_opens_on_done_closes_on_response(): events = [ _tool_req(0, "call_1", "Bash", {"cmd": "ls"}), StreamTaskMessageDone(type="done", index=0), - StreamTaskMessageFull(type="full", index=1, - content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="call_1", name="Bash", content="files")), + StreamTaskMessageFull( + type="full", + index=1, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="call_1", name="Bash", content="files" + ), + ), ] sigs = _signals(d, events) assert sigs == [ @@ -60,8 +62,9 @@ def test_single_tool_opens_on_done_closes_on_response(): def test_reasoning_opens_on_start_closes_on_done(): d = SpanDeriver() events = [ - StreamTaskMessageStart(type="start", index=0, - content=ReasoningContent(type="reasoning", author="agent", summary=[], content=[])), + StreamTaskMessageStart( + type="start", index=0, content=ReasoningContent(type="reasoning", author="agent", summary=[], content=[]) + ), StreamTaskMessageDone(type="done", index=0), ] sigs = _signals(d, events) @@ -76,12 +79,20 @@ def test_parallel_tools_pair_by_tool_call_id(): _tool_req(1, "b", "T2", {}), StreamTaskMessageDone(type="done", index=0), StreamTaskMessageDone(type="done", index=1), - StreamTaskMessageFull(type="full", index=2, - content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="b", name="T2", content="rb")), - StreamTaskMessageFull(type="full", index=3, - content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="a", name="T1", content="ra")), + StreamTaskMessageFull( + type="full", + index=2, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="b", name="T2", content="rb" + ), + ), + StreamTaskMessageFull( + type="full", + index=3, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="a", name="T1", content="ra" + ), + ), ] sigs = _signals(d, events) opens = [s for s in sigs if 
isinstance(s, OpenSpan)] @@ -94,15 +105,23 @@ def test_parallel_tools_pair_by_tool_call_id(): def test_streamed_args_accumulate_into_open_input(): d = SpanDeriver() events = [ - StreamTaskMessageStart(type="start", index=0, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id="c", name="Bash", arguments={})), - StreamTaskMessageDelta(type="delta", index=0, - delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", - arguments_delta='{"cmd":')), - StreamTaskMessageDelta(type="delta", index=0, - delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", - arguments_delta='"ls"}')), + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", author="agent", tool_call_id="c", name="Bash", arguments={} + ), + ), + StreamTaskMessageDelta( + type="delta", + index=0, + delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", arguments_delta='{"cmd":'), + ), + StreamTaskMessageDelta( + type="delta", + index=0, + delta=ToolRequestDelta(type="tool_request", tool_call_id="c", name="Bash", arguments_delta='"ls"}'), + ), StreamTaskMessageDone(type="done", index=0), ] sigs = _signals(d, events) @@ -123,9 +142,13 @@ def test_unclosed_tool_closed_incomplete_on_flush(): def test_none_index_is_skipped(): d = SpanDeriver() events = [ - StreamTaskMessageStart(type="start", index=None, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id="n", name="Bash", arguments={})), + StreamTaskMessageStart( + type="start", + index=None, + content=ToolRequestContent( + type="tool_request", author="agent", tool_call_id="n", name="Bash", arguments={} + ), + ), StreamTaskMessageDone(type="done", index=None), ] assert _signals(d, events) == [] @@ -134,8 +157,12 @@ def test_none_index_is_skipped(): def test_orphan_tool_response_ignored(): d = SpanDeriver() events = [ - StreamTaskMessageFull(type="full", index=0, - 
content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="z", name="Bash", content="r")), + StreamTaskMessageFull( + type="full", + index=0, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="z", name="Bash", content="r" + ), + ), ] assert _signals(d, events) == [] diff --git a/tests/lib/core/harness/test_tracer.py b/tests/lib/core/harness/test_tracer.py index f5fdb16b6..7e1a4bd67 100644 --- a/tests/lib/core/harness/test_tracer.py +++ b/tests/lib/core/harness/test_tracer.py @@ -1,7 +1,7 @@ import pytest -from agentex.lib.core.harness.tracer import SpanTracer from agentex.lib.core.harness.types import OpenSpan, CloseSpan +from agentex.lib.core.harness.tracer import SpanTracer class _FakeSpan: diff --git a/tests/lib/core/harness/test_types.py b/tests/lib/core/harness/test_types.py index 91857993a..68bc89ce2 100644 --- a/tests/lib/core/harness/test_types.py +++ b/tests/lib/core/harness/test_types.py @@ -3,10 +3,10 @@ from agentex.lib.core.harness.types import ( OpenSpan, CloseSpan, - HarnessTurn, - StreamTaskMessage, TurnUsage, TurnResult, + HarnessTurn, + StreamTaskMessage, ) diff --git a/tests/lib/core/harness/test_yield_delivery.py b/tests/lib/core/harness/test_yield_delivery.py index 986b4a92d..f3f491d84 100644 --- a/tests/lib/core/harness/test_yield_delivery.py +++ b/tests/lib/core/harness/test_yield_delivery.py @@ -2,15 +2,15 @@ import pytest -from agentex.lib.core.harness.yield_delivery import yield_events from agentex.lib.core.harness.tracer import SpanTracer from agentex.types.task_message_update import ( - StreamTaskMessageStart, StreamTaskMessageDone, StreamTaskMessageFull, + StreamTaskMessageStart, ) from agentex.types.tool_request_content import ToolRequestContent from agentex.types.tool_response_content import ToolResponseContent +from agentex.lib.core.harness.yield_delivery import yield_events class _RecordTracing: @@ -35,18 +35,26 @@ async def test_yield_passes_events_through_and_traces(): fake 
= _RecordTracing() tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake) events = [ - StreamTaskMessageStart(type="start", index=0, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", author="agent", tool_call_id="c", name="Bash", arguments={} + ), + ), StreamTaskMessageDone(type="done", index=0), - StreamTaskMessageFull(type="full", index=1, - content=ToolResponseContent(type="tool_response", author="agent", - tool_call_id="c", name="Bash", content="ok")), + StreamTaskMessageFull( + type="full", + index=1, + content=ToolResponseContent( + type="tool_response", author="agent", tool_call_id="c", name="Bash", content="ok" + ), + ), ] out = [e async for e in yield_events(_gen(events), tracer=tracer)] - assert out == events # passthrough unchanged - assert fake.started == ["Bash"] # span derived + opened - assert fake.ended == ["ok"] # span closed with response + assert out == events # passthrough unchanged + assert fake.started == ["Bash"] # span derived + opened + assert fake.ended == ["ok"] # span closed with response @pytest.mark.asyncio @@ -63,15 +71,19 @@ async def test_flush_runs_on_early_close(): fake = _RecordTracing() tracer = SpanTracer(trace_id="t", parent_span_id="p", tracing=fake) events = [ - StreamTaskMessageStart(type="start", index=0, - content=ToolRequestContent(type="tool_request", author="agent", - tool_call_id="c", name="Bash", arguments={})), + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", author="agent", tool_call_id="c", name="Bash", arguments={} + ), + ), StreamTaskMessageDone(type="done", index=0), # response intentionally never arrives ] gen = yield_events(_gen(events), tracer=tracer) - first = await gen.__anext__() # Start + first = await gen.__anext__() # Start second = await gen.__anext__() # Done -> 
tool span opens here - await gen.aclose() # triggers the finally -> flush() + await gen.aclose() # triggers the finally -> flush() assert fake.started == ["Bash"] - assert fake.ended == [None] # flush closed the unpaired span (incomplete, no output) + assert fake.ended == [None] # flush closed the unpaired span (incomplete, no output) From 8b0da837d2ee13de1f7d93ebda0fa594cc0f5cd6 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 15:21:03 -0400 Subject: [PATCH 22/26] fix(harness): mark overridden start_span with @override for pyright (reportImplicitOverride) Co-Authored-By: Claude Opus 4.8 (1M context) --- tests/lib/core/harness/test_tracer.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/lib/core/harness/test_tracer.py b/tests/lib/core/harness/test_tracer.py index 7e1a4bd67..315b74417 100644 --- a/tests/lib/core/harness/test_tracer.py +++ b/tests/lib/core/harness/test_tracer.py @@ -1,3 +1,5 @@ +from typing import override + import pytest from agentex.lib.core.harness.types import OpenSpan, CloseSpan @@ -45,6 +47,7 @@ async def test_no_trace_id_is_noop(): @pytest.mark.asyncio async def test_tracing_failure_is_swallowed(): class _Boom(_FakeTracing): + @override async def start_span(self, **kw): raise RuntimeError("backend down") From f9266cf6b07ee952b19b75909ce7e3a7a9138908 Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 15:23:26 -0400 Subject: [PATCH 23/26] fix(harness): relative import in conformance test for pyright (reportMissingImports) Co-Authored-By: Claude Opus 4.8 (1M context) --- tests/lib/core/harness/conformance/test_conformance.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/lib/core/harness/conformance/test_conformance.py b/tests/lib/core/harness/conformance/test_conformance.py index 1d686c33a..d9eec1c15 100644 --- a/tests/lib/core/harness/conformance/test_conformance.py +++ b/tests/lib/core/harness/conformance/test_conformance.py @@ -7,7 +7,8 @@ ) from 
agentex.types.tool_request_content import ToolRequestContent
 from agentex.types.tool_response_content import ToolResponseContent
-from tests.lib.core.harness.conformance.runner import Fixture, register, derive_all, all_fixtures
+
+from .runner import Fixture, register, derive_all, all_fixtures

 register(
     Fixture(

From b538187a43577282544cc474c2c3b8f627a9f2cf Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Thu, 18 Jun 2026 16:51:49 -0400
Subject: [PATCH 24/26] fix(harness): index-keyed routing, tool stream delivery, final_text last-segment, created_at (AGX1-377, AGX1-378)

auto_send.py:
- Replace single current_ctx with ctx_map[index] so parallel streams route correctly
- Open a streaming context for ALL content types on Start (not just text/reasoning),
  fixing tool_request/tool_response stream delivery (AGX1-377)
- Reset final_text_parts on each new Start(TextContent) and on Full(TextContent) so
  multi-step turns return the LAST text segment, not the full accumulation
- Add created_at: datetime | None param; forward to every
  streaming_task_message_context call (AGX1-378)

span_derivation.py:
- _on_full: handle Full(ToolRequestContent) by opening a tool span keyed by
  tool_call_id if not already open; adds LangGraph full-event harness support

Co-Authored-By: Claude Sonnet 4.6
---
 src/agentex/lib/core/harness/auto_send.py     | 86 ++++++++++++-------
 .../lib/core/harness/span_derivation.py       | 14 +++
 2 files changed, 71 insertions(+), 29 deletions(-)

diff --git a/src/agentex/lib/core/harness/auto_send.py b/src/agentex/lib/core/harness/auto_send.py
index b246cf38c..899429034 100644
--- a/src/agentex/lib/core/harness/auto_send.py
+++ b/src/agentex/lib/core/harness/auto_send.py
@@ -3,8 +3,10 @@
 from __future__ import annotations

 from typing import Any, AsyncIterator
+from datetime import datetime

 from agentex.types.text_delta import TextDelta
+from agentex.types.text_content import TextContent
 from agentex.lib.core.harness.types import TurnUsage, TurnResult, StreamTaskMessage
 from agentex.lib.core.harness.tracer import SpanTracer
 from agentex.types.task_message_update import (
@@ -22,14 +24,27 @@ async def auto_send(
     tracer: SpanTracer | None = None,
     streaming: Any = None,
     usage: TurnUsage | None = None,
+    created_at: datetime | None = None,
 ) -> TurnResult:
     """Push the canonical stream to the task stream via adk.streaming.

-    Opens a streaming context per text/reasoning message, streams deltas via
+    Opens a streaming context per message (keyed by index), streams deltas via
     ctx.stream_update, and closes via ctx.close() on Done. Posts tool
     request/response full messages by opening a context with the content and
     closing it immediately (no deltas). Derives and traces spans from the same
-    stream. Returns the accumulated final text + usage.
+    stream. Returns the last text segment's text + usage.
+
+    Index-keyed routing: each Start(index=i) opens a context stored in
+    ctx_map[i]; Delta(index=i) routes to ctx_map.get(i); Done(index=i) closes
+    and removes ctx_map[i]. Events with index is None are skipped. The finally
+    block closes all remaining open contexts.
+
+    final_text last-segment semantics: a new Start(TextContent) resets
+    final_text_parts so that multi-step turns return the LAST text segment.
+    Full(TextContent) also overwrites final_text_parts (same semantics).
+
+    AGX1-378: created_at is forwarded to every streaming_task_message_context
+    call so callers can back-date message timestamps.

     Mirrors the open/close/stream_update pattern from
     src/agentex/lib/adk/_modules/_langgraph_async.py:
@@ -47,13 +62,12 @@ async def auto_send(
     deriver = SpanDeriver() if tracer is not None else None
     final_text_parts: list[str] = []

-    current_ctx: Any = None
+    ctx_map: dict[int, Any] = {}

-    async def _close_current() -> None:
-        nonlocal current_ctx
-        if current_ctx is not None:
-            await current_ctx.close()
-            current_ctx = None
+    async def _close_all() -> None:
+        for ctx in list(ctx_map.values()):
+            await ctx.close()
+        ctx_map.clear()

     try:
         async for event in events:
@@ -62,50 +76,64 @@ async def _close_current() -> None:
                     await tracer.handle(signal)

             if isinstance(event, StreamTaskMessageStart):
-                ctype = getattr(event.content, "type", None)
-                if ctype in ("text", "reasoning"):
-                    await _close_current()
-                    ctx = streaming.streaming_task_message_context(
-                        task_id=task_id,
-                        initial_content=event.content,
-                    )
-                    current_ctx = await ctx.__aenter__()
+                if event.index is None:
+                    continue
+                i = event.index
+                # Reset final_text_parts when a new text segment starts
+                if isinstance(event.content, TextContent):
+                    final_text_parts = []
+                ctx = streaming.streaming_task_message_context(
+                    task_id=task_id,
+                    initial_content=event.content,
+                    created_at=created_at,
+                )
+                ctx_map[i] = await ctx.__aenter__()

             elif isinstance(event, StreamTaskMessageDelta):
-                if current_ctx is not None and event.delta is not None:
+                if event.index is None:
+                    continue
+                ctx = ctx_map.get(event.index)
+                if ctx is not None and event.delta is not None:
                     # Reconstruct the delta with parent_task_message set from
                     # the context's task_message (mirrors _langgraph_async.py
                     # lines 72-78 and 117-127).
                     delta_with_parent = StreamTaskMessageDelta(
-                        parent_task_message=current_ctx.task_message,
+                        parent_task_message=ctx.task_message,
                         delta=event.delta,
                         type="delta",
                         index=event.index,
                     )
-                    await current_ctx.stream_update(delta_with_parent)
+                    await ctx.stream_update(delta_with_parent)
                     if isinstance(event.delta, TextDelta) and event.delta.text_delta:
                         final_text_parts.append(event.delta.text_delta)

             elif isinstance(event, StreamTaskMessageDone):
-                await _close_current()
+                if event.index is None:
+                    continue
+                ctx = ctx_map.pop(event.index, None)
+                if ctx is not None:
+                    await ctx.close()

             elif isinstance(event, StreamTaskMessageFull):
-                # Full messages (tool_request / tool_response): close any open
-                # streaming context first, then post the full message by opening
-                # a context with the content and closing it immediately
-                # (no deltas; StreamingTaskMessageContext.close() persists
-                # initial_content when the accumulator is empty). Use async with
-                # so the context is closed even if close() raises (__aexit__
-                # delegates to close()).
-                await _close_current()
+                # Full messages: post the full message by opening a context
+                # with the content and closing it immediately (no deltas;
+                # StreamingTaskMessageContext.close() persists initial_content
+                # when the accumulator is empty). Use async with so the context
+                # is closed even if close() raises (__aexit__ delegates to
+                # close()).
+                # Full(TextContent) also resets final_text_parts for
+                # last-segment semantics.
+                if isinstance(event.content, TextContent):
+                    final_text_parts = [event.content.content]
                 async with streaming.streaming_task_message_context(
                     task_id=task_id,
                     initial_content=event.content,
+                    created_at=created_at,
                 ):
                     pass

     finally:
-        await _close_current()
+        await _close_all()
         if deriver is not None and tracer is not None:
             for signal in deriver.flush():
                 await tracer.handle(signal)
diff --git a/src/agentex/lib/core/harness/span_derivation.py b/src/agentex/lib/core/harness/span_derivation.py
index d353cf9e0..503957582 100644
--- a/src/agentex/lib/core/harness/span_derivation.py
+++ b/src/agentex/lib/core/harness/span_derivation.py
@@ -105,7 +105,21 @@ def _on_delta(self, event: StreamTaskMessageDelta) -> list[SpanSignal]:
         return []

     def _on_full(self, event: StreamTaskMessageFull) -> list[SpanSignal]:
+        """Handle a Full event.
+
+        A `Full(ToolRequestContent)` opens a tool span (keyed by tool_call_id)
+        if it is not already open; the matching `Full(ToolResponseContent)`
+        closes it. This handles harnesses (e.g. LangGraph) that emit tool calls
+        as a single Full rather than Start+Done.
+        """
         content = event.content
+        if isinstance(content, ToolRequestContent):
+            tcid = content.tool_call_id
+            if tcid not in self._open_tool_ids:
+                self._open_tool_ids[tcid] = None
+                args = dict(content.arguments or {})
+                return [OpenSpan(key=tcid, kind="tool", name=content.name, input=args)]
+            return []
         if isinstance(content, ToolResponseContent):
             tcid = content.tool_call_id
             if tcid in self._open_tool_ids:

From dcd65b5507397bdbeee6daff6021927921a5d6aa Mon Sep 17 00:00:00 2001
From: Declan Brady
Date: Thu, 18 Jun 2026 16:52:00 -0400
Subject: [PATCH 25/26] test(harness): add tests for AGX1-377 tool stream delivery, index routing, last-segment, created_at, Full ToolRequest spans

test_auto_send.py:
- Fix test 2: remove bare Start(ToolRequestContent) from events (old behavior was
  that Start did not open a ctx; new behavior does, so test was updated to use
  Full-only events that still verify the two-context behavior)
- Extend _FakeStreaming to record created_at on each context call
- Add test 6: streamed tool_request opens a ctx + routes deltas (AGX1-377 core)
- Add test 7: interleaved indexes route deltas to correct per-index contexts
- Add test 8: multi-step turns return the LAST text segment only
- Add test 9: Full(TextContent) contributes its content to final_text
- Add test 10: created_at is forwarded to every streaming context call (AGX1-378)

test_span_derivation.py:
- Add test_full_tool_request_opens_span: Full(ToolRequestContent) opens a span
- Add test_full_tool_request_and_response_paired: paired Full request+response
  produces a complete OpenSpan+CloseSpan
- Add test_full_tool_request_does_not_double_open: idempotent; a Full for an
  already-open tool_call_id is a no-op

Co-Authored-By: Claude Sonnet 4.6
---
 tests/lib/core/harness/test_auto_send.py      | 215 ++++++++++++++++--
 .../lib/core/harness/test_span_derivation.py  |  89 ++++++++
 2 files changed, 287 insertions(+), 17 deletions(-)

diff --git a/tests/lib/core/harness/test_auto_send.py
b/tests/lib/core/harness/test_auto_send.py index e7331e67c..1948e9196 100644 --- a/tests/lib/core/harness/test_auto_send.py +++ b/tests/lib/core/harness/test_auto_send.py @@ -10,6 +10,7 @@ """ import types as _types +from datetime import datetime import pytest @@ -17,6 +18,7 @@ from agentex.types.text_content import TextContent from agentex.lib.core.harness.tracer import SpanTracer from agentex.types.task_message_delta import TextDelta +from agentex.types.tool_request_delta import ToolRequestDelta from agentex.types.task_message_update import ( StreamTaskMessageDone, StreamTaskMessageFull, @@ -64,10 +66,12 @@ class _FakeStreaming: def __init__(self): self.sink = [] + self.recorded_created_at: list[datetime | None] = [] def streaming_task_message_context(self, task_id, initial_content, streaming_mode="coalesced", created_at=None): ctype = getattr(initial_content, "type", None) self.sink.append(("ctx", ctype)) + self.recorded_created_at.append(created_at) return _FakeCtx(self.sink, ctype, initial_content) @@ -127,22 +131,10 @@ async def test_auto_send_streams_text_and_returns_final_text(): async def test_auto_send_posts_full_tool_messages(): streaming = _FakeStreaming() events = [ - # A bare tool_request Start (no Done/Full) must NOT open a streaming - # context on its own — only Full events post messages. - StreamTaskMessageStart( - type="start", - index=0, - content=ToolRequestContent( - type="tool_request", - author="agent", - tool_call_id="c0", - name="Bash", - arguments={}, - ), - ), + # Two Full events post two messages (open+close immediately, no deltas). 
StreamTaskMessageFull( type="full", - index=1, + index=0, content=ToolRequestContent( type="tool_request", author="agent", @@ -153,7 +145,7 @@ async def test_auto_send_posts_full_tool_messages(): ), StreamTaskMessageFull( type="full", - index=2, + index=1, content=ToolResponseContent( type="tool_response", author="agent", @@ -167,8 +159,7 @@ async def test_auto_send_posts_full_tool_messages(): assert result.final_text == "" - # The opened contexts correspond ONLY to the two Full events — the - # tool_request Start did not open a context. + # Each Full event opens and closes exactly one context. ctx_events = [s for s in streaming.sink if s[0] == "ctx"] assert len(ctx_events) == 2 content_types = [s[1] for s in ctx_events] @@ -307,3 +298,193 @@ async def _exploding_gen(): # The text context that was opened mid-stream was closed by the finally block. assert ("open", "text") in [(s[0], s[1]) for s in streaming.sink] assert ("close", "text") in [(s[0], s[1]) for s in streaming.sink] + + +# --------------------------------------------------------------------------- +# Test 6: streamed tool_request delivered (AGX1-377 core) +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_auto_send_streams_tool_request(): + """A Start(ToolRequestContent) MUST open a streaming context (AGX1-377).""" + streaming = _FakeStreaming() + events = [ + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="c_tool", + name="Bash", + arguments={}, + ), + ), + StreamTaskMessageDelta( + type="delta", + index=0, + delta=ToolRequestDelta( + type="tool_request", + tool_call_id="c_tool", + name="Bash", + arguments_delta='{"cmd": "ls"}', + ), + ), + StreamTaskMessageDone(type="done", index=0), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + + assert result.final_text == "" + + ctx_events = [s for s in 
streaming.sink if s[0] == "ctx"] + assert len(ctx_events) == 1 + assert ctx_events[0][1] == "tool_request" + + opens = [s for s in streaming.sink if s[0] == "open"] + closes = [s for s in streaming.sink if s[0] == "close"] + assert len(opens) == 1 + assert len(closes) == 1 + + updates = [s for s in streaming.sink if s[0] == "update"] + assert len(updates) == 1 + + +# --------------------------------------------------------------------------- +# Test 7: interleaved indexes route correctly +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_auto_send_interleaved_indexes_route_correctly(): + """Deltas must be routed to the correct index-keyed context.""" + streaming = _FakeStreaming() + events = [ + StreamTaskMessageStart( + type="start", + index=0, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageStart( + type="start", + index=1, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDelta( + type="delta", + index=0, + delta=TextDelta(type="text", text_delta="A"), + ), + StreamTaskMessageDelta( + type="delta", + index=1, + delta=TextDelta(type="text", text_delta="B"), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageDone(type="done", index=1), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + + ctx_events = [s for s in streaming.sink if s[0] == "ctx"] + assert len(ctx_events) == 2 + + opens = [s for s in streaming.sink if s[0] == "open"] + assert len(opens) == 2 + + updates = [s for s in streaming.sink if s[0] == "update"] + assert len(updates) == 2 + + update_deltas = [s[1].delta for s in streaming.sink if s[0] == "update"] + text_deltas = [d.text_delta for d in update_deltas if isinstance(d, TextDelta)] + assert set(text_deltas) == {"A", "B"} + + +# --------------------------------------------------------------------------- +# Test 8: final_text returns last 
text segment for multi-step +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_auto_send_final_text_last_segment(): + """final_text must be the LAST text segment, not accumulated across all turns.""" + streaming = _FakeStreaming() + events = [ + StreamTaskMessageStart( + type="start", + index=0, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDelta( + type="delta", + index=0, + delta=TextDelta(type="text", text_delta="First"), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageStart( + type="start", + index=1, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDelta( + type="delta", + index=1, + delta=TextDelta(type="text", text_delta="Second"), + ), + StreamTaskMessageDone(type="done", index=1), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + assert result.final_text == "Second" + + +# --------------------------------------------------------------------------- +# Test 9: Full(TextContent) contributes to final_text +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_auto_send_full_text_content_sets_final_text(): + """A Full(TextContent) must contribute its text to final_text.""" + streaming = _FakeStreaming() + events = [ + StreamTaskMessageFull( + type="full", + index=0, + content=TextContent(type="text", author="agent", content="hello"), + ), + ] + result = await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming) + assert result.final_text == "hello" + + +# --------------------------------------------------------------------------- +# Test 10: created_at is forwarded to streaming context (AGX1-378) +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_auto_send_created_at_forwarded(): + 
"""created_at must be forwarded to every streaming_task_message_context call.""" + streaming = _FakeStreaming() + dt = datetime(2025, 1, 15, 12, 0, 0) + events = [ + StreamTaskMessageStart( + type="start", + index=0, + content=TextContent(type="text", author="agent", content=""), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull( + type="full", + index=1, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="c_ts", + name="Bash", + arguments={}, + ), + ), + ] + await auto_send(_gen(events), task_id="task1", tracer=None, streaming=streaming, created_at=dt) + + assert all(ts == dt for ts in streaming.recorded_created_at) diff --git a/tests/lib/core/harness/test_span_derivation.py b/tests/lib/core/harness/test_span_derivation.py index 7779de815..f22b83d54 100644 --- a/tests/lib/core/harness/test_span_derivation.py +++ b/tests/lib/core/harness/test_span_derivation.py @@ -166,3 +166,92 @@ def test_orphan_tool_response_ignored(): ), ] assert _signals(d, events) == [] + + +def test_full_tool_request_opens_span(): + """Full(ToolRequestContent) must open a tool span (for LangGraph-style harnesses).""" + d = SpanDeriver() + events = [ + StreamTaskMessageFull( + type="full", + index=0, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="call_x", + name="Bash", + arguments={"cmd": "ls"}, + ), + ), + ] + sigs = _signals(d, events) + assert sigs[0] == OpenSpan(key="call_x", kind="tool", name="Bash", input={"cmd": "ls"}) + assert sigs[1] == CloseSpan(key="call_x", output=None, is_complete=False) + + +def test_full_tool_request_and_response_paired(): + """Full(ToolRequestContent) + Full(ToolResponseContent) produces a complete span pair.""" + d = SpanDeriver() + events = [ + StreamTaskMessageFull( + type="full", + index=0, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="call_y", + name="Grep", + arguments={}, + ), + ), + StreamTaskMessageFull( + 
type="full", + index=1, + content=ToolResponseContent( + type="tool_response", + author="agent", + tool_call_id="call_y", + name="Grep", + content="result", + ), + ), + ] + sigs = _signals(d, events) + assert sigs == [ + OpenSpan(key="call_y", kind="tool", name="Grep", input={}), + CloseSpan(key="call_y", output="result", is_complete=True), + ] + + +def test_full_tool_request_does_not_double_open(): + """A Full(ToolRequestContent) for an already-open tool_call_id is a no-op.""" + d = SpanDeriver() + events = [ + StreamTaskMessageStart( + type="start", + index=0, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="call_z", + name="X", + arguments={}, + ), + ), + StreamTaskMessageDone(type="done", index=0), + StreamTaskMessageFull( + type="full", + index=1, + content=ToolRequestContent( + type="tool_request", + author="agent", + tool_call_id="call_z", + name="X", + arguments={}, + ), + ), + ] + sigs = _signals(d, events) + opens = [s for s in sigs if isinstance(s, OpenSpan)] + assert len(opens) == 1 + assert opens[0].key == "call_z" From b4b8b33047e7e8fc30436bc95dfb81b53888682c Mon Sep 17 00:00:00 2001 From: Declan Brady Date: Thu, 18 Jun 2026 17:01:54 -0400 Subject: [PATCH 26/26] feat(harness): thread created_at through UnifiedEmitter.auto_send_turn (AGX1-378) So migration helpers can restore the deterministic first-message timestamp on the temporal path. Default None preserves current behavior. 

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 src/agentex/lib/core/harness/emitter.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/agentex/lib/core/harness/emitter.py b/src/agentex/lib/core/harness/emitter.py
index 681c859ea..85395fcff 100644
--- a/src/agentex/lib/core/harness/emitter.py
+++ b/src/agentex/lib/core/harness/emitter.py
@@ -3,6 +3,7 @@
 from __future__ import annotations

 from typing import AsyncGenerator
+from datetime import datetime

 from agentex.lib.core.harness.types import TurnResult, HarnessTurn, StreamTaskMessage
 from agentex.lib.core.harness.tracer import SpanTracer
@@ -56,12 +57,18 @@ async def yield_turn(self, turn: HarnessTurn) -> AsyncGenerator[StreamTaskMessag
         async for event in yield_events(turn.events, tracer=self.tracer):
             yield event

-    async def auto_send_turn(self, turn: HarnessTurn) -> TurnResult:
-        """Async/temporal delivery: push to the task stream, return TurnResult."""
+    async def auto_send_turn(self, turn: HarnessTurn, created_at: datetime | None = None) -> TurnResult:
+        """Async/temporal delivery: push to the task stream, return TurnResult.
+
+        Pass `created_at` (e.g. `workflow.now()` under Temporal) to stamp the
+        turn's messages with a deterministic timestamp; it is forwarded to the
+        streaming contexts. Default None preserves server-side timestamps.
+        """
         return await auto_send(
             turn.events,
             task_id=self.task_id,
             tracer=self.tracer,
             streaming=self._streaming,
             usage=turn.usage(),
+            created_at=created_at,
         )
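
Reviewer note: the index-keyed routing and last-segment `final_text` semantics described in PATCH 24/26 can be illustrated with a minimal, self-contained sketch. It is not the SDK code. `Event` and `route` are hypothetical stand-ins for the `StreamTaskMessage*` types and `auto_send`, plain string buffers replace streaming contexts, and tracing is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical stand-in for StreamTaskMessageStart/Delta/Done.
    kind: str  # "start", "delta", or "done"
    index: Optional[int]
    text: str = ""


def route(events):
    """Route events by index; return (closed segments by index, final_text)."""
    open_buffers = {}  # index -> list of text parts (plays the role of ctx_map)
    closed = {}        # index -> joined text of a segment that received Done
    final_parts = []   # parts received since the most recent Start

    for ev in events:
        if ev.index is None:
            continue  # events without an index are skipped
        if ev.kind == "start":
            open_buffers[ev.index] = []
            final_parts = []  # a new segment resets the final text
        elif ev.kind == "delta":
            buf = open_buffers.get(ev.index)
            if buf is not None:  # deltas for unknown indexes are dropped
                buf.append(ev.text)
                final_parts.append(ev.text)
        elif ev.kind == "done":
            buf = open_buffers.pop(ev.index, None)
            if buf is not None:
                closed[ev.index] = "".join(buf)

    return closed, "".join(final_parts)
```

Under these assumptions, a two-step turn (`Start/Delta("First")/Done` at index 0, then `Start/Delta("Second")/Done` at index 1) closes both segments but reports only the second as the final text, which is the behavior test 8 in PATCH 25/26 asserts for the real `auto_send`.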