|
| 1 | +# Unified Harness Surface — PR 4: pydantic-ai Migration Plan |
| 2 | + |
| 3 | +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. |
| 4 | + |
| 5 | +**Goal:** Migrate the pydantic-ai harness onto the unified harness surface so it emits streaming + persisted messages + tracing + turn usage through ONE source of truth, over both delivery channels (yield + auto-send), with no public regression — and ship its 3 integration test agents (sync/async/temporal). |
| 6 | + |
| 7 | +**Architecture:** Wrap a pydantic-ai run as a `HarnessTurn` (canonical `StreamTaskMessage*` stream + normalized `TurnUsage`). Reuse the existing `convert_pydantic_ai_to_agentex_events` mapping as the tap. Reimplement the existing public auto-send helper on top of `UnifiedEmitter.auto_send_turn`, and route sync ACP agents through `UnifiedEmitter.yield_turn`. Retire the bespoke `_pydantic_ai_tracing` handler in favor of the surface's derived spans (keep the old symbol as a deprecated shim). |
| 8 | + |
| 9 | +**Tech Stack:** Python 3, pydantic-ai (`pydantic_ai`), pydantic v2, pytest + pytest-asyncio, the `agentex.lib.core.harness` package from PRs 1-3. |
| 10 | + |
| 11 | +**Foundation:** `src/agentex/lib/core/harness/` (`UnifiedEmitter`, `SpanTracer`, `SpanDeriver`, `HarnessTurn`, `TurnUsage`, `TurnResult`, `yield_events`, `auto_send`, conformance scaffold). Design: `docs/superpowers/specs/2026-06-18-unified-harness-surface-design.md`. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Dependencies (must land first) |
| 16 | + |
| 17 | +- **AGX1-373** — cross-channel conformance equivalence + `Full` wire reconciliation. PR 4's conformance fixtures register into the upgraded cross-channel runner. **Do not start Task 5 (conformance fixtures) until 373 is merged into the foundation branch.** |
| 18 | +- **AGX1-375** — public `adk` import path for the harness surface. If merged, import the surface via the public path in this PR; if not, import from `agentex.lib.core.harness` and add a follow-up note. (Tasks below assume `from agentex.lib.core.harness import UnifiedEmitter, TurnUsage, ...`; swap to the public path if 375 landed.) |
| 19 | + |
| 20 | +This is one PR (target: under 1000 lines of code, excluding any recorded fixtures). The 3 test agents are the largest chunk; if the diff exceeds the budget, split the test agents into a follow-up PR 4b and note the split in the PR description. |
| 21 | + |
| 22 | +--- |
| 23 | + |
| 24 | +## File Structure |
| 25 | + |
| 26 | +- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_sync.py` — add an optional `on_result` callback to `convert_pydantic_ai_to_agentex_events` (additive) so usage can be captured. Behavior unchanged when omitted. |
| 27 | +- Create `src/agentex/lib/adk/_modules/_pydantic_ai_turn.py` — `PydanticAITurn(HarnessTurn)` + `pydantic_ai_usage_to_turn_usage(...)`. |
| 28 | +- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_async.py` — reimplement `stream_pydantic_ai_events` on `UnifiedEmitter.auto_send_turn`, preserving signature + return. |
| 29 | +- Modify `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` — mark `create_pydantic_ai_tracing_handler` / `AgentexPydanticAITracingHandler` deprecated (docstring + `DeprecationWarning`); keep importable. |
| 30 | +- Create `tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py` — register pydantic-ai fixtures into the cross-channel conformance runner. |
| 31 | +- Create `examples/tutorials/harness-pydantic-ai-{sync,async,temporal}/` — 3 test agents (modeled on the `sync-pydantic-ai` / `default-pydantic-ai` / `temporal-pydantic-ai` CLI templates) using the unified surface. |
| 32 | +- Modify `.github/workflows/harness-integration.yml` — enable the pydantic-ai rows of the `live-matrix` job. |
| 33 | +- Modify `.github/workflows/agentex-tutorials-test.yml` (or its agent list) — include the 3 new test agents if that workflow enumerates agents. |
| 34 | + |
| 35 | +--- |
| 36 | + |
| 37 | +## Task 1: Expose the pydantic-ai run result for usage capture |
| 38 | + |
| 39 | +**Files:** |
| 40 | +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_sync.py` |
| 41 | +- Test: `tests/lib/adk/test_pydantic_ai_sync.py` (create if absent) |
| 42 | + |
| 43 | +The converter already iterates the pydantic-ai event stream and currently *ignores* `AgentRunResultEvent` (the terminal event carrying the run result + usage). Add an optional callback so a caller can capture it without changing existing behavior. |
| 44 | + |
| 45 | +- [ ] **Step 1: Write the failing test.** |
| 46 | + |
| 47 | +```python |
| 48 | +import pytest |
| 49 | +from agentex.lib.adk._modules._pydantic_ai_sync import convert_pydantic_ai_to_agentex_events |
| 50 | + |
| 51 | + |
| 52 | +class _FakeResultEvent: # stand-in for pydantic_ai.run.AgentRunResultEvent |
| 53 | +    def __init__(self, result): |
| 54 | +        self.result = result |
| 55 | + |
| 56 | + |
| 57 | +async def _stream(events): |
| 58 | +    for e in events: |
| 59 | +        yield e |
| 60 | + |
| 61 | + |
| 62 | +@pytest.mark.asyncio |
| 63 | +async def test_on_result_callback_receives_terminal_event(monkeypatch): |
| 64 | +    # When the stream ends with an AgentRunResultEvent, on_result is invoked with it, |
| 65 | +    # and the converter still yields no extra events for it. |
| 66 | +    captured = {} |
| 67 | +    # Use a real AgentRunResultEvent if constructable; otherwise patch isinstance check. |
| 68 | +    # (Implementer: see Step 3 note — match the real terminal event type.) |
| 69 | +    ... |
| 70 | +``` |
| 71 | + |
| 72 | +Implementer note: the exact terminal event type is `pydantic_ai.run.AgentRunResultEvent` (already imported in `_pydantic_ai_sync.py`). Write the test to feed a stream ending in a real `AgentRunResultEvent` (construct it as the installed pydantic-ai version requires; inspect `python -c "import pydantic_ai.run, inspect; print(inspect.signature(pydantic_ai.run.AgentRunResultEvent))"`). Assert `on_result` is called once with that event and that the converter yields the same `StreamTaskMessage*` sequence as without the callback (no behavior change for the streaming output). |
| 73 | + |
| 74 | +- [ ] **Step 2: Run** `uv run pytest tests/lib/adk/test_pydantic_ai_sync.py -v` — expect FAIL (no `on_result` param). |
| 75 | + |
| 76 | +- [ ] **Step 3: Implement.** Add `on_result: Callable[[AgentRunResultEvent], None] | None = None` (and an async-callable variant if needed) to `convert_pydantic_ai_to_agentex_events`. In the existing `elif isinstance(event, (FunctionToolCallEvent, FinalResultEvent, AgentRunResultEvent))` branch, when the event is an `AgentRunResultEvent` and `on_result` is set, call it (await if it's a coroutine). Keep yielding nothing for it. No other change. |
| 77 | + |
| 78 | +- [ ] **Step 4: Run** the test — expect PASS, plus run the existing `_pydantic_ai_sync` tests if any to confirm no regression. |
| 79 | + |
| 80 | +- [ ] **Step 5: Commit** `feat(pydantic-ai): optional on_result callback to expose run result for usage capture`. |
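
The callback handling described in Step 3 can be sketched as below. This is a minimal illustration, not the converter itself: `FakeTextEvent`, `FakeResultEvent`, and `convert_events` are hypothetical stand-ins so the snippet runs without pydantic-ai installed. It shows one way to accept either a plain or an async callback and to keep the terminal event out of the output stream.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, AsyncIterator, Awaitable, Callable, Optional, Union


@dataclass
class FakeTextEvent:
    """Hypothetical stand-in for a streamed text event."""
    text: str


@dataclass
class FakeResultEvent:
    """Hypothetical stand-in for pydantic_ai.run.AgentRunResultEvent."""
    result: Any


OnResult = Callable[[FakeResultEvent], Union[None, Awaitable[None]]]


async def convert_events(
    stream: AsyncIterator[Any],
    on_result: Optional[OnResult] = None,
) -> AsyncIterator[str]:
    """Yield one output per text event; hand the terminal event to on_result."""
    async for event in stream:
        if isinstance(event, FakeResultEvent):
            if on_result is not None:
                maybe_awaitable = on_result(event)
                # Support both plain and async callbacks.
                if inspect.isawaitable(maybe_awaitable):
                    await maybe_awaitable
            continue  # the terminal event never produces output
        yield event.text


async def _source(events):
    for event in events:
        yield event


def run_demo():
    captured = []
    events = [FakeTextEvent("hel"), FakeTextEvent("lo"), FakeResultEvent(result="done")]

    async def _collect(on_result=None):
        return [out async for out in convert_events(_source(events), on_result=on_result)]

    with_callback = asyncio.run(_collect(captured.append))
    without_callback = asyncio.run(_collect())
    return with_callback, without_callback, captured
```

`run_demo()` returns the output with and without the callback plus the captured events, which is the same comparison the Step 1 test is meant to make against the real converter.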
| 81 | + |
| 82 | +--- |
| 83 | + |
| 84 | +## Task 2: Normalize pydantic-ai usage to `TurnUsage` |
| 85 | + |
| 86 | +**Files:** |
| 87 | +- Create: `src/agentex/lib/adk/_modules/_pydantic_ai_turn.py` |
| 88 | +- Test: `tests/lib/adk/test_pydantic_ai_turn.py` |
| 89 | + |
| 90 | +- [ ] **Step 1: Verify the real usage shape FIRST.** Run `uv run python -c "import dataclasses; from pydantic_ai.usage import RunUsage; print([f.name for f in dataclasses.fields(RunUsage)])"` (the type may be named `RunUsage` or `Usage` depending on the installed version; this command assumes it is a dataclass, so inspect it another way if that fails). Record the exact field names (commonly: `input_tokens`, `output_tokens`, `requests`, and cache/`details` fields; `total_tokens` may be a computed property rather than a field, so check it separately). The mapping in Step 3 MUST use the real field names. |
| 91 | + |
| 92 | +- [ ] **Step 2: Write the failing test.** |
| 93 | + |
| 94 | +```python |
| 95 | +from agentex.lib.adk._modules._pydantic_ai_turn import pydantic_ai_usage_to_turn_usage |
| 96 | + |
| 97 | + |
| 98 | +def test_usage_normalization_maps_fields(): |
| 99 | +    # Build a usage object matching the installed pydantic-ai RunUsage shape |
| 100 | +    # (see Task 2 Step 1 for the real fields), then assert the mapping. |
| 101 | +    usage_obj = ...  # construct RunUsage(input_tokens=10, output_tokens=20, requests=2, ...) |
| 102 | +    tu = pydantic_ai_usage_to_turn_usage(usage_obj, model="openai:gpt-4o") |
| 103 | +    assert tu.model == "openai:gpt-4o" |
| 104 | +    assert tu.input_tokens == 10 |
| 105 | +    assert tu.output_tokens == 20 |
| 106 | +    assert tu.num_llm_calls == 2 |
| 107 | +``` |
| 108 | + |
| 109 | +- [ ] **Step 3: Implement** `pydantic_ai_usage_to_turn_usage(usage, model) -> TurnUsage` mapping the verified RunUsage fields onto `TurnUsage` (`input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens` if available, `num_llm_calls` ← `requests`). Use `getattr(usage, "<field>", None)` defensively so a version field rename degrades to `None` rather than crashing. Then implement `PydanticAITurn`: |
| 110 | + |
| 111 | +```python |
| 112 | +class PydanticAITurn: |
| 113 | +    """A pydantic-ai run as a HarnessTurn: canonical event stream + normalized usage.""" |
| 114 | + |
| 115 | +    def __init__(self, stream, model: str | None = None): |
| 116 | +        self._stream = stream |
| 117 | +        self._model = model |
| 118 | +        self._usage = TurnUsage(model=model) |
| 119 | + |
| 120 | +    @property |
| 121 | +    async def events(self): |
| 122 | +        def _capture(result_event): |
| 123 | +            run_result = getattr(result_event, "result", None) |
| 124 | +            usage_obj = run_result.usage() if run_result is not None else None |
| 125 | +            if usage_obj is not None: |
| 126 | +                self._usage = pydantic_ai_usage_to_turn_usage(usage_obj, self._model) |
| 127 | +        async for ev in convert_pydantic_ai_to_agentex_events(self._stream, on_result=_capture): |
| 128 | +            yield ev |
| 129 | + |
| 130 | +    def usage(self) -> TurnUsage: |
| 131 | +        return self._usage |
| 132 | +``` |
| 133 | + |
| 134 | +(Verify `run_result.usage()` is the correct accessor for the installed version; adjust if it's an attribute.) |
| 135 | + |
| 136 | +- [ ] **Step 4: Add a `PydanticAITurn` test** that feeds a small stream ending in an `AgentRunResultEvent` whose `result.usage()` returns a known usage, drives `turn.events` to exhaustion, then asserts `turn.usage()` reflects the normalized values and that `events` yielded the expected `StreamTaskMessage*`. Confirm `usage()` BEFORE exhaustion returns the default (documented single-pass contract). |
| 137 | + |
| 138 | +- [ ] **Step 5: Run** the tests — expect PASS. |
| 139 | + |
| 140 | +- [ ] **Step 6: Commit** `feat(pydantic-ai): PydanticAITurn HarnessTurn + usage normalization`. |
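
The defensive mapping from Step 3 and the single-pass contract checked in Step 4 can be illustrated together. This is a sketch under stated assumptions: `TurnUsageSketch`, `FakeUsage`, and `TurnSketch` are hypothetical stand-ins (the real `TurnUsage` field set may differ), and `events` is a plain method here rather than a property to keep the example short.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TurnUsageSketch:
    """Hypothetical stand-in for TurnUsage; the real field set may differ."""
    model: Optional[str] = None
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None
    num_llm_calls: Optional[int] = None


def usage_to_turn_usage(usage: Any, model: Optional[str]) -> TurnUsageSketch:
    # getattr with a None default: a renamed or missing field degrades to None.
    return TurnUsageSketch(
        model=model,
        input_tokens=getattr(usage, "input_tokens", None),
        output_tokens=getattr(usage, "output_tokens", None),
        total_tokens=getattr(usage, "total_tokens", None),
        num_llm_calls=getattr(usage, "requests", None),
    )


@dataclass
class FakeUsage:
    """Hypothetical usage object; deliberately has no total_tokens."""
    input_tokens: int
    output_tokens: int
    requests: int


class TurnSketch:
    """Single-pass turn: usage is populated only once events are exhausted."""

    def __init__(self, chunks, final_usage, model=None):
        self._chunks = chunks
        self._final_usage = final_usage
        self._model = model
        self._usage = TurnUsageSketch(model=model)

    async def events(self):
        for chunk in self._chunks:
            yield chunk
        # Stands in for the terminal result event arriving last.
        self._usage = usage_to_turn_usage(self._final_usage, self._model)

    def usage(self):
        return self._usage


def run_demo():
    turn = TurnSketch(["a", "b"], FakeUsage(10, 20, 2), model="openai:gpt-4o")
    before = turn.usage()

    async def _drain():
        return [chunk async for chunk in turn.events()]

    chunks = asyncio.run(_drain())
    return before, chunks, turn.usage()
```

Because `FakeUsage` has no `total_tokens`, the normalized value is `None` instead of an `AttributeError`, which is the degradation Step 3 asks for.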
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## Task 3: Reimplement the auto-send helper on the unified surface |
| 145 | + |
| 146 | +**Files:** |
| 147 | +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_async.py` |
| 148 | +- Test: `tests/lib/adk/test_pydantic_ai_async.py` |
| 149 | + |
| 150 | +`stream_pydantic_ai_events(stream, task_id, ...)` currently hand-drives `adk.streaming`. Reimplement it to delegate to `UnifiedEmitter.auto_send_turn(PydanticAITurn(stream, model))`, preserving its signature and return value (the accumulated final text). The one intended behavior change is additive: tracing is on by default. |
| 151 | + |
| 152 | +- [ ] **Step 1: Capture current behavior as a characterization test.** Before changing anything, write a test that runs the CURRENT `stream_pydantic_ai_events` over a fixture stream with a fake `adk.streaming` and records the messages produced (text, tool request/response). This is the backward-compat baseline ("equivalent messages before/after" from the design). |
| 153 | + |
| 154 | +- [ ] **Step 2: Run** it green against the current implementation. Commit the test alone: `test(pydantic-ai): characterize stream_pydantic_ai_events output`. |
| 155 | + |
| 156 | +- [ ] **Step 3: Reimplement** `stream_pydantic_ai_events` to build a `PydanticAITurn` and call `UnifiedEmitter(task_id=task_id, trace_id=<resolved>, parent_span_id=<resolved>, streaming=<injected or None>).auto_send_turn(turn)`, returning `result.final_text`. Resolve `trace_id`/`parent_span_id` the same way the module does today (from the streaming/tracing context vars it already reads). Preserve the exact public signature and return type. |
| 157 | + |
| 158 | +- [ ] **Step 4: Run** the characterization test — it must still pass (same messages). Adjust the test only if AGX1-373 deliberately changed the tool-message wire shape; in that case assert the post-373 shape and note it. Confirm tracing now occurs by default (assert spans via a fake tracer). |
| 159 | + |
| 160 | +- [ ] **Step 5: Commit** `refactor(pydantic-ai): reimplement stream_pydantic_ai_events on UnifiedEmitter (default tracing)`. |
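
The characterization approach in Steps 1 to 4 amounts to recording what each implementation sends through a fake sink and comparing the recordings. The sketch below shows that shape only; `RecordingSink`, `legacy_helper`, and `unified_helper` are hypothetical stand-ins and do not reflect the real `adk.streaming` or `UnifiedEmitter` APIs.

```python
import asyncio
from typing import List, Tuple

Message = Tuple[str, str]  # (kind, payload)


class RecordingSink:
    """Hypothetical fake for the streaming backend: records what is sent."""

    def __init__(self) -> None:
        self.sent: List[Message] = []

    async def send(self, kind: str, payload: str) -> None:
        self.sent.append((kind, payload))


FIXTURE: List[Message] = [
    ("text", "Looking that up. "),
    ("tool_request", "lookup(q='x')"),
    ("tool_response", "42"),
    ("text", "The answer is 42."),
]


async def legacy_helper(events, sink: RecordingSink) -> str:
    """Stand-in for the current hand-driven implementation."""
    final_text = ""
    for kind, payload in events:
        await sink.send(kind, payload)
        if kind == "text":
            final_text += payload
    return final_text


async def unified_helper(events, sink: RecordingSink) -> str:
    """Stand-in for the reimplementation: same contract, different internals."""
    texts = []
    for event in events:
        await sink.send(*event)
        if event[0] == "text":
            texts.append(event[1])
    return "".join(texts)


def characterize(helper) -> Tuple[List[Message], str]:
    """Run a helper over the fixture and return (messages sent, final text)."""
    sink = RecordingSink()
    final_text = asyncio.run(helper(FIXTURE, sink))
    return sink.sent, final_text
```

Committing the baseline before the refactor, as Step 2 requires, is what makes the later comparison meaningful: the recorded output comes from the old code, not from the author's expectation of it.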
| 161 | + |
| 162 | +--- |
| 163 | + |
| 164 | +## Task 4: Route sync ACP delivery through the surface + deprecate the bespoke tracing handler |
| 165 | + |
| 166 | +**Files:** |
| 167 | +- Modify: `src/agentex/lib/adk/_modules/_pydantic_ai_tracing.py` |
| 168 | +- (Reference) the sync ACP usage pattern in the pydantic-ai docs/templates. |
| 169 | + |
| 170 | +- [ ] **Step 1: Deprecate the bespoke tracing handler.** Add a `DeprecationWarning` (via `warnings.warn(...)`) and a docstring note to `create_pydantic_ai_tracing_handler` / `AgentexPydanticAITracingHandler` stating the unified surface (`UnifiedEmitter`, which derives spans from the canonical stream) supersedes it. Keep the symbols importable and functional (no removal — backward compat). |
| 171 | + |
| 172 | +- [ ] **Step 2: Confirm the sync path.** The sync tap remains `convert_pydantic_ai_to_agentex_events`. Document (in the module docstring of `_pydantic_ai_sync.py`) the recommended sync ACP usage: |
| 173 | + |
| 174 | +```python |
| 175 | +turn = PydanticAITurn(agent.run_stream_events(...), model=...) |
| 176 | +async for event in emitter.yield_turn(turn): |
| 177 | +    yield event |
| 178 | +``` |
| 179 | + |
| 180 | +No code change beyond the docstring (the sync converter already yields the canonical stream; `yield_turn` adds tracing). Add a test that `emitter.yield_turn(PydanticAITurn(...))` forwards the same events the bare converter would and derives spans. |
| 181 | + |
| 182 | +- [ ] **Step 3: Run** tests; **Commit** `refactor(pydantic-ai): deprecate bespoke tracing handler; document unified sync path`. |
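
A deprecation shim of the kind Step 1 describes usually looks like the following. The names `TracingHandlerSketch` and `create_tracing_handler_sketch` are hypothetical; only the `warnings` calls are standard library. The shim keeps returning a working object, so existing callers continue to function while seeing the warning.

```python
import warnings


class TracingHandlerSketch:
    """Hypothetical stand-in for the legacy tracing handler."""

    def __init__(self, trace_id: str) -> None:
        self.trace_id = trace_id


def create_tracing_handler_sketch(trace_id: str) -> TracingHandlerSketch:
    """Deprecated: superseded by the unified surface's derived spans."""
    warnings.warn(
        "create_tracing_handler_sketch is deprecated; the unified emitter "
        "derives spans from the canonical stream.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return TracingHandlerSketch(trace_id)


def call_and_collect_warnings(trace_id: str):
    """Call the shim and return (handler, list of recorded warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        handler = create_tracing_handler_sketch(trace_id)
    return handler, caught
```

In a pytest suite the same check is typically written with `pytest.warns(DeprecationWarning)`.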
| 183 | + |
| 184 | +--- |
| 185 | + |
| 186 | +## Task 5: pydantic-ai cross-channel conformance fixtures |
| 187 | + |
| 188 | +**Files:** |
| 189 | +- Create: `tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py` |
| 190 | + |
| 191 | +**Blocked by AGX1-373** (the cross-channel conformance runner). Once 373 is merged into the foundation branch: |
| 192 | + |
| 193 | +- [ ] **Step 1: Record canonical fixtures.** For 3-4 representative pydantic-ai runs (text-only; single tool; reasoning/thinking; multi-step text+tool), capture the `StreamTaskMessage*` sequence the tap produces (run `convert_pydantic_ai_to_agentex_events` over recorded `AgentStreamEvent` inputs, or hand-author the canonical sequences). Store as `Fixture(name=..., events=[...])`. |
| 194 | + |
| 195 | +- [ ] **Step 2: Register** each fixture with the conformance runner and let the cross-channel parametrized test (from AGX1-373) assert yield-vs-auto-send equivalence + span equivalence for each. Register/parametrize within THIS module (per the runner's documented per-module registry semantics). |
| 196 | + |
| 197 | +- [ ] **Step 3: Run** `./scripts/test tests/lib/core/harness/ -v` — all green. **Commit** `test(pydantic-ai): cross-channel conformance fixtures`. |
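
The cross-channel assertion that the fixtures feed into can be pictured as follows. This is only an illustration of the equivalence being tested: the `Fixture` shape, `yield_channel`, and `auto_send_channel` are hypothetical stand-ins, and the real runner from AGX1-373 also compares spans.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fixture:
    """Hypothetical shape; the real Fixture type comes from the conformance scaffold."""
    name: str
    events: List[str] = field(default_factory=list)


async def yield_channel(events):
    """Stand-in for the yield delivery channel."""
    for event in events:
        yield event


async def auto_send_channel(events, outbox: List[str]) -> None:
    """Stand-in for the auto-send channel: pushes to an outbox instead of yielding."""
    for event in events:
        outbox.append(event)


FIXTURES = [
    Fixture("text_only", ["text:hello"]),
    Fixture("single_tool", ["tool_request:add", "tool_response:3", "text:3"]),
]


def channels_agree(fixture: Fixture) -> bool:
    """True if both channels deliver exactly the fixture's events, in order."""
    async def _run():
        yielded = [event async for event in yield_channel(fixture.events)]
        sent: List[str] = []
        await auto_send_channel(fixture.events, sent)
        return yielded == sent == fixture.events

    return asyncio.run(_run())
```

With pytest, `FIXTURES` would normally drive a `@pytest.mark.parametrize` over `channels_agree`, one test ID per fixture name.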
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +## Task 6: Three integration test agents (sync / async / temporal) |
| 202 | + |
| 203 | +**Files:** |
| 204 | +- Create: `examples/tutorials/harness-pydantic-ai-sync/`, `…-async/`, `…-temporal/` (each a minimal Agentex agent). |
| 205 | +- Modify: `.github/workflows/harness-integration.yml` (enable pydantic-ai `live-matrix` rows). |
| 206 | +- Modify: `.github/workflows/agentex-tutorials-test.yml` if it enumerates agents. |
| 207 | + |
| 208 | +Each agent is the smallest agent that exercises one delivery channel through the unified surface with the pydantic-ai harness. |
| 209 | + |
| 210 | +- [ ] **Step 1: Scaffold from the existing templates.** Base each agent on the corresponding CLI template: `sync-pydantic-ai`, `default-pydantic-ai` (async), `temporal-pydantic-ai` (under `src/agentex/lib/cli/templates/`). In each, the message handler builds `PydanticAITurn(agent.run_stream_events(params.content.content), model=...)` and: |
| 211 | +  - sync agent: `async for ev in emitter.yield_turn(turn): yield ev` |
| 212 | +  - async + temporal agents: `await emitter.auto_send_turn(turn)` (temporal: inside the activity, as the template already structures it). |
| 213 | +  Use a tiny pydantic-ai agent with ONE trivial tool so the run exercises text + a tool call + tool response. |
| 214 | + |
| 215 | +- [ ] **Step 2: Write an integration test per agent** that drives it with a fixed prompt and asserts: valid ordered messages (text + tool request + tool response) and a well-formed span tree. Use the repo's existing tutorial-agent test harness pattern (see `agentex-tutorials-test.yml` and how current tutorial agents are tested). |
| 216 | + |
| 217 | +- [ ] **Step 3: Wire CI.** In `.github/workflows/harness-integration.yml`, replace the `if: false` placeholder `live-matrix` job (or add a real matrix) with the pydantic-ai × {sync, async, temporal} entries, each running its agent's integration test. If `agentex-tutorials-test.yml` enumerates agents, add the three there too. Document any agent type that is not covered (none expected for pydantic-ai). |
| 218 | + |
| 219 | +- [ ] **Step 4: Run** the integration tests locally (as far as the env allows) and the conformance + unit suites. **Commit** `test(pydantic-ai): sync/async/temporal integration agents + enable CI live-matrix rows`. |
| 220 | + |
| 221 | +--- |
| 222 | + |
| 223 | +## Task 7: Full suite, type check, and backward-compat audit |
| 224 | + |
| 225 | +- [ ] **Step 1:** `./scripts/test tests/lib/core/harness/ tests/lib/adk/ -v` — all green on 3.12 + 3.13. |
| 226 | +- [ ] **Step 2:** `uv run pyright src/agentex/lib/` (or the harness + pydantic modules) — 0 new errors. |
| 227 | +- [ ] **Step 3: Backward-compat audit.** Confirm the public signatures are unchanged: `convert_pydantic_ai_to_agentex_events` (only gained an optional kwarg), `stream_pydantic_ai_events` (same signature + return), `create_pydantic_ai_tracing_handler` (still importable, now warns). Grep the repo + templates for callers and confirm none broke. |
| 228 | +- [ ] **Step 4:** If any fix was needed, **Commit** `chore(pydantic-ai): type/back-compat fixes`. |
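
The "only gained an optional kwarg" check in Step 3 can be automated with `inspect.signature`. The sketch below uses hypothetical stand-in functions rather than the real helpers, and it treats any added parameter without a default (including `*args` or `**kwargs`) as a non-additive change, which is a deliberately strict assumption.

```python
import inspect


def old_convert(stream):
    """Stand-in for the pre-change public signature."""


def new_convert(stream, on_result=None):
    """Stand-in for the post-change signature (one optional kwarg added)."""


def breaking_convert(stream, on_result):
    """Counter-example: the new parameter is required."""


def is_additive_change(old, new) -> bool:
    """True if every old parameter is unchanged and in the same position,
    and every added parameter has a default."""
    old_params = list(inspect.signature(old).parameters.values())
    new_params = list(inspect.signature(new).parameters.values())
    # Parameter equality covers name, kind, default, and annotation.
    if new_params[: len(old_params)] != old_params:
        return False
    added = new_params[len(old_params):]
    return all(p.default is not inspect.Parameter.empty for p in added)
```

Capturing the old signatures from the base branch (for example by importing the module at the merge base) gives the `old` side of the comparison; the grep for callers in Step 3 still covers usage patterns a signature check cannot see.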
| 229 | + |
| 230 | +--- |
| 231 | + |
| 232 | +## Self-Review checklist (run before opening the PR) |
| 233 | + |
| 234 | +- Every public symbol that existed before still exists with the same signature (additive-only): `convert_pydantic_ai_to_agentex_events`, `stream_pydantic_ai_events`, `create_pydantic_ai_tracing_handler`. |
| 235 | +- The auto-send helper returns the same final text as before (characterization test passes, or the post-373 shape is asserted with a note). |
| 236 | +- Tracing is now on by default for both channels and is overridable (emitter `tracer=False`). |
| 237 | +- Usage normalization uses the REAL pydantic-ai usage field names (verified in Task 2 Step 1), with defensive `getattr`. |
| 238 | +- Conformance fixtures register per-module and pass the cross-channel assertion from AGX1-373. |
| 239 | +- 3 test agents exist and their CI rows are enabled. |
| 240 | +- No `# type: ignore` added without justification. |
| 241 | + |
| 242 | +## Notes for the PR description |
| 243 | + |
| 244 | +- Link AGX1-373 (dependency) and AGX1-375 (import path); note AGX1-374 (reasoning/mixed-ordering auto_send tests) is foundation-level and orthogonal. |
| 245 | +- State the diff size; if test agents pushed it over budget, note the PR 4b split. |
| 246 | +- This is the template the langgraph (PR 5) and openai (PR 6) migrations follow. |