fix(harness): assert cross-channel (yield vs auto-send) conformance equivalence [AGX1-373]#414
Open
declan-scale wants to merge 3 commits into
Conversation
d21c54a to
ebc468d
Compare
b4c53ca to
cae14d4
Compare
Contributor
Author
|
@greptile review |
…quivalence [AGX1-373] Rebased on the pyright-clean foundation. Includes @OverRide on _RecordingTracer.handle and relative conformance imports so the whole-repo pyright (scripts/lint) passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arison
- Add `payload: str` field to LogicalDelivery (NamedTuple, default "").
- _yield_logical_deliveries: track TextDelta / ReasoningContentDelta
accumulation per-index; include "".join(deltas) as payload for text/
reasoning deliveries. Include json.dumps(arguments, sort_keys=True) as
payload for tool_request; str(content) for tool_response.
- _auto_send_logical_deliveries: collect ("update", delta) entries from
the _FakeCtx sink between open and close; extract TextDelta /
ReasoningContentDelta text and accumulate. Carry same tool payload
fields.
- _yield_text_reasoning_seq: forward payload through when re-keying
index → seq.
- All 35 harness tests pass; ruff + pyright clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ess streamed tool-request delivery, include initial_content in payload
- Remove the Start(tool_request)+Done suppression in _yield_logical_deliveries:
auto_send now delivers streamed tool-request messages (AGX1-377 fix), so both
channels emit a LogicalDelivery for a streamed tool_request. The cross-channel
assertion verifies delivery on both sides.
- Include StreamTaskMessageStart.content in payload comparison for text and
reasoning types: TextContent.content is prepended to accumulated deltas;
ReasoningContent.summary items are prepended. This catches a channel that
drops initial_content or reasoning summary (Greptile id 3438655533, P1).
_auto_send_logical_deliveries mirrors the same seeding from ctx initial_content.
- Add "streamed-tool-request" fixture (Start + Done, no Full) to confirm
delivery on both channels under the new auto_send behaviour.
- Update "streaming-text" fixture to use non-empty initial_content ("Init") so
the initial_content seeding is actually exercised by the test.
- Update module/docstring comments that referenced the AGX1-377 suppression.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
8cd851c to
2e820c7
Compare
Contributor
Author
|
@greptile review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fast-follow on the unified harness surface foundation. Upgrades the conformance runner to actually assert cross-channel equivalence between
yield_eventsandauto_send, replacing the prior determinism-only test that merely ran the same deriver twice.Equivalence approach
Both channels are driven over each fixture using in-test fakes (mirroring patterns from
test_yield_delivery.pyandtest_auto_send.py). The results are normalised toLogicalDelivery(content_type, identity)tuples that strip the streaming-envelope difference:yieldchannel deliversStreamTaskMessageFull(ToolResponseContent)verbatim.auto_sendchannel delivers the same content by opening a streaming context withinitial_contentand closing it immediately (no deltas).Both collapse to
LogicalDelivery("tool_response", frozenset({("tool_call_id", ...), ("name", ...)}))and compare equal.Text/reasoning deliveries are normalised to sequential position within their type (since
auto_sendhas no event index in its streaming sink).Span signals are asserted identical: both channels call
SpanDeriver.observe()on the same event sequence, so the derived signals must match.Full-message decision: keep open+immediate-close
auto_sendretains the existing approach of posting aStreamTaskMessageFull(tool_request/tool_response) viastreaming_task_message_context(...).__aenter__()+ immediateclose(). Rationale:StreamingTaskMessageContext.close()persistsinitial_contentwhen the accumulator is empty, so the message is correctly written._langgraph_async.pypattern already in production.adk.messages.createwould require a new injectable dependency for no observable benefit.The envelope difference (Full vs Start+Done on the wire) is documented as an acceptable design choice in
runner.pyalongside the decision rationale.Fixtures
builtin-single-tool— retained (existing fixture, tool request+response cycle)streaming-text— new: text Start/delta/delta/Done pathreasoning-block— new: reasoning Start/delta/Done (exercises reasoning span open/close)Results
./scripts/test tests/lib/core/harness/— 35 passed on Python 3.12 and 3.13uv run pyright src/agentex/lib/core/harness/— 0 errors🤖 Generated with Claude Code
Greptile Summary
This fast-follow PR upgrades the conformance runner from a determinism-only check to a genuine cross-channel equivalence assertion between
yield_eventsandauto_send, directly addressing five previously-reported weaknesses in the prior implementation.LogicalDelivery(content_type, identity, payload)normalisation that compares actual delivered content (includinginitial_contentseeds, delta accumulation, and tool arguments/response values) across both channels for four fixtures covering text, reasoning, tool-request, and tool-response paths._RecordingTracer, which intercepts and records everySpanSignaleach channel's tracer actually receives, so a channel that skipsderiver.observe()for any event type is caught.Start(tool_request)+Donedeliveries and verifyingauto_sendnow delivers them, confirmed by the newstreamed-tool-requestfixture.Confidence Score: 5/5
Safe to merge — all five previously-flagged gaps are addressed and the two remaining observations are minor comment/coverage nits in test helper code.
Both changed files are test infrastructure only. The new RecordingTracer genuinely captures per-channel span signals, the LogicalDelivery payload field covers initial_content seeding and delta accumulation, and the tool-request suppression is correctly removed. No production code is touched and the conformance suite now runs 35 passing tests.
No files require special attention.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%% sequenceDiagram participant T as test_cross_channel_equivalence participant R as run_cross_channel_conformance participant Y as yield_events participant RT_Y as _RecordingTracer (yield) participant A as auto_send participant RT_A as _RecordingTracer (auto) participant FS as _FakeStreaming T->>R: fixture R->>RT_Y: "_RecordingTracer(tracing=_FakeTracing())" R->>Y: "yield_events(_gen(fixture.events), tracer=RT_Y)" Y-->>RT_Y: handle(SpanSignal) per event RT_Y-->>R: received_signals → yield_spans Y-->>R: yield_out (events verbatim) R->>R: _yield_logical_deliveries(yield_out) R->>R: _yield_text_reasoning_seq(...) → yield_deliveries R->>RT_A: "_RecordingTracer(tracing=_FakeTracing())" R->>FS: _FakeStreaming() R->>A: "auto_send(_gen(fixture.events), tracer=RT_A, streaming=FS)" A-->>RT_A: handle(SpanSignal) per event RT_A-->>R: received_signals → auto_spans A-->>FS: streaming_task_message_context → ctx/open/update/close entries R->>R: _auto_send_logical_deliveries(FS.sink) → auto_deliveries R-->>T: yield_deliveries, auto_deliveries, yield_spans, auto_spans T->>T: "assert yield_deliveries == auto_deliveries" T->>T: "assert yield_spans == auto_spans"%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% sequenceDiagram participant T as test_cross_channel_equivalence participant R as run_cross_channel_conformance participant Y as yield_events participant RT_Y as _RecordingTracer (yield) participant A as auto_send participant RT_A as _RecordingTracer (auto) participant FS as _FakeStreaming T->>R: fixture R->>RT_Y: "_RecordingTracer(tracing=_FakeTracing())" R->>Y: "yield_events(_gen(fixture.events), tracer=RT_Y)" Y-->>RT_Y: handle(SpanSignal) per event RT_Y-->>R: received_signals → yield_spans Y-->>R: yield_out (events verbatim) R->>R: _yield_logical_deliveries(yield_out) R->>R: _yield_text_reasoning_seq(...) → yield_deliveries R->>RT_A: "_RecordingTracer(tracing=_FakeTracing())" R->>FS: _FakeStreaming() R->>A: "auto_send(_gen(fixture.events), tracer=RT_A, streaming=FS)" A-->>RT_A: handle(SpanSignal) per event RT_A-->>R: received_signals → auto_spans A-->>FS: streaming_task_message_context → ctx/open/update/close entries R->>R: _auto_send_logical_deliveries(FS.sink) → auto_deliveries R-->>T: yield_deliveries, auto_deliveries, yield_spans, auto_spans T->>T: "assert yield_deliveries == auto_deliveries" T->>T: "assert yield_spans == auto_spans"Reviews (7): Last reviewed commit: "test(harness): propagate AGX1-377/378 fi..." | Re-trigger Greptile