Summary
a2a/utils/telemetry.py's trace_function async wrapper (lines ~218-258 on v1.0.3) special-cases asyncio.CancelledError: it records the exception event but does NOT call span.set_status(StatusCode.ERROR, …). I suggest extending that special case to also cover asyncio.QueueShutDown (and the a2a back-port shim AsyncQueueShutDown from a2a/server/events/event_queue.py).
Why
EventQueueSource.dequeue_event is decorated with @trace_class. On normal task end the producer calls shutdown() on the underlying queue, the consumer's pending dequeue_event() raises QueueShutDown, and trace_function's generic except Exception arm marks the span StatusCode.ERROR. The consumer (ActiveTaskEventConsumer.run, a2a/server/agent_execution/active_task.py:~134) is designed to catch QueueShutDown and log it at DEBUG, so this is normal teardown, not an error.
Real-world impact: in a fleet of A2A agents using LangSmith OTel + Langfuse, ≥81 % of StatusCode.ERROR observations over a 7-day window were this single false alarm. Legitimate TaskNotFoundError / InvalidParamsError / VersionNotSupportedError errors are drowned in the noise.
Proposed change
Treat QueueShutDown / AsyncQueueShutDown the same way CancelledError is treated today (record exception, leave span status alone, re-raise). One small allowlist tuple in trace_function.
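A minimal sketch of the idea, not the actual a2a-python code. RecordingSpan, traced, NON_ERROR_EXCEPTIONS and run_and_get_status are hypothetical names, the span is a toy stand-in rather than a real OpenTelemetry span, and a placeholder exception is used when asyncio.QueueShutDown is unavailable (Python < 3.13):

```python
import asyncio
import functools


class _StandInQueueShutDown(Exception):
    """Placeholder used only when asyncio.QueueShutDown is missing (Python < 3.13)."""


QueueShutDown = getattr(asyncio, "QueueShutDown", _StandInQueueShutDown)

# Hypothetical allowlist: exceptions that signal normal teardown, not failure.
# The a2a back-port shim AsyncQueueShutDown would be added here as well.
NON_ERROR_EXCEPTIONS = (asyncio.CancelledError, QueueShutDown)


class RecordingSpan:
    """Toy stand-in for an OpenTelemetry span (assumption, not the SDK class)."""

    def __init__(self):
        self.status = "UNSET"
        self.events = []

    def record_exception(self, exc):
        self.events.append(type(exc).__name__)

    def set_status(self, status, description=""):
        self.status = status


def traced(span):
    """Wrap an async function; only non-allowlisted exceptions mark the span ERROR."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except NON_ERROR_EXCEPTIONS as exc:
                # Record the event, leave the span status alone, re-raise.
                span.record_exception(exc)
                raise
            except Exception as exc:
                span.record_exception(exc)
                span.set_status("ERROR", str(exc))
                raise

        return wrapper

    return decorator


def run_and_get_status(exc_type):
    """Run a traced coroutine that raises exc_type; return (status, events)."""
    span = RecordingSpan()

    @traced(span)
    async def dequeue_event():
        raise exc_type("queue closed")

    try:
        asyncio.run(dequeue_event())
    except (Exception, asyncio.CancelledError):
        pass
    return span.status, span.events
```

The allowlisted arm has to come before the generic except Exception arm, since QueueShutDown is an ordinary Exception subclass and would otherwise be caught there.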
Workaround we're shipping in the meantime
Custom SpanProcessor that drops the span before it reaches the OTLP exporter when span.name is on a small allowlist (today just EventQueueSource.dequeue_event) AND the recorded exception is QueueShutDown-class. Lives in our gideon library (42-com/gideon src/gideon/observability.py); see https://github.com/42-com/gideon/pull/ for the implementation.
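A rough sketch of how such a filter might look; this is not the gideon implementation. The class and helper names are hypothetical, FakeSpan / FakeEvent / CollectingProcessor are stand-ins so the snippet runs without the OpenTelemetry SDK, and matching on the "exception.type" attribute suffix is an assumption about how the exception event is recorded. In real use the wrapper would subclass opentelemetry.sdk.trace.SpanProcessor and wrap the exporting processor:

```python
from dataclasses import dataclass, field

# Span names eligible for dropping (today just this one, per the description above).
DROP_SPAN_NAMES = {"EventQueueSource.dequeue_event"}


@dataclass
class FakeEvent:
    """Stand-in for a span event (assumption, not the SDK class)."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class FakeSpan:
    """Stand-in for a finished, read-only span (assumption, not the SDK class)."""
    name: str
    events: list = field(default_factory=list)


def is_queue_shutdown_noise(span):
    """True when the span is allowlisted AND carries a QueueShutDown-class exception."""
    if span.name not in DROP_SPAN_NAMES:
        return False
    return any(
        ev.name == "exception"
        # Suffix match covers both QueueShutDown and AsyncQueueShutDown.
        and str(ev.attributes.get("exception.type", "")).endswith("QueueShutDown")
        for ev in span.events
    )


class DropQueueShutdownSpans:
    """Forwards spans to an inner processor unless they match the noise filter."""

    def __init__(self, inner):
        self._inner = inner

    def on_start(self, span, parent_context=None):
        self._inner.on_start(span, parent_context)

    def on_end(self, span):
        if not is_queue_shutdown_noise(span):
            self._inner.on_end(span)

    def shutdown(self):
        self._inner.shutdown()

    def force_flush(self, timeout_millis=30000):
        return self._inner.force_flush(timeout_millis)


class CollectingProcessor:
    """Stand-in for the exporting processor; just remembers ended span names."""

    def __init__(self):
        self.ended = []

    def on_start(self, span, parent_context=None):
        pass

    def on_end(self, span):
        self.ended.append(span.name)

    def shutdown(self):
        pass

    def force_flush(self, timeout_millis=30000):
        return True
```

Requiring both conditions keeps the filter narrow: a dequeue_event span that failed for another reason, or a QueueShutDown recorded on any other span, is still exported.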
Happy to open a PR against a2a-python mirroring the CancelledError pattern if maintainers are receptive.
References
a2a/utils/telemetry.py + a2a/server/events/event_queue_v2.py + a2a/server/events/event_queue.py.
QueueShutDown items (issue #46, [Bug]: got "AttributeError: 'Queue' object has no attribute 'shutdown'" when running examples/helloworld and examples/langgraph, and PR #407, fix: make event_consumer tolerant to closed queues on py3.13) are about Python 3.13 asyncio.Queue.shutdown portability, not telemetry.