Add WebSocket connection timing and reuse metrics #5145
theomonnom wants to merge 5 commits into main from …
Add connection timing metrics to STT, TTS, and RealtimeModel to help debug slow transitions between tasks.

This change adds:
- `ConnectionResult` class to return timing metadata from `ConnectionPool`
- `get_with_timing()` and `connection_with_timing()` methods on `ConnectionPool`
- `websocket_connection_time` and `websocket_connection_reused` fields to:
  - `STTMetrics`
  - `TTSMetrics`
  - `RealtimeModelMetrics`

The new fields distinguish between initial connection establishment time (for new connections) and pool acquisition time (for reused connections).

Updated plugins:
- OpenAI Realtime: tracks WebSocket connection time
- Google Realtime: tracks WebSocket connection time
- Deepgram STT: tracks WebSocket connection time
- Deepgram TTS: tracks connection pool timing
- Cartesia TTS: tracks connection pool timing
- ElevenLabs TTS: tracks connection reuse
- Google STT: tracks connection pool timing

https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
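A minimal sketch of how `ConnectionResult`, `get_with_timing()`, and `connection_with_timing()` could fit together, assuming a pool that keeps a plain list of idle connections. `SketchPool`, `connect_cb`, `put()`, and the `connection` field name are hypothetical; `connect_time`, `from_pool`, and `status` follow the names used in this PR, but the real `ConnectionPool` may be implemented differently.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class ConnectionResult(Generic[T]):
    connection: T        # field name is an assumption
    connect_time: float  # seconds to open a new connection or to take one from the pool
    from_pool: bool      # True when an idle connection was reused

    @property
    def status(self) -> str:
        return "reused" if self.from_pool else "new"


class SketchPool(Generic[T]):
    """Hypothetical minimal pool; not the real ConnectionPool."""

    def __init__(self, connect_cb: Callable[[], Awaitable[T]]) -> None:
        self._connect_cb = connect_cb
        self._idle: List[T] = []

    async def get_with_timing(self) -> ConnectionResult[T]:
        start = time.perf_counter()
        if self._idle:
            conn = self._idle.pop()
            from_pool = True
        else:
            conn = await self._connect_cb()
            from_pool = False
        return ConnectionResult(conn, time.perf_counter() - start, from_pool)

    async def get(self) -> T:
        # The plain accessor delegates to the timing variant and drops the metadata.
        return (await self.get_with_timing()).connection

    def put(self, conn: T) -> None:
        self._idle.append(conn)

    @asynccontextmanager
    async def connection_with_timing(self) -> AsyncIterator[ConnectionResult[T]]:
        result = await self.get_with_timing()
        yield result
        # Reached only when the body exits normally; a connection whose user
        # raised is not handed back, since it may be in a bad state.
        self.put(result.connection)

    @asynccontextmanager
    async def connection(self) -> AsyncIterator[T]:
        async with self.connection_with_timing() as result:
            yield result.connection


async def _demo() -> None:
    async def connect() -> object:
        await asyncio.sleep(0.05)  # stand-in for a WebSocket handshake
        return object()

    pool = SketchPool(connect)
    async with pool.connection_with_timing() as first:
        print(first.status, f"{first.connect_time:.3f}s")
    async with pool.connection_with_timing() as second:
        print(second.status, f"{second.connect_time:.3f}s")


if __name__ == "__main__":
    asyncio.run(_demo())
```

The plain `get()` and `connection()` accessors keep their return types by unwrapping the result, which is one way to add timing without breaking existing callers.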
```python
self._ws_connection_time = (
    connection._connect_time if connection._connect_time else total_time
)
```
🟡 ElevenLabs TTS reports stale connection time for reused connections
The `ws_connection_time` logic at lines 419-421 uses `connection._connect_time if connection._connect_time else total_time`. Since `_connect_time` is set during `connect()` for every connection (at tts.py:607), it is always non-`None` for both new and reused connections. For reused connections (`is_reused=True`), this reports the original WebSocket handshake time (e.g., 200 ms) instead of the near-zero pool acquisition time (`total_time`). The comment on lines 417-418 states that reused connections should report `total_time`, but the code never reaches that branch. The fix should branch on `is_reused`.
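The sketch below reproduces the issue under stated assumptions: a TTS object caches one connection, `current_connection()` returns `(connection, is_reused)`, and `_connect_time` is set once inside `connect()`. `TTSSketch`, `acquire()`, and `reported_connection_time()` are hypothetical names, and the 20 ms sleep stands in for a real handshake. Branching on `is_reused` makes a reused connection report its acquisition time rather than the stale handshake time.

```python
import asyncio
import time
from typing import Optional, Tuple


class _Connection:
    """Hypothetical stand-in for a plugin's cached WebSocket connection."""

    def __init__(self) -> None:
        self._connect_time: Optional[float] = None
        self.closed = False

    async def connect(self) -> None:
        start = time.perf_counter()
        await asyncio.sleep(0.02)  # placeholder for the WebSocket handshake
        # Set once, when the socket opens; it stays set for every later reuse.
        self._connect_time = time.perf_counter() - start


class TTSSketch:
    def __init__(self) -> None:
        self._current: Optional[_Connection] = None

    async def current_connection(self) -> Tuple[_Connection, bool]:
        """Return (connection, is_reused), opening a socket only when needed."""
        if self._current is not None and not self._current.closed:
            return self._current, True
        conn = _Connection()
        await conn.connect()
        self._current = conn
        return conn, False


def reported_connection_time(conn: _Connection, is_reused: bool, total_time: float) -> float:
    # Branch on is_reused, not on _connect_time: the latter is never None once connected.
    if is_reused or conn._connect_time is None:
        return total_time  # time spent acquiring the already-open socket
    return conn._connect_time  # handshake time of the socket just opened


async def acquire(tts: TTSSketch) -> Tuple[float, bool]:
    start = time.perf_counter()
    conn, is_reused = await tts.current_connection()
    total_time = time.perf_counter() - start
    return reported_connection_time(conn, is_reused, total_time), is_reused
```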
Instead of adding websocket_connection_time and websocket_connection_reused fields to STTMetrics, TTSMetrics, and RealtimeModelMetrics, set them as span attributes (lk.ws.connection_time, lk.ws.connection_reused) on the active OTEL span directly in each plugin. https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
- Add `record_ws_connection()` helper in trace_types.py to reduce duplication
- Update log messages to clearly indicate "(new)" vs "(reused)" connections
- Remove unused `from opentelemetry import trace` imports from plugins

https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
…ments
- Add `status` property to `ConnectionResult` returning "reused" or "new"
- Remove verbose docstrings and field comments
- Simplify plugin log messages to use `conn_result.status`

https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
```python
    self._tts.current_connection(), self._conn_options.timeout
)
total_time = time.perf_counter() - start_time
ws_connection_time = connection._connect_time or total_time
```
🟡 `or` operator used instead of `is not None` for `_connect_time` fallback, incorrect for `0.0`
On line 416, `connection._connect_time or total_time` uses Python's truthiness to decide between the original WS connect time and the fallback. Since `_connect_time` is `float | None`, if `_connect_time` were exactly `0.0`, the `or` would incorrectly fall through to `total_time`. The correct pattern is `connection._connect_time if connection._connect_time is not None else total_time`. While `0.0` is practically impossible for a real WS handshake, this is a known anti-pattern with numeric types.
```diff
- ws_connection_time = connection._connect_time or total_time
+ ws_connection_time = connection._connect_time if connection._connect_time is not None else total_time
```
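A quick illustration of the pitfall, using two hypothetical helper functions: with `or`, a `0.0` connect time is indistinguishable from a missing one, while the `is not None` form falls back only when the value is actually absent.

```python
from typing import Optional


def fallback_with_or(connect_time: Optional[float], total_time: float) -> float:
    # `or` tests truthiness, so a legitimate 0.0 is treated like a missing value.
    return connect_time or total_time


def fallback_with_is_not_none(connect_time: Optional[float], total_time: float) -> float:
    # Only a genuinely missing value (None) falls back to total_time.
    return connect_time if connect_time is not None else total_time


print(fallback_with_or(0.0, 0.35))           # 0.35 (the 0.0 was discarded)
print(fallback_with_is_not_none(0.0, 0.35))  # 0.0
print(fallback_with_or(None, 0.35), fallback_with_is_not_none(None, 0.35))  # 0.35 0.35
```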
Each plugin now sets ATTR_WS_CONNECTION_TIME on the span directly instead of calling a helper function. No new APIs introduced. https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
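A sketch of setting the attribute inline at the call site, as this commit describes. `RecordingSpan` is a test double so the example runs without OpenTelemetry installed; in a plugin the span would typically come from `opentelemetry.trace.get_current_span()`, whose spans expose the same `set_attribute(key, value)` method. The string value of `ATTR_WS_CONNECTION_TIME` is assumed to be the `lk.ws.connection_time` key named earlier in this PR, and `open_ws()` and `run_stream()` are hypothetical.

```python
import asyncio
import time
from typing import Any, Dict

# Assumed value: the attribute key named in the PR description.
ATTR_WS_CONNECTION_TIME = "lk.ws.connection_time"


class RecordingSpan:
    """Stand-in for an OpenTelemetry span that records attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


async def open_ws() -> object:
    await asyncio.sleep(0.01)  # placeholder for the real WebSocket handshake
    return object()


async def run_stream(span: RecordingSpan) -> object:
    start = time.perf_counter()
    ws = await open_ws()
    # Set directly on the span where the connection is made, with no shared helper.
    span.set_attribute(ATTR_WS_CONNECTION_TIME, time.perf_counter() - start)
    return ws


if __name__ == "__main__":
    demo_span = RecordingSpan()
    asyncio.run(run_stream(demo_span))
    print(demo_span.attributes)
```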
Summary
This PR adds WebSocket connection timing and reuse metrics across the LiveKit agents framework and its plugins. It introduces a new `ConnectionResult` dataclass that records how long a connection took to acquire and whether it was reused from a pool, and propagates this information through the metrics system.

Key Changes

Core Framework
- `ConnectionResult` dataclass: wraps a connection object with timing metadata (`connect_time` and a `from_pool` flag)
- `ConnectionPool` methods:
  - `get_with_timing()`: returns a `ConnectionResult` instead of just the connection
  - `connection_with_timing()`: context-manager variant that yields a `ConnectionResult`
  - `get()` and `connection()` now delegate to the timing variants for consistency
- `STTMetrics`, `TTSMetrics`, and `RealtimeModelMetrics` gain two new optional fields:
  - `websocket_connection_time`: time in seconds to establish or acquire the connection
  - `websocket_connection_reused`: whether the connection was reused from a pool

Plugin Implementations
- `current_connection()` returns a tuple with a reuse flag; connection timing is tracked in `_Connection.connect()`
- Connection timing in `_create_ws_conn()`, with debug logging
- Connection timing in `_main_task()` when establishing the Gemini API connection
- Switched from `connection()` to `connection_with_timing()` to capture metrics
- Uses `connection_with_timing()` for metrics capture
- Connection timing in `_connect_ws()`, with debug logging
- Uses `connection_with_timing()` for metrics capture

Base Classes
- Added `_ws_connection_time` and `_ws_connection_reused` attributes; these are propagated to metrics in `_metrics_monitor_task()` and `_emit_metrics()`

Implementation Details
- Uses `time.perf_counter()` for high-resolution measurements
- The new metrics fields are optional (`float | None`, `bool | None`) to maintain backward compatibility
- Reuse detection relies on the `ConnectionResult.from_pool` flag
- … `from_pool` is always `False`

https://claude.ai/code/session_017ngdECrv92KhXdPjiTdiZc
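A minimal sketch of the Base Classes flow summarized above: the plugin writes `_ws_connection_time` and `_ws_connection_reused` on the stream, and `_emit_metrics()` copies them into the metrics record. `StreamSketch`, `TTSMetricsSketch`, the `label`/`duration` fields, and the `_emit_metrics(started_at)` signature are hypothetical; only the two `websocket_*` fields and the `_ws_*` attribute names come from the PR, and the `0.184` value is made up for illustration. The commit history shows these fields were later replaced by span attributes, so this reflects the summarized design rather than the final one.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TTSMetricsSketch:
    """Hypothetical metrics record; only the websocket_* fields come from the PR."""

    label: str
    duration: float
    # Optional with None defaults, so callers that never set them keep working.
    websocket_connection_time: Optional[float] = None
    websocket_connection_reused: Optional[bool] = None


class StreamSketch:
    """Hypothetical stream base: plugins fill the _ws_* attributes, the base emits them."""

    def __init__(self, label: str) -> None:
        self._label = label
        self._ws_connection_time: Optional[float] = None
        self._ws_connection_reused: Optional[bool] = None
        self.emitted: List[TTSMetricsSketch] = []

    def _emit_metrics(self, started_at: float) -> None:
        self.emitted.append(
            TTSMetricsSketch(
                label=self._label,
                duration=time.perf_counter() - started_at,
                websocket_connection_time=self._ws_connection_time,
                websocket_connection_reused=self._ws_connection_reused,
            )
        )


stream = StreamSketch("tts-sketch")
start = time.perf_counter()
# A plugin would set these right after acquiring its connection.
stream._ws_connection_time = 0.184
stream._ws_connection_reused = False
stream._emit_metrics(start)
print(stream.emitted[0].websocket_connection_time, stream.emitted[0].websocket_connection_reused)  # 0.184 False

legacy = TTSMetricsSketch(label="old-caller", duration=0.5)
print(legacy.websocket_connection_time)  # None
```

The `None` defaults are what make the new fields backward compatible: existing code that constructs metrics without them still produces a valid record.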