environmentd: add Prometheus metrics for MCP endpoints DEX-2 #36535
Draft
jubrad wants to merge 1 commit into
Adds a `McpMetrics` struct tracking five time series at the JSON-RPC
protocol layer, complementing the existing HTTP-level `PrometheusLayer`
metrics:
- `mcp_requests_total{endpoint, method, status}` — per JSON-RPC method
(initialize / tools/list / tools/call) and outcome
- `mcp_tool_calls_total{endpoint, tool, status}` — per tool and outcome
- `mcp_tool_duration_seconds{endpoint, tool}` — tool execution latency
histogram
- `mcp_errors_total{endpoint, error_type}` — error breakdown by type
(ValidationError, ExecutionError, ResponseSizeExceeded, etc.)
- `mcp_timeouts_total{endpoint}` — requests that hit the 60 s timeout
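
The label sets listed above can be sketched as a small struct. This is a minimal, std-only illustration and not the PR's implementation: `CounterVec` is a hypothetical stand-in for a labelled Prometheus counter, the histogram is reduced to a list of raw observations, and the `"success"` status value and the `query` tool name used in the usage note are assumptions.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical stand-in for a labelled counter: one u64 per label set.
#[derive(Default)]
pub struct CounterVec {
    values: HashMap<Vec<String>, u64>,
}

impl CounterVec {
    /// Increment the series identified by `labels` (label values, in order).
    pub fn inc(&mut self, labels: &[&str]) {
        let key: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        *self.values.entry(key).or_insert(0) += 1;
    }

    /// Read the current value; a series that was never incremented reads 0.
    pub fn get(&self, labels: &[&str]) -> u64 {
        let key: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        self.values.get(&key).copied().unwrap_or(0)
    }
}

/// Sketch of the five series, with the label order from the list above.
#[derive(Default)]
pub struct McpMetrics {
    pub requests_total: CounterVec,   // {endpoint, method, status}
    pub tool_calls_total: CounterVec, // {endpoint, tool, status}
    pub errors_total: CounterVec,     // {endpoint, error_type}
    pub timeouts_total: CounterVec,   // {endpoint}
    /// Simplified histogram: raw observations in seconds per {endpoint, tool}.
    pub tool_duration_seconds: HashMap<(String, String), Vec<f64>>,
}

impl McpMetrics {
    /// Record one tool call: bump the counter and observe its latency.
    pub fn record_tool_call(&mut self, endpoint: &str, tool: &str, status: &str, elapsed: Duration) {
        self.tool_calls_total.inc(&[endpoint, tool, status]);
        self.tool_duration_seconds
            .entry((endpoint.to_string(), tool.to_string()))
            .or_default()
            .push(elapsed.as_secs_f64());
    }
}
```

With this sketch, `metrics.record_tool_call("agent", "query", "success", elapsed)` touches both `tool_calls_total` and `tool_duration_seconds` in one call, which keeps the two series from drifting apart.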
The timeout arm now increments `requests_total` and `errors_total`
immediately (not delayed until the background task finishes) by
capturing the method name before the spawn.
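
The ordering described above (copy the method name, then move the request into the spawned work) can be illustrated with plain threads and a channel. This is a sketch under stated assumptions, not the PR's code: `Request`, `Counts`, and `handle` are hypothetical names, the `"error"` status and `"Timeout"` error type are assumed label values, and `std::thread` stands in for whatever task runtime the real handler uses.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical JSON-RPC request; only the fields this sketch needs.
pub struct Request {
    pub method: String,
    pub work: Duration, // simulated handler run time
}

/// Hypothetical, simplified counters so the effect is easy to inspect.
#[derive(Default, Debug)]
pub struct Counts {
    pub requests: Vec<(String, String)>, // (method, status)
    pub errors: Vec<String>,             // error_type
    pub timeouts: u64,
}

pub fn handle(request: Request, timeout: Duration, counts: &mut Counts) {
    // Capture the method name *before* `request` is moved into the worker,
    // so the timeout arm can still label its counters.
    let method_name = request.method.clone();

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(request.work); // `request` now lives in the worker
        // The receiver may already be gone if the caller timed out.
        let _ = tx.send(());
    });

    match rx.recv_timeout(timeout) {
        Ok(()) => counts.requests.push((method_name, "success".to_string())),
        Err(_) => {
            // Timeout arm: update every counter immediately instead of
            // waiting for the background work to finish.
            counts.timeouts += 1;
            counts.requests.push((method_name, "error".to_string()));
            counts.errors.push("Timeout".to_string()); // assumed label value
        }
    }
}
```

Without the early `clone`, the timeout arm would have no access to the method name, because `request` has already been moved into the closure by the time the timeout fires.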
Also replaces the inline `QueryExecutionFailed` string in
`format_rows_response` with a dedicated `ResponseSizeExceeded` error
variant so size-limit hits are distinguishable in `errors_total`.
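
A dedicated variant makes the label mapping a simple `match`. The sketch below is illustrative only: the enum name `McpError`, its fields, the message text, and the helper `check_response_size` are hypothetical, and `Display` is written by hand to keep the example free of external crates. The `"ResponseSizeExceeded"` and `"ExecutionError"` type strings and the `INTERNAL_ERROR` mapping for the new variant come from the description; the code for `QueryExecutionFailed` is an assumption. `-32603` is the JSON-RPC 2.0 internal error code.

```rust
use std::fmt;

/// JSON-RPC 2.0 "Internal error" code.
pub const INTERNAL_ERROR: i32 = -32603;

/// Hypothetical error enum with just the two variants discussed here.
#[derive(Debug)]
pub enum McpError {
    QueryExecutionFailed(String),
    ResponseSizeExceeded { size: usize, limit: usize },
}

impl McpError {
    /// Value used for the `error_type` label of `mcp_errors_total`.
    pub fn error_type(&self) -> &'static str {
        match self {
            McpError::QueryExecutionFailed(_) => "ExecutionError",
            McpError::ResponseSizeExceeded { .. } => "ResponseSizeExceeded",
        }
    }

    /// JSON-RPC error code returned to the client.
    pub fn error_code(&self) -> i32 {
        match self {
            // Assumption: execution failures also map to INTERNAL_ERROR.
            McpError::QueryExecutionFailed(_) => INTERNAL_ERROR,
            McpError::ResponseSizeExceeded { .. } => INTERNAL_ERROR,
        }
    }
}

impl fmt::Display for McpError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            McpError::QueryExecutionFailed(msg) => write!(f, "query execution failed: {msg}"),
            // Hypothetical message text; the real wording is not shown here.
            McpError::ResponseSizeExceeded { size, limit } => {
                write!(f, "response size {size} bytes exceeds limit of {limit} bytes")
            }
        }
    }
}

/// Hypothetical size check in the spirit of `format_rows_response`.
pub fn check_response_size(body: &str, limit: usize) -> Result<(), McpError> {
    if body.len() > limit {
        return Err(McpError::ResponseSizeExceeded { size: body.len(), limit });
    }
    Ok(())
}
```

Because the label is derived from the variant rather than from a message string, a size-limit hit and a genuine execution failure can no longer land in the same `error_type` bucket.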
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

Adds a `McpMetrics` struct that tracks MCP traffic at the JSON-RPC protocol layer, complementing the existing HTTP-level `PrometheusLayer` metrics (which already expose request counts/durations keyed on the `/api/mcp/agent` and `/api/mcp/developer` paths).

Five new time series, all labelled by `endpoint` (`agent` or `developer`):

| Metric | Labels |
|---|---|
| `mcp_requests_total` | `endpoint`, `method`, `status` |
| `mcp_tool_calls_total` | `endpoint`, `tool`, `status` |
| `mcp_tool_duration_seconds` | `endpoint`, `tool` |
| `mcp_errors_total` | `endpoint`, `error_type` |
| `mcp_timeouts_total` | `endpoint` |

Timeout counting: the timeout arm now increments `requests_total` and `errors_total` immediately by capturing `method_name` before `request` is moved into the spawned task. Previously, those counters would only be updated when the background task eventually finished, which could lag by up to 60 s during timeout storms.

`ResponseSizeExceeded` error variant: replaces the inline `QueryExecutionFailed` string in `format_rows_response` with a dedicated variant so size-limit hits appear as `error_type="ResponseSizeExceeded"` in `mcp_errors_total` rather than `"ExecutionError"`.

Tests

- Updates `test_mcp_error_codes` to assert that `ResponseSizeExceeded` maps to the `INTERNAL_ERROR` error code and the `"ResponseSizeExceeded"` error type string.
- `test_format_rows_response_errors_when_over_limit` continues to pass unchanged (the error message text is preserved by the new variant's `#[error]` attribute).