feat(errors): clean error results, recovery hints, quieter logs#64
Merged
Conversation
why: Unlock fastmcp 3.3/3.4 server features adopted in follow-up commits: ToolError(log_level=...) for expected-failure log demotion (fastmcp#4036) and ToolResult(is_error=True) error results (fastmcp#4217). what: - pyproject: fastmcp floor >=3.2.4 -> >=3.4.0 (lock already at 3.4.0) - CHANGES: Dependencies entry covering passive wins of the bump (template resource title preservation, no exc_info on expected tool failures, MCP-compliant OTEL span attributes) note: floor stays at 3.4.0 for now — 3.4.1 (starlette CVE-2026-48710 floor) is inside the 3-day uv dependency cooldown until 2026-06-08; a follow-up will raise the floor once it clears.
…ToolError why: Every agent-correctable failure (unknown pane id, invalid arguments, tier denial) logged at ERROR with a traceback, drowning operator logs in noise that isn't actionable. fastmcp 3.3 added ToolError(log_level=...) support (fastmcp#4036). what: - Add ExpectedToolError(ToolError) defaulting log_level=WARNING in _utils.py - _map_exception_to_tool_error: expected libtmux failure branches return ExpectedToolError; TmuxCommandNotFound (broken operator environment) and the unexpected catch-all stay at ERROR - Sweep all direct validation/state raises in tool modules and SafetyMiddleware tier denial to ExpectedToolError - Doctests: traceback class names follow the raise sites - tests: parametrized level partition for the mapper; in-memory Client + caplog proof that fastmcp's server layer honors the WARNING level (fastmcp logger needs propagate=True under caplog)
…ultMiddleware why: fastmcp's stock ErrorHandlingMiddleware(transform_errors=True) funnels every tool failure through a -32603 catch-all, so agents received 'Internal error: Pane not found: %5' — the transform mangled every expected failure message on the wire. fastmcp 3.4 added ToolResult(is_error=True) (fastmcp#4217), letting the server shape error results directly. what: - ToolErrorResultMiddleware(ErrorHandlingMiddleware) intercepts tool-call exceptions and returns ToolResult(is_error=True) carrying the message exactly as raised plus structured meta: error_type (originating exception class via __cause__), expected (agent-correctable vs operator fault/bug), suggestion (recovery hint when present) - _log_error override honors FastMCPError.log_level — the stock implementation hardcoded logger.error, re-shouting failures the ExpectedToolError demotion had moved to WARNING - Inherited on_message keeps the MCP -32002 resource-not-found transform for non-tool messages - Ordering invariant documented in the server.py stack comment: conversion must stay outside Audit/Retry/Safety, which depend on exception semantics (audit catches to record outcome=error, retry matches LibTmuxException via __cause__) - ExpectedToolError grows an optional suggestion kwarg; the not-found mapper branches point agents at the discovery tools (list_sessions / list_windows / list_panes) - docs(topics/logging): document the fastmcp.errors channel and the WARNING-expected / ERROR-operator-fault level contract - tests: probe-server coverage for clean messages, meta payload, suggestion plumbing, and per-level logging; production stack-order assertions updated to the subclass
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #64 +/- ##
==========================================
+ Coverage 84.31% 84.59% +0.27%
==========================================
Files 42 42
Lines 2684 2758 +74
Branches 359 370 +11
==========================================
+ Hits 2263 2333 +70
- Misses 317 321 +4
Partials 104 104 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
why: Tighten the unreleased entries to product level now that the work has a pull request to link. what: - Trim upstream fastmcp PR numbers, OTEL attribute names, and fastmcp API mechanics from the Dependencies and What's new entries — that depth lives in the linked PR - Add the PR reference to both What's new entries - Lead What's new with the clean-error-results headline; point discovery-tool mentions through tooliconl roles
…_log_error why: AGENTS.md logging standard prescribes lazy %-formatting over f-strings in log calls — deferred interpolation when the level is filtered, and aggregator message-template grouping. The new _log_error pre-formatted its message with an f-string; flagged in the PR code review. what: - Pass the template and args to logger.log directly; the stock include_traceback if/else collapses into exc_info=<bool> - tests: convert the log-level test to the NamedTuple + test_id parametrize pattern and assert via record.getMessage(), which both interpolates lazily-formatted records and would fail on a literal %s leaking into the message
why: The module docstring listed ToolErrorResultMiddleware after ReadonlyRetryMiddleware, but the class is defined directly after SafetyMiddleware — a reader scanning the file by the docstring's order lands in the wrong place. Flagged in the PR code review. what: - Reorder the bullet list to match file definition order and say so in the lead-in (Safety, ToolErrorResult, Audit, ReadonlyRetry, TailPreserving)
why: The handle_tool_errors docstrings still said failures re-raise as plain ToolError; the mapper now classifies expected failures as ExpectedToolError at WARNING — the docstrings were silent on the level partition and on why the re-raise chains from e. Flagged in the PR code review. what: - Document the ExpectedToolError(WARNING) / ToolError(ERROR) partition on both decorators - State the single-level __cause__ contract at the raise site: ReadonlyRetryMiddleware inspects exactly one cause hop, so re-wrapping the mapped error would silently disable readonly retries
…-layer errors why: The request-flow diagram showed only SafetyMiddleware and the error-handling section described a single decorator-to-ToolError step; the branch's error redesign (classification in the decorator, conversion to is_error results in ToolErrorResultMiddleware) made both descriptions materially incomplete. what: - Request flow: list all six middleware outermost-first with a one-phrase role each; point at the server.py stack comment for the ordering rationale - Error handling: describe the classification/conversion split, the from-e cause chain retries depend on, and the exceptions-stay- exceptions invariant through the audit/retry/safety trio; link topics/logging for the level policy - Source layout: middleware.py one-liner covers all middleware, not just the safety tier
…se sites why: Seven docstrings still named ToolError where every raise site now raises ExpectedToolError — true only by subclassing, and the rendered API reference loses the expected/unexpected distinction the error redesign introduced. what: - Raises sections: _coerce_dict_arg and _apply_filters (_utils), _validate_channel_name and wait_for_channel (wait_for_tools), find_pane_by_position (pane_tools/lifecycle) - Prose: _raise_if_pane_lifecycle_changed one-liner (pane_tools/state), wait_for_content_change baseline-loss note (pane_tools/wait)
…nvariant why: The ordering-invariant comment opened with 'All three depend on exception semantics' but explained only audit and retry and concluded 'silently disable both' — an internally inconsistent count in the one comment future maintainers rely on to not reorder the stack. Flagged in the PR code review. what: - Name safety's dependency explicitly (tier denials must propagate as exceptions for audit to record them) and close with 'break all three' at both sites: the ToolErrorResultMiddleware docstring and the server.py stack comment
…on claim why: The error-handling section said the middleware catches the exception 'at the outer edge of the stack', but Timing and the response limiter sit outside it — imprecise in the one page whose purpose is stack-position accuracy. Flagged in the PR code review. what: - Describe the catch point as 'once it has cleared the audit/retry/safety trio', matching the closing sentence's 'outermost error layer' framing
why: The class documents its two keyword parameters only via prose and doctests; AGENTS.md mandates NumPy docstring style, and the codebase's parameterised classes (SafetyMiddleware, AuditMiddleware) all carry a Parameters section. Flagged in the PR code review. what: - Document *args, log_level, and suggestion in a NumPy Parameters block between the extended description and Examples
… expected why: fastmcp validates tool arguments against the input schema before tool code runs, raising a bare pydantic.ValidationError that carries no log_level — the handle_tool_errors decorators never get a chance to classify it. The middleware logged these at ERROR and marked them expected=false, contradicting the documented policy that invalid arguments are agent-correctable WARNINGs (fastmcp's own server layer already logs the same failure as a WARNING). Reported in PR review. what: - _is_schema_validation_error: recognize pydantic.ValidationError on the error or its __cause__; output-shape failures cannot be confused with this case — fastmcp's tool layer converts those to error results before the middleware sees them (verified) - _log_error: exceptions without log_level classify via the helper — WARNING for argument validation, ERROR otherwise - _error_tool_result: expected=True for schema validation failures - tests: takes_int probe tool; ErrorLogLevelFixture grows an arguments field with a schema_validation_logs_warning case; meta classification test asserting expected=True + ValidationError
…tion why: The limiter sits outside ToolErrorResultMiddleware, so oversized error results (e.g. a validation message echoing a huge argument on a response-limited tool) hit the truncation path, which rebuilds the result without the error flag. The truncated error became an apparent success, and the MCP SDK client rejected it against the tool's output schema — the agent received a transport-level RuntimeError instead of the tool error. Reported in PR review; reproduced against the reference client. what: - TailPreservingResponseLimitingMiddleware.on_call_tool captures the inner result and restores is_error=True when the base class truncation replaced an error result; meta already survives via the base class, and tail-preservation keeps the suggestion line at the end of the text - Truncated successes never gain the flag - tests: LimiterErrorFixture parametrize (oversized error keeps flag + suggestion tail, small error untouched, oversized success stays non-error); the failing probe returns a Pydantic model so the MCP SDK client's output-schema validation guards the regression note: truncated SUCCESS results from schema'd tools still trip the same client-side validation — a pre-existing gap in fastmcp's stock limiter (its meta workaround only bypasses server-side validation), out of scope here
…allback
why: The previous commit's verification chain piped mypy through
tail, masking a real arg-type error — the capture callback didn't
satisfy fastmcp's CallNext protocol, which matches the parameter
name ('context'), not just the callable shape.
what:
- Annotate _capture with MiddlewareContext[CallToolRequestParams] ->
ToolResult and name the parameter 'context' to structurally match
the protocol; cast the captured Any back to ToolResult
why: The branch now preserves error results through the outer response limiter, so stale ordering prose made the documented middleware contract misleading. what: - Describe response limiting as return-path truncation after error conversion - Split tool exception classification from FastMCP schema validation handling - Document that truncated oversized errors preserve is_error and _meta
why: The full suite can capture snapshot output before a burst of queued shell commands finishes, making the truncation assertions depend on tmux timing. what: - Reuse the wait-for helper for snapshot output setup - Wait for shell command completion before calling snapshot_pane
why: The branch distinguishes agent-correctable failures from operator faults, so stale ToolError wording obscured the intended client contract. what: - Name ExpectedToolError in internal docstrings and middleware comments - Use expected tool error language in client-facing tool docs
…idation errors why: Gemini CLI's scheduler leaks its wait_for_previous sequencing flag into MCP tool arguments when batching calls in one turn (near-constant in gemini -p runs). Strict schemas reject it and Gemini self-recovers from the generic pydantic message, but the recovery depends on the model inferring what to do. MemPalace fixed the same leak by stripping the key (MemPalace/mempalace#322) and later whitelisting arguments (#647) — silent dropping would mask argument typos on mutating/destructive tools, so the rejection stays; the error just explains itself now. what: - _unexpected_kwargs: names flagged unexpected_keyword_argument in a pydantic validation failure (error or its __cause__) - _client_label: client 'name version' from the MCP initialize handshake (fastmcp_context.session.client_params.clientInfo); None-safe at every hop, wording-only — never gates behavior - _error_tool_result(error, context=None): synthesizes a suggestion for unexpected-argument rejections — name the argument(s) to drop or fix, plus a client-scheduling-flag note for wait_for_previous that names the connected client when the handshake exposed it - docs(topics/gotchas): the leak, batching-vs-mode findings, the self-recovery flow, and why rejection beats stripping here - tests: parametrized suggestion coverage (client-labeled flag leak, default handshake, plain typo, missing-arg silence) plus a direct no-handshake unit case via pydantic.validate_call
why: Prose said 'raises an expected tool error' in plain words while
the API reference renders ExpectedToolError with a stable autodoc
anchor — the docs convention is to cite autodoc'd APIs via roles,
never plain backticks, and readers get a one-click path to the
exception's full contract (log level, suggestion field).
what:
- Convert seven prose mentions to {exc}`~libtmux_mcp._utils.ExpectedToolError`:
wait-for-channel (crashed signaller), show-hook (unknown hook
names), delete-buffer (paste after delete), safety (kill-tool
pane_id requirement), wait-for-content-change (pane lifecycle),
capture-since (cursor/pane mismatch and pane lifecycle)
- architecture: switch the existing {class} role for
ExpectedToolError to {exc}, matching the LibTmuxException
precedent in topics/logging
- Bare fastmcp ToolError literals stay as code text: fastmcp
publishes no objects.inv, so there is no link target
why: The unrecognized-argument suggestion (with client detection) is user-visible product surface the unreleased notes didn't cover — the behavior a Gemini CLI user upgrading will actually notice. what: - What's new entry covering the do-this-next hint for rejected unknown arguments and the wait_for_previous client-flag note
why: The branch now synthesizes recovery suggestions for schema-validation
failures, including unknown arguments and client scheduling flags, so the
architecture page's discovery-only wording was too narrow. The Gemini
gotcha page also carried brittle current-state claims about client behavior
and external MCP server fixes.
what:
- Broaden error-result suggestion prose to cover recovery hints generally
- Keep ExpectedToolError references linked with {exc}
- Soften stale-prone Gemini and external-server claims
why: Middleware tests still documented safety denials and decorated libtmux failures with the older ToolError-only mental model, while production now uses ExpectedToolError for agent-correctable failures. what: - Update stack-order and audit-denial docstrings for ExpectedToolError - Make the audit-denial simulation match current SafetyMiddleware behavior - Clarify retry/error-probe comments around mapped expected tool errors
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
>=3.4.0, unlockingToolError(log_level=...)(3.3) andToolResult(is_error=True)(3.4)-32603transform, so agents receivedInternal error: Pane not found: %5— messages now arrive exactly as raised_metaon error results:error_type(originating exception class),expected(agent-correctable vs operator fault),suggestion(recovery hint when present)list_sessions/list_windows/list_panes) that resolve stale or guessed idswait_for_previous— a scheduling flag Gemini CLI leaks into batched tool calls — with the connected client's identity from the MCP initialize handshakeChanges by area
Error classification
src/libtmux_mcp/_utils.py:ExpectedToolError(ToolError)defaultslog_level=WARNINGand carries an optionalsuggestion._map_exception_to_tool_errorreturns it for the expected libtmux branches and attaches discovery-tool suggestions on the not-found branches. Direct validation raises across the tool modules and theSafetyMiddlewaretier denial use it.src/libtmux_mcp/middleware.py:_is_schema_validation_errorclassifies fastmcp's pre-dispatch pydantic argument validation as expected/WARNING — it fires before the decorators can classify it, and output-shape failures can't be confused with it (fastmcp's tool layer converts those to error results itself).Error results
src/libtmux_mcp/middleware.py:ToolErrorResultMiddleware(ErrorHandlingMiddleware)converts tool-call exceptions intoToolResult(is_error=True)with the clean message plus the_metapayload, and logs at the error's ownlog_level. The inheritedon_messagekeeps the MCP-32002resource-not-found transform for non-tool messages.src/libtmux_mcp/middleware.py:_unexpected_kwargs+_client_labelsynthesize the unrecognized-argument suggestion; the client label comes fromfastmcp_context.session.client_params.clientInfo, isNone-safe at every hop, and only words the message — it never gates behavior.src/libtmux_mcp/middleware.py:TailPreservingResponseLimitingMiddlewarepreservesis_errorthrough truncation — the stock path rebuilds the result without it, turning an oversized error into an apparent success that the MCP SDK client then rejects against the tool's output schema.src/libtmux_mcp/server.py: the middleware stack swaps the stock error middleware for the subclass; the ordering comment documents the invariant that error→result conversion must stay outside the audit/retry/safety trio.Docs
docs/topics/logging.md: thefastmcp.errorschannel and the WARNING-expected / ERROR-operator-fault level contract.docs/topics/architecture.md: full middleware request flow and the three-boundary error design (tool classification, schema classification, conversion).docs/topics/gotchas.md: Gemini CLI'swait_for_previousinjection — symptom, self-recovery flow, and why rejection beats stripping.ExpectedToolErrorentry via{exc}roles.Design decisions
RetryMiddlewarematchesLibTmuxExceptionvia__cause__) and make the audit log record failures asoutcome=ok.structured_contenton error results: tools declare output schemas for their success payloads and clients validatestructuredContentagainst them — an error-shaped payload there would fail validation on strict clients. Details ride in_metainstead.wait_for_previous). Unknown arguments are rejected, never silently stripped — MemPalace patched the same Gemini leak by popping the key (fix: Ignore wait_for_previous argument to support Gemini MCP clients MemPalace/mempalace#322) and later whitelisting against the schema, but on a server with mutating/destructive tools silent dropping would let a typo'd flag (enteronsend_keys) run with defaults.Before / After
Agent-visible result for a tool denied by the safety tier:
Before:
After:
with
_meta: {"error_type": "ExpectedToolError", "expected": true}.Verification
Every direct raise in the tool modules is tier-classified — stock
ToolErrorsurvives only as the ERROR-tier return inside the exception mapper:Test plan
ExpectedToolErrorat WARNING; tmux-missing and unexpected exceptions stay at ERROR —test_map_exception_*error_type/expected/suggestionmeta —test_tool_error_result_*test_map_exception_suggestion_policyexpected: trueand log at WARNING —test_schema_validation_failure_marked_expected_in_meta,test_tool_error_result_logs_at_error_log_level[schema_validation_logs_warning]wait_for_previousnote, plain-typo listing, missing-arg silence, no-handshake fallback —test_unexpected_argument_suggestion*is_errorthrough the response limiter; truncated successes never gain it —test_response_limiter_preserves_error_results__cause__contract) —test_readonly_retry_recovers_from_libtmux_exceptionoutcome=error—test_audit_records_safety_denialscripts/mcp_swap.pyacross gemini, codex, cursor-agent, and claude on an isolated tmux socket: all four received the clean error text and suggestion; gemini's organicwait_for_previousleak reproduced and self-recovered, and the client label resolved live asgemini-cli-mcp-client 0.45.2ruff check,ruff format,mypy,pytest --reruns 0,just build-docs