diff --git a/CHANGES b/CHANGES index e4e1dbe5..9f46d935 100644 --- a/CHANGES +++ b/CHANGES @@ -6,6 +6,24 @@ _Notes on upcoming releases will be added here_ +### Dependencies + +**Minimum `fastmcp>=3.4.0`** (was `>=3.2.4`). Enables the error-result and log-level improvements below; the bump alone also restores resource titles on the `tmux://` hierarchy and brings MCP-compliant telemetry span attributes. + +### What's new + +**Tool errors arrive clean, with structured detail and recovery hints** + +Failed tool calls now return their message exactly as raised — previously every failure was prefixed with `Internal error:`. Error results carry a structured `_meta` payload (`error_type`, `expected`, `suggestion`), and not-found errors point at {tooliconl}`list-sessions` / {tooliconl}`list-windows` / {tooliconl}`list-panes` to resolve stale or guessed ids. (#64) + +**Unknown tool arguments get a do-this-next hint** + +When a call includes arguments a tool doesn't accept, the error result now names exactly which argument(s) to remove or fix. When the stray key is `wait_for_previous` — a scheduling flag Gemini CLI can leak into batched tool calls — the hint says so and names the connected client, so agents recover in one read instead of re-deriving the problem from a validation traceback. (#64) + +**Expected tool failures log at WARNING, not ERROR** + +Agent-correctable failures — unknown ids, invalid arguments, safety-tier denials, transient tmux errors — now log at WARNING in the server's log stream. A missing tmux binary and unexpected exceptions stay at ERROR, so ERROR records are always worth attention. (#64) + ## libtmux-mcp 0.1.0a10 (2026-05-24) ### What's new diff --git a/docs/tools/buffer/delete-buffer.md b/docs/tools/buffer/delete-buffer.md index 5c105e32..d9cc8994 100644 --- a/docs/tools/buffer/delete-buffer.md +++ b/docs/tools/buffer/delete-buffer.md @@ -10,7 +10,7 @@ across MCP restarts. **Side effects:** Removes the named buffer from the tmux server. Subsequent {tooliconl}`paste-buffer` calls against the deleted name -return `ToolError`. +return an {exc}`~libtmux_mcp._utils.ExpectedToolError`. ```{fastmcp-tool-input} buffer_tools.delete_buffer ``` diff --git a/docs/tools/hook/show-hook.md b/docs/tools/hook/show-hook.md index 161609b5..33fa3af7 100644 --- a/docs/tools/hook/show-hook.md +++ b/docs/tools/hook/show-hook.md @@ -4,9 +4,10 @@ ``` **Use when** you know which hook you want to inspect by name. Returns -empty when the hook is unset; raises `ToolError` for unknown hook -names (typos, wrong scope) so input mistakes don't masquerade as -"nothing configured". +empty when the hook is unset; raises an +{exc}`~libtmux_mcp._utils.ExpectedToolError` for +unknown hook names (typos, wrong scope) so input mistakes don't +masquerade as "nothing configured". **Side effects:** None. Readonly. diff --git a/docs/tools/pane/capture-since.md b/docs/tools/pane/capture-since.md index 27e17ac1..018b43af 100644 --- a/docs/tools/pane/capture-since.md +++ b/docs/tools/pane/capture-since.md @@ -60,8 +60,9 @@ Read only content since that cursor: ``` The cursor carries the original pane id, so the follow-up call does not need -`pane_id`. If you pass both, they must match; a cursor for another pane raises a -tool error instead of silently reading the wrong process. +`pane_id`. If you pass both, they must match; a cursor for another pane raises +an {exc}`~libtmux_mcp._utils.ExpectedToolError` instead of silently reading the +wrong process. If nothing new was written after the cursor, `lines` is empty and the response still includes a fresh cursor for the same pane. If the cursor row scrolled into @@ -73,8 +74,8 @@ needed to compute an exact delta. In that case, `lines` is a conservative current visible capture and the response includes a fresh cursor. Pane lifecycle is part of the cursor contract. If the pane dies or is respawned, -the call raises a tool error instead of reading from a different process that -reused the same pane id. +the call raises an {exc}`~libtmux_mcp._utils.ExpectedToolError` instead of +reading from a different process that reused the same pane id. `truncated`, `truncated_lines`, and `truncated_bytes` are structured metadata. No truncation marker is injected into `lines`, so clients can display terminal diff --git a/docs/tools/pane/wait-for-channel.md b/docs/tools/pane/wait-for-channel.md index 1fbb7c98..b0574d8f 100644 --- a/docs/tools/pane/wait-for-channel.md +++ b/docs/tools/pane/wait-for-channel.md @@ -29,8 +29,9 @@ to poll for an output marker instead; state-polling is structurally safer than edge-triggered signalling for fragile commands. **Side effects:** Blocks the call up to `timeout` seconds (default 30). -Mandatory subprocess timeout — a crashed signaller raises `ToolError` -rather than blocking indefinitely. +Mandatory subprocess timeout — a crashed signaller raises an +{exc}`~libtmux_mcp._utils.ExpectedToolError` rather than blocking +indefinitely. ```{fastmcp-tool-input} wait_for_tools.wait_for_channel ``` diff --git a/docs/tools/pane/wait-for-content-change.md b/docs/tools/pane/wait-for-content-change.md index 3f710652..aa2fa584 100644 --- a/docs/tools/pane/wait-for-content-change.md +++ b/docs/tools/pane/wait-for-content-change.md @@ -12,7 +12,7 @@ specific pattern. precise and avoids false positives from unrelated output. **Side effects:** None. Readonly. Blocks until content changes or timeout. -Raises a tool error if the pane dies or is respawned while waiting, because the +Raises an {exc}`~libtmux_mcp._utils.ExpectedToolError` if the pane dies or is respawned while waiting, because the entry content baseline no longer describes the same pane process. **Example:** diff --git a/docs/topics/architecture.md b/docs/topics/architecture.md index 05d536d7..fa1d5c6c 100644 --- a/docs/topics/architecture.md +++ b/docs/topics/architecture.md @@ -13,7 +13,7 @@ src/libtmux_mcp/ server.py # FastMCP instance and configuration _utils.py # Server caching, resolvers, serializers, error handling models.py # Pydantic output models - middleware.py # Safety tier middleware + middleware.py # Safety, audit, retry, and error-result middleware tools/ server_tools.py # list_sessions, create_session, kill_server, get_server_info session_tools.py # list_windows, create_window, rename_session, kill_session @@ -27,14 +27,22 @@ src/libtmux_mcp/ ## Request flow +Middleware wraps tool calls outermost-first (full ordering rationale in +the `server.py` stack comment): + ``` MCP Client (Claude, Cursor, etc.) → stdio transport → FastMCP server (server.py) - → SafetyMiddleware (middleware.py) - → Tool function (tools/*.py) - → libtmux Server/Session/Window/Pane - → tmux binary (via subprocess) + → TimingMiddleware (wall-time observer) + → TailPreservingResponseLimitingMiddleware (response size backstop) + → ToolErrorResultMiddleware (exceptions → is_error results) + → AuditMiddleware (one log record per call) + → ReadonlyRetryMiddleware (retries readonly tools only) + → SafetyMiddleware (tier gate, fail-closed) + → Tool function (tools/*.py) + → libtmux Server/Session/Window/Pane + → tmux binary (via subprocess) ``` ## Key design decisions @@ -60,7 +68,13 @@ Tools use resolver functions (`_resolve_session`, `_resolve_window`, `_resolve_p ### Error handling -The `@handle_tool_errors` decorator wraps all tool functions, catching libtmux exceptions and converting them to `ToolError` with descriptive messages. +Three boundaries split the work: + +1. **Tool classification** — the {func}`~libtmux_mcp._utils.handle_tool_errors` decorator wraps tool functions, mapping libtmux exceptions to {exc}`~libtmux_mcp._utils.ExpectedToolError` (agent-correctable: unknown ids, invalid arguments, transient tmux errors; logged at WARNING) or stock `ToolError` (operator faults and unexpected bugs; logged at ERROR). The raise chains the original exception via `from e`, which is what lets {class}`~libtmux_mcp.middleware.ReadonlyRetryMiddleware` match transient `LibTmuxException` causes. +2. **Schema classification** — FastMCP validates tool arguments before tool code runs, so Pydantic validation failures never reach the decorator. {class}`~libtmux_mcp.middleware.ToolErrorResultMiddleware` classifies those schema-validation errors as expected, agent-correctable WARNINGs before converting them. +3. **Conversion** — {class}`~libtmux_mcp.middleware.ToolErrorResultMiddleware` catches the exception once it has cleared the audit/retry/safety trio and returns `ToolResult(is_error=True)` carrying the message exactly as raised, plus a `_meta` payload (`error_type`, `expected`, and an optional agent-facing `suggestion` for recovery hints such as discovery tools or rejected-argument fixes). + +Errors must stay exceptions through the audit/retry/safety trio — audit detects failures by catching, retry matches via `__cause__` — so conversion happens only in the outermost error layer. The response limiter sits outside conversion and may truncate large success or error results on the return path; its truncation path preserves `is_error` and `_meta` so oversized expected failures stay tool errors. Level policy lives in {doc}`/topics/logging`. ## References diff --git a/docs/topics/gotchas.md b/docs/topics/gotchas.md index 02095c51..99158f52 100644 --- a/docs/topics/gotchas.md +++ b/docs/topics/gotchas.md @@ -71,3 +71,17 @@ The `suppress_history` parameter on `send_keys` prepends a space before the comm `clear_pane` runs two tmux commands in sequence: `send-keys -R` (reset terminal) then `clear-history` (clear scrollback). There is a brief gap between them where partial content may be visible. For most use cases this is not a problem. If you need guaranteed clean state, add a small delay before the next `capture_pane`. + +## Gemini CLI injects `wait_for_previous` into tool arguments + +When Gemini CLI batches several tool calls in one turn, its scheduler merges the internal sequencing flag of the later calls into the MCP tool's arguments: + +```json +{"tool": "get_pane_info", "arguments": {"wait_for_previous": true, "pane_id": "%0"}} +``` + +This has been observed with stock Gemini CLI behavior (no extensions involved). In local non-interactive `gemini -p` harness runs, batching can front-load the topic tool and the first MCP calls into one parallel turn, which is where the leaked flag usually appears; interactive sessions tend to schedule more sequentially and trigger it less often. + +Tool schemas are strict (`additionalProperties: false`), so the call is rejected with a validation error — classified as expected (WARNING log, `expected: true` in the result's `_meta`) and carrying a suggestion that names the rejected argument and identifies `wait_for_previous` as a client scheduling flag to retry without. Gemini's model reads it, drops the flag, and retries successfully on its own. + +The visible symptom is harmless noise: `Error executing tool mcp_tmux_: ... reported an error` lines in Gemini's output for calls that then succeed on retry, and matching WARNING records in the server log. Similar reports in other MCP servers have handled this injected key by stripping it or whitelisting arguments against the schema ([MemPalace/mempalace#816](https://github.com/MemPalace/mempalace/issues/816)). This server deliberately keeps the rejection: silently dropping unknown arguments would also swallow genuine argument typos from every client — on a server with mutating and destructive tools, a mis-named flag (`enter` on `send_keys`, say) must fail loudly, not run with defaults. diff --git a/docs/topics/logging.md b/docs/topics/logging.md index 8ad256a0..3bcf742f 100644 --- a/docs/topics/logging.md +++ b/docs/topics/logging.md @@ -23,6 +23,23 @@ are: {exc}`libtmux.exc.LibTmuxException`. - ``libtmux_mcp.server`` / ``libtmux_mcp.tools.*`` / etc. — ad-hoc warnings and debug messages from the codebase. +- ``fastmcp.errors`` — one record per failed tool call, emitted by + {class}`~libtmux_mcp.middleware.ToolErrorResultMiddleware`. + +## Error levels + +Tool failures are logged at a level matching who needs to act: + +- **WARNING** — expected, agent-correctable failures: unknown + pane/window/session ids, invalid arguments, safety-tier denials, + transient tmux errors. The calling agent receives the message and + can correct course; operators don't need to. +- **ERROR** — operator faults and potential bugs: a missing ``tmux`` + binary, or any unexpected exception escaping a tool. + +A WARNING-heavy, ERROR-quiet log stream is therefore healthy — it +means agents are probing and recovering. ERROR records are the ones +worth alerting on. ## Level control diff --git a/docs/topics/safety.md b/docs/topics/safety.md index 565fb5c7..df003702 100644 --- a/docs/topics/safety.md +++ b/docs/topics/safety.md @@ -102,7 +102,7 @@ Unlike other `mutating` tools, the registration carries `destructiveHint=True` a Mitigations: -- `pane_id` is required (no fallback to "first pane in session/window"). Agents that pass only `session_name` get a `ToolError` instead of an unintended kill — resolve via {tool}`list-panes` first. +- `pane_id` is required (no fallback to "first pane in session/window"). Agents that pass only `session_name` get an {exc}`~libtmux_mcp._utils.ExpectedToolError` instead of an unintended kill — resolve via {tool}`list-panes` first. - Any `shell` argument is briefly visible in the OS process table and tmux's `pane_current_command` metadata before the spawned shell takes over; the audit log redacts `shell` payloads (see below), but do not pass credentials directly even with redaction. - The optional `environment` argument (`dict[str, str]`) maps to one tmux `-e KEY=VALUE` flag per item. The audit log redacts each *value* via a `{len, sha256_prefix}` digest while keeping the *keys* visible — env var names like `DATABASE_URL` are usually operator-debug-useful, but their values are the secret. The same OS-process-table caveat as `shell` applies: `respawn-pane -e DB_PASSWORD=...` may briefly appear in `ps` output before the spawned process inherits the env. - The same self-pane guard that protects the destructive kill commands also refuses to respawn the pane running the MCP server. diff --git a/pyproject.toml b/pyproject.toml index c5b35d0f..c54436a0 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -41,7 +41,7 @@ include = [ dependencies = [ "libtmux>=0.58.0,<1.0", - "fastmcp>=3.2.4,<4.0.0", + "fastmcp>=3.4.0,<4.0.0", ] [project.urls] diff --git a/src/libtmux_mcp/_utils.py b/src/libtmux_mcp/_utils.py index b7fe0ff6..8c1bfea9 100644 --- a/src/libtmux_mcp/_utils.py +++ b/src/libtmux_mcp/_utils.py @@ -30,6 +30,69 @@ logger = logging.getLogger(__name__) +class ExpectedToolError(ToolError): + """``ToolError`` for expected, agent-correctable failures. + + Defaults the error's ``log_level`` to ``WARNING`` (honored by + fastmcp >= 3.3 when logging tool/resource failures) so routine + validation errors, missing objects, and tier denials do not surface + as ERROR records. Unexpected failures keep stock :class:`ToolError` + and its ERROR default — those are the ones operators must see. + + Parameters + ---------- + *args : object + Positional arguments forwarded to :class:`ToolError` + (typically the error message). + log_level : int + Level fastmcp's server layer logs this failure at. Defaults + to ``logging.WARNING``. + suggestion : str, optional + Agent-facing recovery hint. + :class:`~libtmux_mcp.middleware.ToolErrorResultMiddleware` + appends it to the error result's text and mirrors it into the + result's ``meta``. + + Examples + -------- + >>> import logging + >>> ExpectedToolError("Pane not found: %5").log_level == logging.WARNING + True + + An explicit level still wins: + + >>> err = ExpectedToolError("noisy", log_level=logging.INFO) + >>> err.log_level == logging.INFO + True + + Catch sites that handle ``ToolError`` keep working — this is a + plain subclass: + + >>> isinstance(ExpectedToolError("x"), ToolError) + True + + An optional ``suggestion`` carries an agent-facing recovery hint; + :class:`libtmux_mcp.middleware.ToolErrorResultMiddleware` surfaces + it in the error result's text and ``meta``: + + >>> err = ExpectedToolError("Pane not found: %5", + ... suggestion="Call list_panes to discover valid pane ids.") + >>> err.suggestion + 'Call list_panes to discover valid pane ids.' + >>> ExpectedToolError("no hint").suggestion is None + True + """ + + def __init__( + self, + *args: object, + log_level: int = logging.WARNING, + suggestion: str | None = None, + ) -> None: + super().__init__(*args, log_level=log_level) + self.suggestion = suggestion + + @dataclasses.dataclass(frozen=True) class CallerIdentity: """Identity of the tmux pane hosting this MCP server process. @@ -714,7 +777,7 @@ def _coerce_dict_arg( Raises ------ - ToolError + ExpectedToolError If ``value`` is a string that is not valid JSON, or decodes to a JSON value that is not an object. """ @@ -725,10 +788,10 @@ def _coerce_dict_arg( decoded = json.loads(value) except (json.JSONDecodeError, ValueError) as e: msg = f"Invalid {name} JSON: {e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e if not isinstance(decoded, dict): msg = f"{name} must be a JSON object, got {type(decoded).__name__}" - raise ToolError(msg) from None + raise ExpectedToolError(msg) from None return decoded return value @@ -758,7 +821,7 @@ def _apply_filters( Raises ------ - ToolError + ExpectedToolError If a filter key uses an invalid lookup operator. """ coerced = _coerce_dict_arg("filters", filters) @@ -775,7 +838,7 @@ def _apply_filters( f"Invalid filter operator '{op}' in '{key}'. " f"Valid operators: {', '.join(valid_ops)}" ) - raise ToolError(msg) + raise ExpectedToolError(msg) filtered = items.filter(**filters) return [serializer(item) for item in filtered] @@ -922,20 +985,34 @@ def _map_exception_to_tool_error(fn_name: str, e: BaseException) -> ToolError: Shared between the sync and async ``handle_tool_errors*`` decorators so the two paths stay byte-for-byte identical in what agents see. + + Expected, agent-correctable failures map to + :class:`ExpectedToolError` (logged at WARNING). Two cases stay at + ERROR: a missing tmux binary (operator-environment fault that must + be loud) and the unexpected catch-all (potential bug in this + server). """ if isinstance(e, exc.TmuxCommandNotFound): msg = "tmux binary not found. Ensure tmux is installed and in PATH." return ToolError(msg) if isinstance(e, exc.TmuxSessionExists): - return ToolError(str(e)) + return ExpectedToolError(str(e)) if isinstance(e, exc.BadSessionName): - return ToolError(str(e)) + return ExpectedToolError(str(e)) if isinstance(e, exc.TmuxObjectDoesNotExist): - return ToolError(f"Object not found: {e}") + return ExpectedToolError( + f"Object not found: {e}", + suggestion=( + "Call list_sessions / list_windows / list_panes to discover valid ids." + ), + ) if isinstance(e, exc.PaneNotFound): - return ToolError(f"Pane not found: {e}") + return ExpectedToolError( + f"Pane not found: {e}", + suggestion="Call list_panes to discover valid pane ids.", + ) if isinstance(e, exc.LibTmuxException): - return ToolError(f"tmux error: {e}") + return ExpectedToolError(f"tmux error: {e}") logger.exception("unexpected error in MCP tool %s", fn_name) return ToolError(f"Unexpected error: {type(e).__name__}: {e}") @@ -945,8 +1022,19 @@ def handle_tool_errors( ) -> t.Callable[P, R]: """Decorate synchronous MCP tool functions with standardized error handling. - Catches libtmux exceptions and re-raises as ``ToolError`` so that - MCP responses have ``isError=True`` with a descriptive message. + Catches libtmux exceptions and re-raises them through + :func:`_map_exception_to_tool_error` so MCP responses have + ``isError=True`` with a descriptive message — expected, + agent-correctable failures as :class:`ExpectedToolError` (logged + at WARNING), the unexpected catch-all as stock ``ToolError`` + (logged at ERROR). + + The re-raise chains the original exception via ``from e``. Keep it + single-level: :class:`~libtmux_mcp.middleware.ReadonlyRetryMiddleware` + matches :exc:`libtmux.exc.LibTmuxException` by inspecting exactly + one ``__cause__`` hop, so wrapping the mapped error again would + silently disable readonly retries. + Use :func:`handle_tool_errors_async` for ``async def`` tools — this wrapper only supports plain sync callables. """ @@ -973,8 +1061,12 @@ def handle_tool_errors_async( ``report_progress``/``elicit``/``read_resource`` methods are coroutines that only run inside ``async def`` tools. - Maps the same libtmux exception set to the same ``ToolError`` - messages as the sync decorator by delegating to a shared helper. + Maps the same libtmux exception set to the same messages and + error classes as the sync decorator (expected failures as + :class:`ExpectedToolError` at WARNING, the unexpected catch-all as + stock ``ToolError`` at ERROR) by delegating to a shared helper, + and chains the original exception via the same single-level + ``from e`` that readonly retries depend on. """ @functools.wraps(fn) diff --git a/src/libtmux_mcp/middleware.py b/src/libtmux_mcp/middleware.py index b3e0dc24..dfc751e7 100644 --- a/src/libtmux_mcp/middleware.py +++ b/src/libtmux_mcp/middleware.py @@ -1,14 +1,22 @@ """Middleware for libtmux MCP server. -Provides three pieces of infrastructure: +Provides the project's middleware infrastructure, in definition order: * :class:`SafetyMiddleware` gates tools by safety tier based on the ``LIBTMUX_SAFETY`` environment variable. Tools tagged above the configured tier are hidden from listing and blocked from execution. +* :class:`ToolErrorResultMiddleware` converts tool-call failures into + ``ToolResult(is_error=True)`` results that carry the clean error + message plus a structured ``meta`` payload, instead of fastmcp's + stock ``-32603`` catch-all that prefixed every expected failure + with ``"Internal error: "``. * :class:`AuditMiddleware` emits a structured log record for each tool invocation (name, duration, outcome, client/request ids, and a summary of arguments with payload-bearing fields redacted to a length + SHA-256 prefix). +* :class:`ReadonlyRetryMiddleware` retries transient libtmux failures, + but only for readonly tools — re-running a mutating tool would + silently double side effects. * :class:`TailPreservingResponseLimitingMiddleware` is a backstop cap for oversized tool output. Unlike FastMCP's stock ``ResponseLimitingMiddleware`` it preserves the **tail** of the @@ -24,15 +32,23 @@ import time import typing as t -from fastmcp.exceptions import ToolError from fastmcp.server.middleware import Middleware, MiddlewareContext -from fastmcp.server.middleware.error_handling import RetryMiddleware +from fastmcp.server.middleware.error_handling import ( + ErrorHandlingMiddleware, + RetryMiddleware, +) from fastmcp.server.middleware.response_limiting import ResponseLimitingMiddleware from fastmcp.tools.base import ToolResult from libtmux import exc as libtmux_exc -from mcp.types import TextContent - -from libtmux_mcp._utils import TAG_DESTRUCTIVE, TAG_MUTATING, TAG_READONLY +from mcp.types import CallToolRequestParams, TextContent +from pydantic import ValidationError as PydanticValidationError + +from libtmux_mcp._utils import ( + TAG_DESTRUCTIVE, + TAG_MUTATING, + TAG_READONLY, + ExpectedToolError, +) _TIER_LEVELS: dict[str, int] = { TAG_READONLY: 0, @@ -90,10 +106,243 @@ async def on_call_tool( f"current safety level. Set LIBTMUX_SAFETY=destructive " f"to enable destructive tools." ) - raise ToolError(msg) + raise ExpectedToolError(msg) return await call_next(context) +# --------------------------------------------------------------------------- +# Tool-error result conversion +# --------------------------------------------------------------------------- + + +def _is_schema_validation_error(error: BaseException) -> bool: + """Return True for fastmcp argument-schema validation failures. + + fastmcp validates tool arguments against the input schema *before* + tool code runs, raising a bare :exc:`pydantic.ValidationError` — + too early for the ``handle_tool_errors`` decorators to classify. + Bad arguments are agent-correctable (fix the call and retry), so + they get the same expected/WARNING treatment as + :class:`~libtmux_mcp._utils.ExpectedToolError`. + + Output validation cannot be mistaken for this case: fastmcp's tool + layer converts output-shape failures into error results itself, so + they never reach the middleware as exceptions. + """ + return isinstance(error, PydanticValidationError) or isinstance( + error.__cause__, PydanticValidationError + ) + + +#: Scheduling flag some MCP clients (notably Gemini CLI when batching +#: several tool calls in one turn) merge into the tool's arguments. +#: Recognized only to *word the rejection helpfully* — the argument is +#: still rejected, never silently stripped, so genuine argument typos +#: from other clients stay loud. Contrast MemPalace/mempalace#322, +#: which strips the key, and #647, which whitelists arguments against +#: the schema — silent dropping would let a mis-named flag on a +#: mutating tool (e.g. ``enter`` on send_keys) run with defaults. +_CLIENT_SCHEDULING_FLAG = "wait_for_previous" + + +def _unexpected_kwargs(error: BaseException) -> list[str]: + """Argument names rejected as unexpected by schema validation. + + Reads pydantic's structured ``errors()`` from ``error`` or its + ``__cause__`` (same posture as :func:`_is_schema_validation_error`) + and returns the names flagged ``unexpected_keyword_argument``. + Empty list for every other failure shape. + """ + err = error if isinstance(error, PydanticValidationError) else error.__cause__ + if not isinstance(err, PydanticValidationError): + return [] + return [ + str(item["loc"][-1]) + for item in err.errors() + if item.get("type") == "unexpected_keyword_argument" and item.get("loc") + ] + + +def _client_label(context: MiddlewareContext | None) -> str | None: + """``"name version"`` of the connected client, when the handshake exposed it. + + Walks ``fastmcp_context.session.client_params.clientInfo`` — the + MCP ``initialize`` handshake's client identity. Every hop can be + absent (unit-test contexts, background tasks, clients that omit + ``clientInfo``), so any failure resolves to ``None``. Used only to + word error suggestions; never gates behavior. + """ + if context is None: + return None + try: + fastmcp_ctx = context.fastmcp_context + if fastmcp_ctx is None: + return None + params = fastmcp_ctx.session.client_params + if params is None: + return None + info = params.clientInfo + return f"{info.name} {info.version}".strip() + except (AttributeError, RuntimeError): + return None + + +def _error_tool_result( + error: Exception, + context: MiddlewareContext | None = None, +) -> ToolResult: + """Build a rich ``ToolResult(is_error=True)`` from a tool failure. + + The text block carries the error message exactly as raised — no + transform-layer prefix — with the recovery ``suggestion`` appended + when the error provides one. ``meta`` mirrors the details in + machine-readable form: + + * ``error_type`` — class name of the originating exception + (``__cause__`` when the raise site chained one, so agents see + ``PaneNotFound`` rather than the ``ToolError`` wrapper). + * ``expected`` — True for agent-correctable failures + (:class:`~libtmux_mcp._utils.ExpectedToolError` and + argument-schema validation errors), False for operator faults + and potential server bugs. + * ``suggestion`` — recovery hint. Carried by the error when the + raise site provided one; synthesized for schema-validation + failures that rejected unexpected arguments, so the agent knows + to drop or fix exactly those names (with a client-flag note for + :data:`_CLIENT_SCHEDULING_FLAG` leaks, naming the client via + ``context`` when the handshake exposed it). + + ``structured_content`` is deliberately left unset: tools declare + output schemas for their success payloads, and clients validate + ``structuredContent`` against them — an error-shaped payload there + would fail validation on strict clients. + """ + cause = error.__cause__ + origin = cause if cause is not None else error + meta: dict[str, t.Any] = { + "error_type": type(origin).__name__, + "expected": isinstance(error, ExpectedToolError) + or _is_schema_validation_error(error), + } + text = str(error) + suggestion = getattr(error, "suggestion", None) + if suggestion is None: + unknown = _unexpected_kwargs(error) + if unknown: + suggestion = ( + f"Remove or correct the unrecognized argument(s): {', '.join(unknown)}." + ) + if _CLIENT_SCHEDULING_FLAG in unknown: + client = _client_label(context) + who = ( + f"your client ({client})" + if client + else "some clients (e.g. Gemini CLI)" + ) + suggestion += ( + f" {_CLIENT_SCHEDULING_FLAG} is a scheduling flag " + f"{who} can leak into batched tool calls; retry " + f"the call without it." + ) + if suggestion: + meta["suggestion"] = suggestion + text = f"{text}\n{suggestion}" + return ToolResult( + content=[TextContent(type="text", text=text)], + meta=meta, + is_error=True, + ) + + +class ToolErrorResultMiddleware(ErrorHandlingMiddleware): + """Convert tool-call failures into rich ``ToolResult`` errors. + + Replaces the stock :class:`ErrorHandlingMiddleware` behavior for + ``tools/call`` only. The stock ``transform_errors=True`` path + funnels every non-MCP exception through a ``-32603`` catch-all, so + agents received ``"Internal error: Pane not found: %5"`` — the + transform mangled every expected failure message. This subclass + intercepts tool-call exceptions first (``on_call_tool`` is the + innermost hook of a middleware's chain) and returns + :func:`_error_tool_result` instead; non-tool messages fall through + to the inherited ``on_message``, preserving the MCP ``-32002`` + resource-not-found transform this middleware was originally + adopted for. + + Logging honors ``FastMCPError.log_level`` (fastmcp >= 3.3): the + expected failures demoted to WARNING by + :class:`~libtmux_mcp._utils.ExpectedToolError` no longer get + re-shouted at ERROR by the stock ``_log_error``. Argument-schema + validation failures — raised by fastmcp before tool code can + classify them — are treated as expected too (see + :func:`_is_schema_validation_error`), and when the rejected + arguments include unexpected names the error result carries a + synthesized suggestion telling the agent which names to drop or + fix (see :func:`_error_tool_result`). + + Ordering invariant: must sit **outside** ``AuditMiddleware``, + ``ReadonlyRetryMiddleware``, and ``SafetyMiddleware``. All three + depend on exception semantics — audit detects failures by catching, + retry matches ``LibTmuxException`` via ``__cause__``, and safety's + tier denials must propagate as exceptions for audit to record them + — so converting the exception to a result any deeper in the stack + would silently break all three. + """ + + def _log_error(self, error: Exception, context: MiddlewareContext) -> None: + """Log at the error's own ``log_level`` instead of a flat ERROR. + + Mirrors the stock implementation (error-count tracking, + optional traceback, error callback) but routes the record + through ``logger.log`` with ``FastMCPError.log_level``. + Exceptions that don't carry one default to ERROR — except + argument-schema validation failures, which are + agent-correctable and log at WARNING like every other invalid + argument (see :func:`_is_schema_validation_error`). + """ + level: int | None = getattr(error, "log_level", None) + if level is None: + level = ( + logging.WARNING if _is_schema_validation_error(error) else logging.ERROR + ) + + error_type = type(error).__name__ + method = context.method or "unknown" + + error_key = f"{error_type}:{method}" + self.error_counts[error_key] = self.error_counts.get(error_key, 0) + 1 + + # Lazy %-formatting (project logging standard) — also collapses + # the stock implementation's include_traceback branch, since + # ``exc_info`` accepts a bool. + self.logger.log( + level, + "Error in %s: %s: %s", + method, + error_type, + error, + exc_info=self.include_traceback, + ) + + if self.error_callback: + try: + self.error_callback(error, context) + except Exception: + self.logger.exception("Error in error callback") + + async def on_call_tool( + self, + context: MiddlewareContext, + call_next: t.Any, + ) -> t.Any: + """Convert tool-call exceptions into ``is_error`` results.""" + try: + return await call_next(context) + except Exception as error: + self._log_error(error, context) + return _error_tool_result(error, context) + + # --------------------------------------------------------------------------- # Audit middleware # --------------------------------------------------------------------------- @@ -359,8 +608,46 @@ class TailPreservingResponseLimitingMiddleware(ResponseLimitingMiddleware): caps at the tool layer fire first under normal operation; this middleware catches pathological output from future tools that forget to declare their own bounds. + + Error results keep their ``is_error`` flag through truncation. + The stock truncation path rebuilds the result without it, which + turns an oversized error (e.g. a validation message echoing a + huge argument) into an apparent success — MCP clients then + validate the truncated text against the tool's output schema and + fail with a transport-level error instead of delivering the tool + error. ``meta`` (``error_type`` / ``expected`` / ``suggestion``) + already survives via the base class, and tail-preservation keeps + the suggestion line, which sits at the end of the text. """ + async def on_call_tool( + self, + context: MiddlewareContext, + call_next: t.Any, + ) -> t.Any: + """Apply the size cap without dropping ``is_error``.""" + inner: t.Any = None + + async def _capture( + context: MiddlewareContext[CallToolRequestParams], + ) -> ToolResult: + # ``context`` (not ``ctx``): fastmcp's CallNext protocol + # matches the parameter name, not just the shape. + nonlocal inner + inner = await call_next(context) + return t.cast("ToolResult", inner) + + result = await super().on_call_tool(context, _capture) + if result is not inner and isinstance(inner, ToolResult) and inner.is_error: + # The base class truncated and rebuilt the result; restore + # the error flag it dropped. + return ToolResult( + content=result.content, + meta=result.meta, + is_error=True, + ) + return result + def _truncate_to_result( self, text: str, diff --git a/src/libtmux_mcp/server.py b/src/libtmux_mcp/server.py index 1ff4afc9..9b8bc427 100644 --- a/src/libtmux_mcp/server.py +++ b/src/libtmux_mcp/server.py @@ -12,7 +12,6 @@ import typing as t from fastmcp import FastMCP -from fastmcp.server.middleware.error_handling import ErrorHandlingMiddleware from fastmcp.server.middleware.timing import TimingMiddleware if t.TYPE_CHECKING: @@ -32,6 +31,7 @@ ReadonlyRetryMiddleware, SafetyMiddleware, TailPreservingResponseLimitingMiddleware, + ToolErrorResultMiddleware, ) from libtmux_mcp.tools.buffer_tools import _MCP_BUFFER_PREFIX @@ -275,14 +275,21 @@ def _gc_mcp_buffers(cache: t.Mapping[_ServerCacheKey, Server]) -> None: # Middleware runs outermost-first. Order rationale: # 1. TimingMiddleware — neutral observer; start clock as early # as possible so timing captures middleware cost too. - # 2. TailPreservingResponseLimitingMiddleware — bound the - # response *before* ErrorHandlingMiddleware can transform - # exceptions; keeps the size cap independent of error path. - # 3. ErrorHandlingMiddleware — transforms resource errors to - # MCP code -32002; sits inside so it wraps the audit + safety - # pair. + # 2. TailPreservingResponseLimitingMiddleware — bounds the final + # tool result on the way back out. Tool errors may already be + # ToolResult(is_error=True) here, so truncation preserves that + # flag instead of turning expected failures into schema errors. + # 3. ToolErrorResultMiddleware — converts tool-call failures to + # rich ToolResult(is_error=True) results and transforms + # resource errors to MCP code -32002. Must stay OUTSIDE the + # audit + retry + safety trio: all three depend on exception + # semantics (audit catches to record outcome=error, retry + # matches LibTmuxException via __cause__, and safety's tier + # denials must propagate as exceptions for audit to record + # them), so converting the exception to a result any deeper + # would silently break all three. # 4. AuditMiddleware — outside SafetyMiddleware so tier-denial - # events (which raise ToolError before call_next inside + # events (which raise ExpectedToolError before call_next inside # Safety) are still logged with outcome=error. Without this # ordering, denied access attempts would silently bypass the # audit log — a security-observability gap. @@ -299,7 +306,7 @@ def _gc_mcp_buffers(cache: t.Mapping[_ServerCacheKey, Server]) -> None: max_size=DEFAULT_RESPONSE_LIMIT_BYTES, tools=_RESPONSE_LIMITED_TOOLS, ), - ErrorHandlingMiddleware(transform_errors=True), + ToolErrorResultMiddleware(transform_errors=True), AuditMiddleware(), ReadonlyRetryMiddleware(), SafetyMiddleware(max_tier=_safety_level), diff --git a/src/libtmux_mcp/tools/buffer_tools.py b/src/libtmux_mcp/tools/buffer_tools.py index e702c7fb..c965e378 100644 --- a/src/libtmux_mcp/tools/buffer_tools.py +++ b/src/libtmux_mcp/tools/buffer_tools.py @@ -31,14 +31,13 @@ import typing as t import uuid -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( ANNOTATIONS_MUTATING, ANNOTATIONS_RO, ANNOTATIONS_SHELL, TAG_MUTATING, TAG_READONLY, + ExpectedToolError, _get_server, _resolve_pane, _tmux_argv, @@ -93,17 +92,17 @@ def _validate_logical_name(name: str) -> str: >>> _validate_logical_name("has space") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid logical buffer name: 'has space' + libtmux_mcp._utils.ExpectedToolError: Invalid logical buffer name: 'has space' >>> _validate_logical_name("with/slash") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid logical buffer name: 'with/slash' + libtmux_mcp._utils.ExpectedToolError: Invalid logical buffer name: 'with/slash' """ if name == "": return "buf" if not _LOGICAL_NAME_RE.fullmatch(name): msg = f"Invalid logical buffer name: {name!r}" - raise ToolError(msg) + raise ExpectedToolError(msg) return name @@ -122,15 +121,15 @@ def _validate_buffer_name(name: str) -> str: >>> _validate_buffer_name("clipboard") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid buffer name: 'clipboard' + libtmux_mcp._utils.ExpectedToolError: Invalid buffer name: 'clipboard' >>> _validate_buffer_name("libtmux_mcp_shortuuid_buf") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid buffer name: 'libtmux_mcp_shortuuid_buf' + libtmux_mcp._utils.ExpectedToolError: Invalid buffer name: 'libtmux_mcp_...' """ if not _BUFFER_NAME_RE.fullmatch(name): msg = f"Invalid buffer name: {name!r}" - raise ToolError(msg) + raise ExpectedToolError(msg) return name @@ -216,11 +215,11 @@ def load_buffer( subprocess.run(argv, check=True, capture_output=True, timeout=5.0) except subprocess.TimeoutExpired as e: msg = f"load-buffer timeout after 5s for {buffer_name!r}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"load-buffer failed: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e finally: if tmppath is not None: pathlib.Path(tmppath).unlink(missing_ok=True) @@ -317,11 +316,11 @@ def show_buffer( ) except subprocess.TimeoutExpired as e: msg = f"show-buffer timeout after 5s for {cname!r}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"show-buffer failed for {cname!r}: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e raw = completed.stdout.decode(errors="replace") # Preserve a possible trailing newline so round-tripping through # load_buffer/show_buffer stays byte-identical when truncation @@ -363,11 +362,11 @@ def delete_buffer( subprocess.run(argv, check=True, capture_output=True, timeout=5.0) except subprocess.TimeoutExpired as e: msg = f"delete-buffer timeout after 5s for {cname!r}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"delete-buffer failed for {cname!r}: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e return f"Buffer {cname!r} deleted" diff --git a/src/libtmux_mcp/tools/hook_tools.py b/src/libtmux_mcp/tools/hook_tools.py index 3b982cf5..a42f4d67 100644 --- a/src/libtmux_mcp/tools/hook_tools.py +++ b/src/libtmux_mcp/tools/hook_tools.py @@ -30,13 +30,13 @@ import typing as t -from fastmcp.exceptions import ToolError from libtmux import exc as libtmux_exc from libtmux.constants import OptionScope from libtmux_mcp._utils import ( ANNOTATIONS_RO, TAG_READONLY, + ExpectedToolError, _get_server, _resolve_pane, _resolve_session, @@ -86,11 +86,11 @@ def _resolve_hook_target( if scope is not None and opt_scope is None: valid = ", ".join(sorted(_SCOPE_MAP)) msg = f"Invalid scope: {scope!r}. Valid: {valid}" - raise ToolError(msg) + raise ExpectedToolError(msg) if target is not None and opt_scope is None: msg = "scope is required when target is specified" - raise ToolError(msg) + raise ExpectedToolError(msg) if target is not None and opt_scope is not None: # Let the resolved object carry its own scope — passing scope diff --git a/src/libtmux_mcp/tools/option_tools.py b/src/libtmux_mcp/tools/option_tools.py index b4970ab4..e992fd8f 100644 --- a/src/libtmux_mcp/tools/option_tools.py +++ b/src/libtmux_mcp/tools/option_tools.py @@ -4,7 +4,6 @@ import typing as t -from fastmcp.exceptions import ToolError from libtmux.constants import OptionScope from libtmux_mcp._utils import ( @@ -12,6 +11,7 @@ ANNOTATIONS_RO, TAG_MUTATING, TAG_READONLY, + ExpectedToolError, _get_server, _resolve_pane, _resolve_session, @@ -44,11 +44,11 @@ def _resolve_option_target( if scope is not None and opt_scope is None: valid = ", ".join(sorted(_SCOPE_MAP)) msg = f"Invalid scope: {scope!r}. Valid: {valid}" - raise ToolError(msg) + raise ExpectedToolError(msg) if target is not None and opt_scope is None: msg = "scope is required when target is specified" - raise ToolError(msg) + raise ExpectedToolError(msg) if target is not None and opt_scope is not None: if opt_scope == OptionScope.Session: diff --git a/src/libtmux_mcp/tools/pane_tools/capture_since.py b/src/libtmux_mcp/tools/pane_tools/capture_since.py index 44ab2f63..0d08af55 100644 --- a/src/libtmux_mcp/tools/pane_tools/capture_since.py +++ b/src/libtmux_mcp/tools/pane_tools/capture_since.py @@ -11,9 +11,8 @@ import typing as t from dataclasses import dataclass -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _get_server, _resolve_pane, handle_tool_errors_async, @@ -106,7 +105,7 @@ def _raise_if_dead_without_baseline(pane: Pane, state: _PaneState) -> None: """Raise a tool error for a dead pane before a cursor exists.""" if state.pane_dead: msg = f"pane {pane.pane_id} died during pane read" - raise ToolError(msg) + raise ExpectedToolError(msg) def _read_stable_visible( @@ -306,7 +305,7 @@ def _build_cursor(pane_id: str, state: _PaneState, cursor_rows: list[str]) -> st def _raise_invalid_cursor(reason: str) -> t.NoReturn: """Raise a consistently worded invalid-cursor error.""" msg = f"invalid capture_since cursor: {reason}" - raise ToolError(msg) + raise ExpectedToolError(msg) def _cursor_str(payload: t.Mapping[str, t.Any], key: str) -> str: @@ -340,7 +339,7 @@ def _decode_cursor(cursor: str) -> _CaptureCursor: except (binascii.Error, json.JSONDecodeError, UnicodeDecodeError) as err: reason = "could not decode payload" msg = f"invalid capture_since cursor: {reason}" - raise ToolError(msg) from err + raise ExpectedToolError(msg) from err if not isinstance(payload, dict): reason = "payload is not an object" @@ -375,10 +374,10 @@ def _validate_limits(max_lines: int | None, max_bytes: int | None) -> None: """Validate caller-supplied truncation limits.""" if max_lines is not None and max_lines <= 0: msg = f"max_lines must be positive or None (received {max_lines})" - raise ToolError(msg) + raise ExpectedToolError(msg) if max_bytes is not None and max_bytes <= 0: msg = f"max_bytes must be positive or None (received {max_bytes})" - raise ToolError(msg) + raise ExpectedToolError(msg) def _encoded_size(lines: list[str]) -> int: @@ -450,7 +449,7 @@ async def capture_since( If tmux history was cleared or trimmed before the cursor anchor, the tool returns the current visible pane with ``lines_missed=True`` and a fresh cursor. Malformed cursors, cursors for a different - pane, pane death, and pane respawn fail with ``ToolError`` so + pane, pane death, and pane respawn fail with ``ExpectedToolError`` so agents do not accidentally observe the wrong process. Parameters @@ -507,7 +506,7 @@ async def capture_since( f"cursor pane {decoded.pane_id} does not match requested pane " f"{pane.pane_id}" ) - raise ToolError(msg) + raise ExpectedToolError(msg) start_time = time.monotonic() if decoded is None: diff --git a/src/libtmux_mcp/tools/pane_tools/io.py b/src/libtmux_mcp/tools/pane_tools/io.py index de209b2d..65eccd60 100644 --- a/src/libtmux_mcp/tools/pane_tools/io.py +++ b/src/libtmux_mcp/tools/pane_tools/io.py @@ -8,9 +8,8 @@ import tempfile import uuid -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _get_server, _resolve_pane, _tmux_argv, @@ -336,11 +335,11 @@ def paste_text( subprocess.run(load_args, check=True, capture_output=True, timeout=5.0) except subprocess.TimeoutExpired as e: msg = f"load-buffer timeout after 5s for {buffer_name!r}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"load-buffer failed: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e # Paste from the named buffer. ``delete_after=True`` (``-d``) # deletes only that named buffer, leaving any unnamed user diff --git a/src/libtmux_mcp/tools/pane_tools/layout.py b/src/libtmux_mcp/tools/pane_tools/layout.py index be4c74b1..2f814ab3 100644 --- a/src/libtmux_mcp/tools/pane_tools/layout.py +++ b/src/libtmux_mcp/tools/pane_tools/layout.py @@ -4,9 +4,8 @@ import typing as t -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _get_server, _resolve_pane, _resolve_window, @@ -59,7 +58,7 @@ def resize_pane( """ if zoom is not None and (height is not None or width is not None): msg = "Cannot combine zoom with height/width" - raise ToolError(msg) + raise ExpectedToolError(msg) server = _get_server(socket_name=socket_name) pane = _resolve_pane( @@ -126,7 +125,7 @@ def select_pane( """ if pane_id is None and direction is None: msg = "Provide either pane_id or direction." - raise ToolError(msg) + raise ExpectedToolError(msg) server = _get_server(socket_name=socket_name) diff --git a/src/libtmux_mcp/tools/pane_tools/lifecycle.py b/src/libtmux_mcp/tools/pane_tools/lifecycle.py index 4eb2e0a9..a522b260 100644 --- a/src/libtmux_mcp/tools/pane_tools/lifecycle.py +++ b/src/libtmux_mcp/tools/pane_tools/lifecycle.py @@ -4,9 +4,8 @@ import typing as t -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _caller_is_on_server, _get_caller_identity, _get_server, @@ -56,7 +55,7 @@ def kill_pane( "Refusing to kill the pane running this MCP server. " "Use a manual tmux command if intended." ) - raise ToolError(msg) + raise ExpectedToolError(msg) pane = _resolve_pane(server, pane_id=pane_id) pid = pane.pane_id @@ -149,7 +148,7 @@ def respawn_pane( "Refusing to respawn the pane running this MCP server. " "Use a manual tmux command if intended." ) - raise ToolError(msg) + raise ExpectedToolError(msg) pane.respawn( kill=kill, start_directory=start_directory, @@ -289,7 +288,7 @@ def find_pane_by_position( Raises ------ - ToolError + ExpectedToolError If no pane satisfies both edge predicates for that corner — in practice only possible for layouts tmux itself produced via custom layout strings; the built-in layouts always have a pane @@ -316,7 +315,7 @@ def find_pane_by_position( f"{window.window_id}. This is unusual — built-in layouts " "always have a pane at every corner." ) - raise ToolError(msg) + raise ExpectedToolError(msg) # When more than one pane qualifies (e.g. a single-pane window # touches all four edges, or an unusual layout), prefer the pane diff --git a/src/libtmux_mcp/tools/pane_tools/pipe.py b/src/libtmux_mcp/tools/pane_tools/pipe.py index d443ac73..a09d28fd 100644 --- a/src/libtmux_mcp/tools/pane_tools/pipe.py +++ b/src/libtmux_mcp/tools/pane_tools/pipe.py @@ -4,9 +4,8 @@ import shlex -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _get_server, _resolve_pane, handle_tool_errors, @@ -80,7 +79,7 @@ def pipe_pane( if not output_path.strip(): msg = "output_path must be a non-empty path, or None to stop piping." - raise ToolError(msg) + raise ExpectedToolError(msg) redirect = ">>" if append else ">" pane.pipe(f"cat {redirect} {shlex.quote(output_path)}") diff --git a/src/libtmux_mcp/tools/pane_tools/search.py b/src/libtmux_mcp/tools/pane_tools/search.py index cfc9512d..cab6f12d 100644 --- a/src/libtmux_mcp/tools/pane_tools/search.py +++ b/src/libtmux_mcp/tools/pane_tools/search.py @@ -4,9 +4,8 @@ import re -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( + ExpectedToolError, _coerce_bool, _coerce_int, _compute_is_caller, @@ -150,7 +149,7 @@ def search_panes( compiled = re.compile(search_pattern, flags) except re.error as e: msg = f"Invalid regex pattern: {e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e server = _get_server(socket_name=socket_name) diff --git a/src/libtmux_mcp/tools/pane_tools/state.py b/src/libtmux_mcp/tools/pane_tools/state.py index ea3ced55..5ce50866 100644 --- a/src/libtmux_mcp/tools/pane_tools/state.py +++ b/src/libtmux_mcp/tools/pane_tools/state.py @@ -4,7 +4,7 @@ import typing as t -from fastmcp.exceptions import ToolError +from libtmux_mcp._utils import ExpectedToolError if t.TYPE_CHECKING: from libtmux.pane import Pane @@ -60,17 +60,17 @@ def _read_pane_state(pane: Pane) -> _PaneState: def _raise_if_pane_lifecycle_changed( pane: Pane, state: _PaneState, baseline_pid: str ) -> None: - """Raise ``ToolError`` when a cursor or wait baseline is invalid.""" + """Raise ``ExpectedToolError`` when a cursor or wait baseline is invalid.""" if state.pane_dead: msg = f"pane {pane.pane_id} died; cursor/baseline anchor is no longer valid" - raise ToolError(msg) + raise ExpectedToolError(msg) if state.pane_pid != baseline_pid: msg = ( f"pane {pane.pane_id} was respawned " f"(pid {baseline_pid} -> {state.pane_pid}); " "cursor/baseline anchor is no longer valid" ) - raise ToolError(msg) + raise ExpectedToolError(msg) def _read_history_limit(pane: Pane) -> int: diff --git a/src/libtmux_mcp/tools/pane_tools/wait.py b/src/libtmux_mcp/tools/pane_tools/wait.py index 1ffa7b15..0397a699 100644 --- a/src/libtmux_mcp/tools/pane_tools/wait.py +++ b/src/libtmux_mcp/tools/pane_tools/wait.py @@ -10,9 +10,9 @@ import anyio from fastmcp import Context -from fastmcp.exceptions import ToolError from libtmux_mcp._utils import ( + ExpectedToolError, _get_server, _resolve_pane, handle_tool_errors_async, @@ -211,7 +211,7 @@ async def wait_for_text( Notes ----- **Scrollback rollover detection is partial.** The tool raises - ``ToolError`` when ``hsize`` shrinks below the entry value — which + ``ExpectedToolError`` when ``hsize`` shrinks below the entry value — which catches ``clear-history`` and any rollover where the dip is observable between polls. It does **not** reliably detect ``grid_collect_history`` trim that fires during continuous output: @@ -269,13 +269,13 @@ async def wait_for_text( """ if not pattern: msg = "pattern must be a non-empty string" - raise ToolError(msg) + raise ExpectedToolError(msg) if interval < 0.01: msg = f"interval must be at least 0.01 s (received {interval})" - raise ToolError(msg) + raise ExpectedToolError(msg) if timeout <= 0: msg = f"timeout must be positive (received {timeout})" - raise ToolError(msg) + raise ExpectedToolError(msg) search_pattern = pattern if regex else re.escape(pattern) flags = 0 if match_case else re.IGNORECASE @@ -284,7 +284,7 @@ async def wait_for_text( except re.error as e: msg = f"Invalid regex pattern: {e}" await _maybe_log(ctx, level="warning", message=msg) - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e server = _get_server(socket_name=socket_name) pane = _resolve_pane( @@ -381,7 +381,7 @@ async def wait_for_text( "re-arm wait_for_text or use wait_for_channel for " "deterministic synchronization" ) - raise ToolError(msg) + raise ExpectedToolError(msg) # The shrink guard above catches clear-history and the # entry-at-cap rollover edge. It does NOT catch # grid_collect_history trim during continuous output, where @@ -495,7 +495,7 @@ async def wait_for_content_change( what the output will be — it waits for "something happened" rather than a specific pattern. - Raises ``ToolError`` when pane respawn or pane death invalidates the + Raises ``ExpectedToolError`` when pane respawn or pane death invalidates the baseline captured at entry. For correctness-sensitive flows prefer ``wait_for_channel`` composed with ``tmux wait-for -S``. diff --git a/src/libtmux_mcp/tools/server_tools.py b/src/libtmux_mcp/tools/server_tools.py index 7f160438..55a5e772 100644 --- a/src/libtmux_mcp/tools/server_tools.py +++ b/src/libtmux_mcp/tools/server_tools.py @@ -18,6 +18,7 @@ TAG_DESTRUCTIVE, TAG_MUTATING, TAG_READONLY, + ExpectedToolError, _apply_filters, _caller_is_on_server, _coerce_dict_arg, @@ -150,7 +151,7 @@ def kill_server(socket_name: str | None = None) -> str: "Refusing to kill the tmux server while this MCP server is running " "inside it. Use a manual tmux command if intended." ) - raise ToolError(msg) + raise ExpectedToolError(msg) server.kill() _invalidate_server(socket_name=socket_name) diff --git a/src/libtmux_mcp/tools/session_tools.py b/src/libtmux_mcp/tools/session_tools.py index ca841ab1..f2391abb 100644 --- a/src/libtmux_mcp/tools/session_tools.py +++ b/src/libtmux_mcp/tools/session_tools.py @@ -4,7 +4,6 @@ import typing as t -from fastmcp.exceptions import ToolError from libtmux.constants import WindowDirection from libtmux_mcp._utils import ( @@ -16,6 +15,7 @@ TAG_DESTRUCTIVE, TAG_MUTATING, TAG_READONLY, + ExpectedToolError, _apply_filters, _caller_is_on_server, _get_caller_identity, @@ -161,7 +161,7 @@ def create_window( if resolved is None: valid = ", ".join(sorted(direction_map)) msg = f"Invalid direction: {direction!r}. Valid: {valid}" - raise ToolError(msg) + raise ExpectedToolError(msg) kwargs["direction"] = resolved window = session.new_window(**kwargs) return _serialize_window(window) @@ -232,7 +232,7 @@ def kill_session( "Refusing to kill without an explicit target. " "Provide session_name or session_id." ) - raise ToolError(msg) + raise ExpectedToolError(msg) server = _get_server(socket_name=socket_name) session = _resolve_session(server, session_name=session_name, session_id=session_id) @@ -245,7 +245,7 @@ def kill_session( "Refusing to kill the session containing this MCP server's pane. " "Use a manual tmux command if intended." ) - raise ToolError(msg) + raise ExpectedToolError(msg) name = session.session_name or session.session_id session.kill() @@ -288,7 +288,7 @@ def select_window( """ if window_id is None and window_index is None and direction is None: msg = "Provide window_id, window_index, or direction." - raise ToolError(msg) + raise ExpectedToolError(msg) server = _get_server(socket_name=socket_name) @@ -319,7 +319,7 @@ def select_window( fn = _NAV.get(direction) if fn is None: msg = f"Invalid direction: {direction!r}. Valid: next, previous, last" - raise ToolError(msg) + raise ExpectedToolError(msg) active_window = fn() return _serialize_window(active_window) diff --git a/src/libtmux_mcp/tools/wait_for_tools.py b/src/libtmux_mcp/tools/wait_for_tools.py index cd7d0511..b486d271 100644 --- a/src/libtmux_mcp/tools/wait_for_tools.py +++ b/src/libtmux_mcp/tools/wait_for_tools.py @@ -35,11 +35,10 @@ import subprocess import typing as t -from fastmcp.exceptions import ToolError - from libtmux_mcp._utils import ( ANNOTATIONS_MUTATING, TAG_MUTATING, + ExpectedToolError, _get_server, _tmux_argv, handle_tool_errors_async, @@ -76,7 +75,7 @@ def _validate_channel_name(name: str) -> str: Raises ------ - ToolError + ExpectedToolError When ``name`` is empty, too long, or contains disallowed characters. @@ -91,15 +90,15 @@ def _validate_channel_name(name: str) -> str: >>> _validate_channel_name("has space") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid channel name: 'has space' + libtmux_mcp._utils.ExpectedToolError: Invalid channel name: 'has space' >>> _validate_channel_name("") Traceback (most recent call last): ... - fastmcp.exceptions.ToolError: Invalid channel name: '' + libtmux_mcp._utils.ExpectedToolError: Invalid channel name: '' """ if not _CHANNEL_NAME_RE.fullmatch(name): msg = f"Invalid channel name: {name!r}" - raise ToolError(msg) + raise ExpectedToolError(msg) return name @@ -146,7 +145,7 @@ async def wait_for_channel( Raises ------ - ToolError + ExpectedToolError On timeout, invalid channel name, or tmux error. """ server = _get_server(socket_name=socket_name) @@ -164,11 +163,11 @@ async def wait_for_channel( ) except subprocess.TimeoutExpired as e: msg = f"wait-for timeout: channel {cname!r} was not signalled within {timeout}s" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"wait-for failed for channel {cname!r}: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e return f"Channel {cname!r} was signalled" @@ -210,11 +209,11 @@ async def signal_channel( f"signal-channel timeout after {_SIGNAL_TIMEOUT_SECONDS}s: " f"channel {cname!r}" ) - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e except subprocess.CalledProcessError as e: stderr = e.stderr.decode(errors="replace").strip() if e.stderr else "" msg = f"signal-channel failed for channel {cname!r}: {stderr or e}" - raise ToolError(msg) from e + raise ExpectedToolError(msg) from e return f"Channel {cname!r} signalled" diff --git a/src/libtmux_mcp/tools/window_tools.py b/src/libtmux_mcp/tools/window_tools.py index e4cec2c1..7dcd6908 100644 --- a/src/libtmux_mcp/tools/window_tools.py +++ b/src/libtmux_mcp/tools/window_tools.py @@ -4,7 +4,6 @@ import typing as t -from fastmcp.exceptions import ToolError from libtmux.constants import PaneDirection from libtmux_mcp._utils import ( @@ -16,6 +15,7 @@ TAG_DESTRUCTIVE, TAG_MUTATING, TAG_READONLY, + ExpectedToolError, _apply_filters, _caller_is_on_server, _get_caller_identity, @@ -205,7 +205,7 @@ def split_window( if pane_dir is None: valid = ", ".join(sorted(_DIRECTION_MAP)) msg = f"Invalid direction: {direction!r}. Valid: {valid}" - raise ToolError(msg) + raise ExpectedToolError(msg) if pane_id is not None: pane = _resolve_pane(server, pane_id=pane_id) @@ -312,7 +312,7 @@ def kill_window( "Refusing to kill the window containing this MCP server's pane. " "Use a manual tmux command if intended." ) - raise ToolError(msg) + raise ExpectedToolError(msg) wid = window.window_id window.kill() diff --git a/tests/test_middleware.py b/tests/test_middleware.py index d5ebe360..9384c388 100644 --- a/tests/test_middleware.py +++ b/tests/test_middleware.py @@ -7,6 +7,7 @@ import logging import typing as t +import pydantic import pytest from fastmcp.server.middleware import MiddlewareContext from mcp.types import CallToolRequestParams @@ -355,7 +356,7 @@ def test_server_middleware_stack_order() -> None: The ordering is load-bearing (see server.py comment): TimingMiddleware must be outermost so it observes total wall time; AuditMiddleware must sit *outside* SafetyMiddleware so - tier-denial events (which raise ``ToolError`` before + tier-denial events (which raise ``ExpectedToolError`` before ``call_next``) are still recorded — without this ordering, forbidden-access attempts silently bypass the audit log. A refactor that swaps Audit and Safety would degrade @@ -366,7 +367,6 @@ def test_server_middleware_stack_order() -> None: calls are audited once each (Audit wraps the retry loop) and tier-denied tools never reach retry (Safety stops them first). """ - from fastmcp.server.middleware.error_handling import ErrorHandlingMiddleware from fastmcp.server.middleware.timing import TimingMiddleware from libtmux_mcp.middleware import ( @@ -374,6 +374,7 @@ def test_server_middleware_stack_order() -> None: ReadonlyRetryMiddleware, SafetyMiddleware, TailPreservingResponseLimitingMiddleware, + ToolErrorResultMiddleware, ) from libtmux_mcp.server import mcp @@ -384,7 +385,7 @@ def test_server_middleware_stack_order() -> None: assert types[:6] == [ TimingMiddleware, TailPreservingResponseLimitingMiddleware, - ErrorHandlingMiddleware, + ToolErrorResultMiddleware, AuditMiddleware, ReadonlyRetryMiddleware, SafetyMiddleware, @@ -392,12 +393,14 @@ def test_server_middleware_stack_order() -> None: def test_error_handling_middleware_transforms_errors() -> None: - """ErrorHandlingMiddleware is configured with transform_errors=True. - - Regression guard: without ``transform_errors=True`` the middleware - would still log but not map resource errors to MCP error code - ``-32002``, which is the protocol-correctness point of adopting - this middleware in the first place. + """ToolErrorResultMiddleware is configured with transform_errors=True. + + Regression guard: without ``transform_errors=True`` the inherited + ``on_message`` would still log but not map resource errors to MCP + error code ``-32002``, which is the protocol-correctness point of + keeping the ErrorHandlingMiddleware base for non-tool messages + (tool-call errors are converted to ``is_error`` results before + they reach ``on_message``). """ from fastmcp.server.middleware.error_handling import ErrorHandlingMiddleware @@ -417,12 +420,12 @@ def test_audit_records_safety_denial( Composes Audit and Safety in the production order (Audit outside Safety) by manually nesting their ``on_call_tool`` handlers: the inner ``call_next`` from Audit dispatches to Safety, which raises - ``ToolError`` for an over-tier tool. Audit should record that as - ``outcome=error error_type=ToolError`` rather than skipping the - record. Without this ordering, denied access attempts would - silently bypass forensic logging. + ``ExpectedToolError`` for an over-tier tool. Audit should record + that as ``outcome=error error_type=ExpectedToolError`` rather + than skipping the record. Without this ordering, denied access + attempts would silently bypass forensic logging. """ - from fastmcp.exceptions import ToolError + from libtmux_mcp._utils import ExpectedToolError audit = AuditMiddleware() ctx = _fake_context(name="kill_server", arguments={}) @@ -431,25 +434,25 @@ def test_audit_records_safety_denial( # context.fastmcp_context.fastmcp.get_tool(...). With # fastmcp_context=None the safety check short-circuits, so we # simulate the denial more directly: ``call_next`` is a coroutine - # that raises the same ``ToolError`` SafetyMiddleware would when - # blocking an over-tier call. The test's invariant is that the - # AuditMiddleware sitting *outside* Safety still records the - # attempt with outcome=error. + # that raises the same ``ExpectedToolError`` SafetyMiddleware + # would when blocking an over-tier call. The test's invariant is + # that the AuditMiddleware sitting *outside* Safety still records + # the attempt with outcome=error. msg = "Tool 'kill_server' is not available at the current safety level." async def _safety_denial(_ctx: t.Any) -> None: - raise ToolError(msg) + raise ExpectedToolError(msg) with ( caplog.at_level(logging.INFO, logger="libtmux_mcp.audit"), - pytest.raises(ToolError, match="not available"), + pytest.raises(ExpectedToolError, match="not available"), ): asyncio.run(audit.on_call_tool(ctx, _safety_denial)) rendered = "\n".join(rec.getMessage() for rec in caplog.records) assert "tool=kill_server" in rendered assert "outcome=error" in rendered - assert "error_type=ToolError" in rendered + assert "error_type=ExpectedToolError" in rendered # --------------------------------------------------------------------------- @@ -512,7 +515,7 @@ def test_readonly_retry_recovers_from_libtmux_exception() -> None: Models the production scenario the middleware exists to fix: a transient socket error from libtmux on the first call, then a successful call after the cache evicts the dead Server. Without - the retry the agent would see a ``ToolError`` on the first + the retry the agent would see an expected tool error on the first ``list_sessions``-style call. """ from libtmux import exc as libtmux_exc @@ -582,11 +585,11 @@ def test_readonly_retry_recovers_on_decorated_tool( Regression guard for the fastmcp <3.2.4 production no-op where ``RetryMiddleware._should_retry`` did not walk ``__cause__``. Every libtmux-mcp tool is wrapped by ``handle_tool_errors`` / - ``handle_tool_errors_async`` (``_utils.py:842-850``), which - converts ``LibTmuxException`` to ``ToolError(...) from - LibTmuxException``. At the middleware layer the exception type - is ``ToolError``, not ``LibTmuxException`` — so the retry - decision must walk ``__cause__`` to see the real failure type. + ``handle_tool_errors_async``, which converts ``LibTmuxException`` + to ``ExpectedToolError(...) from LibTmuxException``. At the + middleware layer the exception type is ``ExpectedToolError``, not + ``LibTmuxException`` — so the retry decision must walk + ``__cause__`` to see the real failure type. The unit tests above use ``_FlakyCallNext`` which raises ``LibTmuxException`` directly, bypassing the decorator. They @@ -647,3 +650,526 @@ def test_readonly_retry_logger_uses_project_namespace() -> None: """ middleware = ReadonlyRetryMiddleware() assert middleware._retry.logger.name == "libtmux_mcp.retry" + + +# --------------------------------------------------------------------------- +# ToolErrorResultMiddleware tests +# --------------------------------------------------------------------------- + + +def _error_probe_server() -> t.Any: + """Build a minimal FastMCP instance wired like the production stack. + + Only ``ToolErrorResultMiddleware`` is installed — the assertions + target error-result conversion, not the audit/retry/safety trio. + Tools mirror the three production raise shapes: an expected + failure with a chained cause (decorator-mapped libtmux error), an + expected failure with a recovery suggestion, and an unexpected + crash. + """ + from fastmcp import FastMCP + from libtmux import exc as libtmux_exc + + from libtmux_mcp._utils import ExpectedToolError, _map_exception_to_tool_error + from libtmux_mcp.middleware import ToolErrorResultMiddleware + + probe = FastMCP( + name="error-probe", + middleware=[ToolErrorResultMiddleware(transform_errors=True)], + ) + + @probe.tool + def fail_expected_chained() -> str: + # Mirror the ``handle_tool_errors`` decorator's mapping shape: + # ``raise from ``. + cause = libtmux_exc.PaneNotFound("%99") + mapped = _map_exception_to_tool_error("fail_expected_chained", cause) + raise mapped from cause + + @probe.tool + def fail_expected_with_suggestion() -> str: + msg = "Pane not found: %99" + raise ExpectedToolError( + msg, + suggestion="Call list_panes to discover valid pane ids.", + ) + + @probe.tool + def fail_unexpected() -> str: + msg = "boom" + raise RuntimeError(msg) + + @probe.tool + def takes_int(count: int) -> str: + # Never reached when the argument fails schema validation — + # fastmcp raises pydantic.ValidationError before tool code runs. + return str(count) + + return probe + + +def test_tool_error_result_expected_failure_is_clean() -> None: + """Expected failures surface verbatim — no transform-layer prefix. + + Regression guard for the stock ``ErrorHandlingMiddleware`` + behavior this middleware replaces: its ``-32603`` catch-all + rewrote every tool failure to ``"Internal error: "``, + mangling the agent-facing text. + """ + from fastmcp import Client + + probe = _error_probe_server() + + async def _call() -> t.Any: + async with Client(probe) as client: + return await client.call_tool("fail_expected_chained", raise_on_error=False) + + result = asyncio.run(_call()) + + assert result.is_error is True + text = result.content[0].text + assert text.startswith("Pane not found:") + assert "Internal error" not in text + + +def test_tool_error_result_meta_carries_error_details() -> None: + """``meta`` reports the originating exception class and expectedness. + + ``error_type`` names the ``__cause__`` when the raise site chained + one — agents see ``PaneNotFound``, not the mapped + ``ExpectedToolError`` wrapper. + """ + from fastmcp import Client + + probe = _error_probe_server() + + async def _call(name: str) -> t.Any: + async with Client(probe) as client: + return await client.call_tool(name, raise_on_error=False) + + chained = asyncio.run(_call("fail_expected_chained")) + assert chained.is_error is True + meta = chained.meta or {} + assert meta["error_type"] == "PaneNotFound" + assert meta["expected"] is True + + crashed = asyncio.run(_call("fail_unexpected")) + assert crashed.is_error is True + crash_meta = crashed.meta or {} + assert crash_meta["error_type"] == "RuntimeError" + assert crash_meta["expected"] is False + + +def test_tool_error_result_appends_suggestion() -> None: + """A ``suggestion`` lands in both the text block and ``meta``.""" + from fastmcp import Client + + probe = _error_probe_server() + + async def _call() -> t.Any: + async with Client(probe) as client: + return await client.call_tool( + "fail_expected_with_suggestion", raise_on_error=False + ) + + result = asyncio.run(_call()) + + assert result.is_error is True + text = result.content[0].text + assert text == ("Pane not found: %99\nCall list_panes to discover valid pane ids.") + meta = getattr(result, "meta", None) or {} + assert meta["suggestion"] == "Call list_panes to discover valid pane ids." + + +class ErrorLogLevelFixture(t.NamedTuple): + """Test fixture for ToolErrorResultMiddleware._log_error levels.""" + + test_id: str + tool_name: str + arguments: dict[str, t.Any] | None + expected_level: int + message_fragment: str + + +ERROR_LOG_LEVEL_FIXTURES: list[ErrorLogLevelFixture] = [ + ErrorLogLevelFixture( + test_id="expected_failure_logs_warning", + tool_name="fail_expected_chained", + arguments=None, + expected_level=logging.WARNING, + message_fragment="Pane not found", + ), + ErrorLogLevelFixture( + test_id="unexpected_failure_logs_error", + tool_name="fail_unexpected", + arguments=None, + expected_level=logging.ERROR, + message_fragment="boom", + ), + ErrorLogLevelFixture( + test_id="schema_validation_logs_warning", + tool_name="takes_int", + arguments={"count": "not-an-int"}, + expected_level=logging.WARNING, + message_fragment="validation error", + ), +] + + +@pytest.mark.parametrize( + ErrorLogLevelFixture._fields, + ERROR_LOG_LEVEL_FIXTURES, + ids=[f.test_id for f in ERROR_LOG_LEVEL_FIXTURES], +) +def test_tool_error_result_logs_at_error_log_level( + test_id: str, + tool_name: str, + arguments: dict[str, t.Any] | None, + expected_level: int, + message_fragment: str, + caplog: pytest.LogCaptureFixture, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """``_log_error`` honors ``log_level``: WARNING expected, ERROR not. + + The stock ``ErrorHandlingMiddleware._log_error`` hardcodes + ``logger.error`` — without the override, every failure demoted to + WARNING by ``ExpectedToolError`` would be re-shouted at ERROR on + the ``fastmcp.errors`` channel. Argument-schema validation + failures carry no ``log_level`` at all; the middleware classifies + them as agent-correctable WARNINGs. + + Assertions go through ``record.getMessage()`` so the lazy + %-formatting args are interpolated regardless of whether a handler + formatted the record — and a literal ``%s`` leaking into the + message would fail the fragment match. + """ + from fastmcp import Client + + # fastmcp's logger doesn't propagate to root (rich handler); + # re-enable propagation so caplog sees the records. + monkeypatch.setattr(logging.getLogger("fastmcp"), "propagate", True) + + probe = _error_probe_server() + + async def _call() -> None: + async with Client(probe) as client: + await client.call_tool(tool_name, arguments, raise_on_error=False) + + with caplog.at_level(logging.DEBUG, logger="fastmcp.errors"): + asyncio.run(_call()) + + levels = [ + r.levelno + for r in caplog.records + if r.name == "fastmcp.errors" + and "Error in tools/call" in r.getMessage() + and message_fragment in r.getMessage() + ] + assert levels == [expected_level] + + +def test_schema_validation_failure_marked_expected_in_meta() -> None: + """Argument-schema failures report ``expected=True`` in result meta. + + fastmcp validates arguments before tool code runs, so the + ``handle_tool_errors`` decorators never get the chance to classify + these as ``ExpectedToolError`` — the middleware must recognize the + bare ``pydantic.ValidationError`` itself. Bad arguments are + agent-correctable, same as a bad pane id. + """ + from fastmcp import Client + + probe = _error_probe_server() + + async def _call() -> t.Any: + async with Client(probe) as client: + return await client.call_tool( + "takes_int", {"count": "not-an-int"}, raise_on_error=False + ) + + result = asyncio.run(_call()) + + assert result.is_error is True + meta = result.meta or {} + assert meta["expected"] is True + assert meta["error_type"] == "ValidationError" + + +class LimiterErrorFixture(t.NamedTuple): + """Test fixture for error-result preservation through the limiter.""" + + test_id: str + tool_name: str + arguments: dict[str, t.Any] | None + expect_is_error: bool + expect_header: bool + text_must_contain: str | None + + +LIMITER_ERROR_FIXTURES: list[LimiterErrorFixture] = [ + LimiterErrorFixture( + test_id="oversized_error_keeps_flag", + tool_name="limited_fail", + arguments={"payload": "x" * 5000}, + expect_is_error=True, + expect_header=True, + # Tail-preservation keeps the suggestion line at the end. + text_must_contain="Call list_panes to discover valid pane ids.", + ), + LimiterErrorFixture( + test_id="small_error_untouched", + tool_name="limited_fail", + arguments={"payload": "x"}, + expect_is_error=True, + expect_header=False, + text_must_contain="Invalid name: 'x'", + ), + LimiterErrorFixture( + test_id="oversized_success_not_marked_error", + tool_name="limited_ok", + arguments=None, + expect_is_error=False, + expect_header=True, + text_must_contain=None, + ), +] + + +class _LimiterOut(pydantic.BaseModel): + """Output model giving the ``limited_fail`` probe an output schema. + + Module-level (not test-local) because ``from __future__ import + annotations`` stringifies the return annotation and pydantic + resolves it against module globals when fastmcp builds the schema. + """ + + value: str + + +def _limiter_probe_server() -> t.Any: + """Build a FastMCP instance with the limiter wrapping error conversion. + + Mirrors the production ordering (limiter outside + ``ToolErrorResultMiddleware``) with a small ``max_size`` so the + truncation path fires. ``limited_fail`` returns a Pydantic model + so the MCP SDK client's output-schema validation is in play. + """ + from fastmcp import FastMCP + + from libtmux_mcp._utils import ExpectedToolError + from libtmux_mcp.middleware import ( + TailPreservingResponseLimitingMiddleware, + ToolErrorResultMiddleware, + ) + + probe = FastMCP( + name="limiter-probe", + middleware=[ + TailPreservingResponseLimitingMiddleware( + max_size=300, + tools=["limited_fail", "limited_ok"], + ), + ToolErrorResultMiddleware(transform_errors=True), + ], + ) + + @probe.tool + def limited_fail(payload: str) -> _LimiterOut: + msg = f"Invalid name: {payload!r}" + raise ExpectedToolError( + msg, + suggestion="Call list_panes to discover valid pane ids.", + ) + + # output_schema=None: fastmcp wraps even plain-str returns in a + # result schema, and the stock truncation path drops structured + # content — the MCP SDK client then rejects ANY truncated success + # from a schema'd tool. That pre-existing upstream gap is not what + # this probe tests; disable the schema so the success case + # exercises only the is_error handling. + @probe.tool(output_schema=None) + def limited_ok() -> str: + return "y" * 5000 + + return probe + + +@pytest.mark.parametrize( + LimiterErrorFixture._fields, + LIMITER_ERROR_FIXTURES, + ids=[f.test_id for f in LIMITER_ERROR_FIXTURES], +) +def test_response_limiter_preserves_error_results( + test_id: str, + tool_name: str, + arguments: dict[str, t.Any] | None, + expect_is_error: bool, + expect_header: bool, + text_must_contain: str | None, +) -> None: + """Truncation keeps ``is_error``; successes never gain it. + + Regression guard: the stock truncation path rebuilds the result + without the error flag, so an oversized error from a tool with an + output schema became an apparent success — and the MCP SDK client + raised ``RuntimeError: ... has an output schema but did not return + structured content`` instead of delivering the tool error. The + ``limited_fail`` probe returns a Pydantic model precisely so this + test exercises that client-side validation path. + """ + from fastmcp import Client + + probe = _limiter_probe_server() + + async def _call() -> t.Any: + async with Client(probe) as client: + return await client.call_tool(tool_name, arguments, raise_on_error=False) + + result = asyncio.run(_call()) + + assert result.is_error is expect_is_error + text = result.content[0].text + assert ("[... truncated" in text) is expect_header + if text_must_contain is not None: + assert text_must_contain in text + if expect_is_error: + meta = result.meta or {} + assert meta["expected"] is True + assert meta["error_type"] == "ExpectedToolError" + + +class UnknownArgSuggestionFixture(t.NamedTuple): + """Test fixture for synthesized unexpected-argument suggestions.""" + + test_id: str + arguments: dict[str, t.Any] + client_name: str | None + expect_suggestion: bool + contains: tuple[str, ...] + not_contains: tuple[str, ...] + + +UNKNOWN_ARG_SUGGESTION_FIXTURES: list[UnknownArgSuggestionFixture] = [ + UnknownArgSuggestionFixture( + test_id="scheduling_flag_names_client", + arguments={"count": 1, "wait_for_previous": True}, + client_name="gemini-test", + expect_suggestion=True, + contains=( + "Remove or correct the unrecognized argument(s): wait_for_previous.", + "your client (gemini-test 9.9)", + "retry the call without it", + ), + not_contains=("some clients",), + ), + UnknownArgSuggestionFixture( + test_id="scheduling_flag_default_client", + arguments={"count": 1, "wait_for_previous": True}, + client_name=None, + expect_suggestion=True, + # The in-memory fastmcp Client always sends clientInfo, so the + # handshake-derived label is used; the no-handshake generic + # wording is covered by the unit test below. + contains=("wait_for_previous", "your client ("), + not_contains=("gemini-test",), + ), + UnknownArgSuggestionFixture( + test_id="typo_lists_names_without_client_note", + arguments={"count": 1, "bogus_flag": True}, + client_name=None, + expect_suggestion=True, + contains=("Remove or correct the unrecognized argument(s): bogus_flag.",), + not_contains=("scheduling flag", "Gemini"), + ), + UnknownArgSuggestionFixture( + test_id="missing_required_arg_no_suggestion", + arguments={}, + client_name=None, + expect_suggestion=False, + contains=(), + not_contains=(), + ), +] + + +@pytest.mark.parametrize( + UnknownArgSuggestionFixture._fields, + UNKNOWN_ARG_SUGGESTION_FIXTURES, + ids=[f.test_id for f in UNKNOWN_ARG_SUGGESTION_FIXTURES], +) +def test_unexpected_argument_suggestion( + test_id: str, + arguments: dict[str, t.Any], + client_name: str | None, + expect_suggestion: bool, + contains: tuple[str, ...], + not_contains: tuple[str, ...], +) -> None: + """Schema rejections of unexpected arguments carry a recovery hint. + + The rejection itself is unchanged — the argument is never silently + stripped (contrast MemPalace/mempalace#322's pop-the-key approach) — + but the suggestion names exactly which argument(s) to drop or fix, + and flags ``wait_for_previous`` as a client scheduling leak, naming + the client from the MCP initialize handshake when available. + """ + import mcp.types + from fastmcp import Client + + probe = _error_probe_server() + + client_kwargs: dict[str, t.Any] = {} + if client_name is not None: + client_kwargs["client_info"] = mcp.types.Implementation( + name=client_name, version="9.9" + ) + + async def _call() -> t.Any: + async with Client(probe, **client_kwargs) as client: + return await client.call_tool("takes_int", arguments, raise_on_error=False) + + result = asyncio.run(_call()) + + assert result.is_error is True + meta = result.meta or {} + assert meta["expected"] is True + if not expect_suggestion: + assert "suggestion" not in meta + return + suggestion = meta["suggestion"] + text = result.content[0].text + assert suggestion in text # appended to the agent-visible message + for fragment in contains: + assert fragment in suggestion + for fragment in not_contains: + assert fragment not in suggestion + + +def test_unexpected_argument_suggestion_without_handshake() -> None: + """No client handshake -> the scheduling-flag note uses generic wording. + + Real connections always carry ``clientInfo`` (required by the MCP + initialize handshake), so the generic branch is reachable only + where no context exists — exercised here by calling + ``_error_tool_result`` directly with a manufactured validation + error, the same shape fastmcp raises for tool arguments. + """ + import pydantic + + from libtmux_mcp.middleware import _error_tool_result + + @pydantic.validate_call + def _probe_fn(count: int) -> int: + return count # pragma: no cover - validation rejects the call + + try: + _probe_fn(count=1, wait_for_previous=True) # type: ignore[call-arg] + except pydantic.ValidationError as exc: + result = _error_tool_result(exc, None) + else: # pragma: no cover - defends the test's own premise + pytest.fail("expected a ValidationError") + + meta = result.meta or {} + assert meta["expected"] is True + assert "some clients (e.g. Gemini CLI)" in meta["suggestion"] diff --git a/tests/test_pane_tools.py b/tests/test_pane_tools.py index 793d6a30..2b925ed8 100644 --- a/tests/test_pane_tools.py +++ b/tests/test_pane_tools.py @@ -2906,12 +2906,10 @@ def test_snapshot_pane_truncates_content(mcp_server: Server, mcp_pane: Pane) -> ``content_truncated`` and ``content_truncated_lines``. ``content`` itself is the kept tail with no marker. """ - for i in range(20): - mcp_pane.send_keys(f"echo snap_line_{i}", enter=True) - retry_until( - lambda: "snap_line_19" in "\n".join(mcp_pane.capture_pane()), - 2, - raises=True, + _signal_after_shell_payload( + mcp_server, + mcp_pane, + "; ".join(f"echo snap_line_{i}" for i in range(20)), ) result = snapshot_pane( @@ -2930,12 +2928,10 @@ def test_snapshot_pane_max_lines_none_keeps_full_content( mcp_server: Server, mcp_pane: Pane ) -> None: """``max_lines=None`` returns the full content with no truncation flag.""" - for i in range(20): - mcp_pane.send_keys(f"echo snapnone_{i}", enter=True) - retry_until( - lambda: "snapnone_19" in "\n".join(mcp_pane.capture_pane()), - 2, - raises=True, + _signal_after_shell_payload( + mcp_server, + mcp_pane, + "; ".join(f"echo snapnone_{i}" for i in range(20)), ) result = snapshot_pane( diff --git a/tests/test_utils.py b/tests/test_utils.py index 7c9db4e6..3b25f694 100644 --- a/tests/test_utils.py +++ b/tests/test_utils.py @@ -862,3 +862,126 @@ async def _raiser() -> None: with pytest.raises(ToolError, match=r"Unexpected error: RuntimeError: boom"): asyncio.run(_raiser()) + + +# --------------------------------------------------------------------------- +# ExpectedToolError log-level tests +# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize( + "raised", + [ + exc.TmuxSessionExists("session foo already exists"), + exc.BadSessionName("bad name"), + exc.TmuxObjectDoesNotExist("@99"), + exc.PaneNotFound("%99"), + exc.LibTmuxException("server gone"), + ], + ids=lambda e: type(e).__name__, +) +def test_map_exception_expected_failures_log_at_warning( + raised: Exception, +) -> None: + """Agent-correctable libtmux failures map to WARNING-level errors.""" + import logging + + from libtmux_mcp._utils import ExpectedToolError, _map_exception_to_tool_error + + mapped = _map_exception_to_tool_error("some_tool", raised) + assert isinstance(mapped, ExpectedToolError) + assert mapped.log_level == logging.WARNING + + +@pytest.mark.parametrize( + "raised", + [ + exc.TmuxCommandNotFound("tmux missing"), + RuntimeError("boom"), + ], + ids=lambda e: type(e).__name__, +) +def test_map_exception_operator_faults_stay_at_error(raised: Exception) -> None: + """Environment faults and unexpected bugs keep the ERROR default.""" + import logging + + from libtmux_mcp._utils import ExpectedToolError, _map_exception_to_tool_error + + mapped = _map_exception_to_tool_error("some_tool", raised) + assert not isinstance(mapped, ExpectedToolError) + assert mapped.log_level == logging.ERROR + + +def test_expected_tool_error_logs_warning_through_server( + caplog: pytest.LogCaptureFixture, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """Fastmcp's server-layer error log honors ``ExpectedToolError.log_level``. + + Uses a minimal FastMCP instance (no middleware stack) so the + assertion isolates fastmcp's own ``Error calling tool`` record — + the project middleware's log behavior is covered in + ``test_middleware.py``. + + fastmcp sets ``propagate = False`` on its logger (it installs a + rich handler instead), which hides records from caplog's + root-logger handler — re-enable propagation for the test. + """ + import asyncio + import logging + + from fastmcp import Client, FastMCP + + from libtmux_mcp._utils import ExpectedToolError + + monkeypatch.setattr(logging.getLogger("fastmcp"), "propagate", True) + + probe = FastMCP(name="probe") + + @probe.tool + def fail_expected() -> str: + msg = "Pane not found: %99" + raise ExpectedToolError(msg) + + async def _call() -> None: + async with Client(probe) as client: + await client.call_tool("fail_expected", raise_on_error=False) + + with caplog.at_level(logging.DEBUG): + asyncio.run(_call()) + + records = [r for r in caplog.records if "Error calling tool" in r.message] + assert records, "expected fastmcp to log the tool failure" + assert all(r.levelno == logging.WARNING for r in records) + + +@pytest.mark.parametrize( + ("raised", "expected_suggestion_fragment"), + [ + (exc.TmuxObjectDoesNotExist("@99"), "list_sessions / list_windows"), + (exc.PaneNotFound("%99"), "list_panes"), + (exc.TmuxSessionExists("dup"), None), + (exc.BadSessionName("bad:name"), None), + (exc.LibTmuxException("transient"), None), + ], + ids=lambda v: type(v).__name__ if isinstance(v, Exception) else str(v), +) +def test_map_exception_suggestion_policy( + raised: Exception, + expected_suggestion_fragment: str | None, +) -> None: + """Only the not-found branches carry agent-facing recovery hints. + + Discovery tools are the canonical fix for stale/guessed ids — the + most common agent mistake. The other expected branches stay + hint-free until real transcripts show agents flailing on them. + """ + from libtmux_mcp._utils import _map_exception_to_tool_error + + mapped = _map_exception_to_tool_error("some_tool", raised) + suggestion = getattr(mapped, "suggestion", None) + if expected_suggestion_fragment is None: + assert suggestion is None + else: + assert suggestion is not None + assert expected_suggestion_fragment in suggestion diff --git a/uv.lock b/uv.lock index e9479016..854e5cf4 100644 --- a/uv.lock +++ b/uv.lock @@ -1240,7 +1240,7 @@ testing = [ [package.metadata] requires-dist = [ - { name = "fastmcp", specifier = ">=3.2.4,<4.0.0" }, + { name = "fastmcp", specifier = ">=3.4.0,<4.0.0" }, { name = "libtmux", specifier = ">=0.58.0,<1.0" }, ]