# PRD: P2-T4 - Surface broker unavailability as JSON-RPC error instead of silent timeout

## Overview

When `BrokerProxy` cannot connect to the broker (timeout, spawn failure, daemon unavailable),
the client currently receives no response and eventually times out, showing "0 tools" or a
generic connection error with no actionable message. This task changes the proxy to return a
well-formed JSON-RPC error response so that MCP clients can surface a meaningful error.

## Problem Statement

`BrokerProxy.run()` calls `_spawn_broker_if_needed()` and then `_connect_with_timeout()`.
Either call may raise `TimeoutError` or `OSError`. These exceptions currently propagate
uncaught, and the proxy process exits. The client sees EOF on the proxy's stdout, but no
JSON-RPC response is ever written, so the client hangs until its own request timeout fires
(indefinitely, if it has none) or shows a confusing "0 tools" state.
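
The symptom can be reproduced without the proxy. The sketch below is a minimal stand-in, not `BrokerProxy` itself: a child Python process whose hypothetical `connect()` coroutine raises `TimeoutError` before anything is written, which mirrors the unguarded connect phase. The parent plays the MCP client. It reads the child's stdout to EOF and finds no JSON-RPC message at all, while the only trace of the failure ends up on stderr.

```python
import subprocess
import sys

# Hypothetical stand-in for the proxy: the connect phase fails and nothing guards it.
CHILD = """
import asyncio

async def connect():
    raise TimeoutError("simulated broker connect timeout")

async def run():
    await connect()  # uncaught: the process dies before writing any JSON-RPC message

asyncio.run(run())
"""


def run_proxy_stand_in() -> subprocess.CompletedProcess:
    """Run the stand-in as a child process and capture its stdout and stderr."""
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        timeout=30,
    )


result = run_proxy_stand_in()
print(repr(result.stdout))  # what the "client" reads: EOF with nothing on the wire
print(result.returncode)  # non-zero: the process died from the uncaught exception
print(result.stderr.strip().splitlines()[-1])  # the reason only reaches stderr
```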

## Proposed Solution

Wrap the connect phase in `run()` in a try/except. On any connection failure:

1. Log the error.
2. Write a JSON-RPC 2.0 error response to stdout before the process exits.
3. Return cleanly without re-raising.

### Error response format

The code `-32001` falls in the range -32000 to -32099, which JSON-RPC 2.0 reserves for
implementation-defined server errors.

```json
{
  "jsonrpc": "2.0",
  "id": null,
  "error": {
    "code": -32001,
    "message": "Broker unavailable: <reason>"
  }
}
```
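
To make the format concrete, here is a sketch of how a receiver could check a single line written by the proxy. The line must be one newline-terminated JSON object with `"jsonrpc": "2.0"` and a `null` id. It must also carry an error object whose integer `code` lies in the -32099 to -32000 server-error range. `check_broker_error_line` is a hypothetical name used for illustration. It is not part of the proxy or of any MCP SDK.

```python
import json

# JSON-RPC 2.0 reserves -32099..-32000 for implementation-defined server errors.
SERVER_ERROR_RANGE = range(-32099, -31999)


def check_broker_error_line(line: str) -> str:
    """Validate one line written by the proxy and return its error message.

    Raises ValueError if the line is not a single JSON-RPC 2.0 error response
    with a null id and a server-error code.
    """
    if not line.endswith("\n") or "\n" in line[:-1]:
        raise ValueError("expected exactly one newline-terminated JSON message")
    msg = json.loads(line)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("missing or wrong 'jsonrpc' version")
    if "id" not in msg or msg["id"] is not None:
        raise ValueError("a connect-phase error must have a null id")
    error = msg.get("error")
    if "result" in msg or not isinstance(error, dict):
        raise ValueError("an error response has 'error' and no 'result'")
    code, message = error.get("code"), error.get("message")
    if not isinstance(code, int) or code not in SERVER_ERROR_RANGE:
        raise ValueError(f"code {code!r} is outside the server-error range")
    if not isinstance(message, str):
        raise ValueError("'message' must be a string")
    return message


line = json.dumps({
    "jsonrpc": "2.0",
    "id": None,
    "error": {"code": -32001, "message": "Broker unavailable: TimeoutError"},
}) + "\n"
print(check_broker_error_line(line))  # Broker unavailable: TimeoutError
```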

`id` is `null` because the proxy cannot reliably read the pending request from stdin on the
error path: the request may not have arrived yet, and reading stdin would block or require an
additional async task. JSON-RPC 2.0 §5 specifies a `null` response id when the request id
cannot be determined.

### Scope boundary

This task covers **connection-phase** failures only (before the bridge starts running). It does
NOT cover mid-session broker crashes (the daemon dies while `_run_bridge` is active); that is a
separate concern.

## Deliverables

| File | Change |
|------|--------|
| `src/mcpbridge_wrapper/broker/proxy.py` | Add `_send_broker_error()` helper; wrap connect phase in `run()` with try/except |
| `tests/unit/test_broker_proxy.py` | Add `TestBrokerProxyUnavailableError` with ≥4 tests |

## Implementation Plan

### 1. `proxy.py`: add `_send_broker_error()`

New private async method:

```python
import json  # module level in proxy.py, next to the existing `logger`


async def _send_broker_error(self, reason: str) -> None:
    """Write a JSON-RPC -32001 error to stdout and flush."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": None,
        "error": {"code": -32001, "message": f"Broker unavailable: {reason}"},
    }) + "\n"  # newline-delimited JSON, one message per line, as the stdio transport expects
    writer = self._stdout
    if writer is None:
        # No stdout writer exists yet (e.g. the failure happened early), so create one.
        writer = await self._make_stdout_writer()
    writer.write(payload.encode())
    try:
        await writer.drain()
    except Exception:
        # The client may already have closed its end of the pipe; nobody is left to notify.
        logger.debug("Could not flush broker-unavailable error", exc_info=True)
```
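
The helper's behaviour can be exercised without a real pipe. The sketch below lifts the same steps into a free function, `send_broker_error(writer, reason)`, which is a hypothetical name. It drives that function with a minimal fake exposing the `write()`/`drain()` pair that `asyncio.StreamWriter` provides. The lazy writer creation is left out because `_make_stdout_writer()` is specific to the proxy. The second call shows the design choice behind the `try`: a `drain()` failure, such as the client having already closed its end, is swallowed rather than turned into a second crash.

```python
import asyncio
import json


class FakeWriter:
    """Stand-in for asyncio.StreamWriter: records bytes, optionally fails on drain()."""

    def __init__(self, fail_drain: bool = False) -> None:
        self.buffer = bytearray()
        self.fail_drain = fail_drain

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)

    async def drain(self) -> None:
        if self.fail_drain:
            raise BrokenPipeError("client closed its end of the pipe")


async def send_broker_error(writer, reason: str) -> None:
    """Same steps as _send_broker_error, minus the lazy writer creation."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": None,
        "error": {"code": -32001, "message": f"Broker unavailable: {reason}"},
    }) + "\n"
    writer.write(payload.encode())
    try:
        await writer.drain()
    except Exception:
        pass  # nobody is listening any more; do not mask the original failure


ok = FakeWriter()
asyncio.run(send_broker_error(ok, "TimeoutError"))
print(ok.buffer.decode(), end="")

closed = FakeWriter(fail_drain=True)
asyncio.run(send_broker_error(closed, "TimeoutError"))  # returns normally despite BrokenPipeError
```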

### 2. `proxy.py`: modify `run()`

Wrap the connect phase:

```python
async def run(self) -> None:
    try:
        if self._auto_spawn:
            await self._spawn_broker_if_needed()
        sock_reader, sock_writer = await self._connect_with_timeout()
    except Exception as exc:
        # A bare TimeoutError (e.g. from asyncio.wait_for) has an empty str(), so fall
        # back to the exception type name so the client always sees a reason.
        reason = str(exc) or type(exc).__name__
        logger.error("Broker unavailable: %s", reason)
        await self._send_broker_error(reason)
        return  # no re-raise: the client receives the error instead of a bare EOF
    # ... rest unchanged ...
```
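
The `str(exc) or type(exc).__name__` fallback matters because the most likely failure, a timeout, usually arrives with an empty message. The sketch below compares a real `asyncio.wait_for` timeout with an `OSError`-style refusal. `failure_reason` is a hypothetical helper that exists only to isolate that one expression. The sketch also confirms that `except Exception` in `run()` catches both. `asyncio.CancelledError` still propagates, because it has been a `BaseException` subclass since Python 3.8, so task cancellation keeps working.

```python
import asyncio


def failure_reason(exc: BaseException) -> str:
    """Human-readable reason for the JSON-RPC message; never empty."""
    return str(exc) or type(exc).__name__


async def capture_timeout() -> BaseException:
    try:
        await asyncio.wait_for(asyncio.sleep(1), timeout=0.01)
    except asyncio.TimeoutError as exc:  # the builtin TimeoutError on Python 3.11+
        return exc
    raise AssertionError("wait_for should have timed out")


timeout_exc = asyncio.run(capture_timeout())
refused_exc = ConnectionRefusedError(111, "Connection refused")

print(repr(str(timeout_exc)))  # '' (the bare message would be useless)
print(failure_reason(timeout_exc))  # TimeoutError
print(failure_reason(refused_exc))  # [Errno 111] Connection refused
print(isinstance(timeout_exc, Exception), isinstance(refused_exc, Exception))  # True True
print(issubclass(asyncio.CancelledError, Exception))  # False: cancellation is not swallowed
```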

### 3. `test_broker_proxy.py`: add `TestBrokerProxyUnavailableError`

Tests:
- `test_connect_timeout_sends_jsonrpc_error`: a `TimeoutError` from `_connect_with_timeout` → error written to the stdout writer
- `test_error_code_is_minus_32001`: the error code in the payload is `-32001`
- `test_error_message_includes_reason`: the message starts with `"Broker unavailable:"`
- `test_run_does_not_raise_on_connect_failure`: `run()` returns without re-raising on `TimeoutError`
- `test_spawn_failure_sends_jsonrpc_error`: a `TimeoutError` from `_spawn_broker_if_needed` → error written
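
One way to write these tests is to patch the connect-phase coroutines with `unittest.mock.AsyncMock` and capture what the proxy would write. The real fixtures depend on how `BrokerProxy` is constructed, which this PRD does not specify. The sketch below therefore runs against a hypothetical `StubProxy` that copies the `run()` control flow from step 2, and only the patching pattern is meant to carry over. It calls `asyncio.run()` directly so that it does not assume pytest-asyncio is installed.

```python
import asyncio
import json
from unittest.mock import AsyncMock, patch


class StubProxy:
    """Hypothetical stand-in mirroring the planned BrokerProxy.run() control flow."""

    def __init__(self, auto_spawn: bool = True) -> None:
        self._auto_spawn = auto_spawn
        self.sent = []  # JSON lines the proxy would write to stdout

    async def _spawn_broker_if_needed(self) -> None:
        pass

    async def _connect_with_timeout(self):
        return object(), object()  # placeholder (reader, writer) pair

    async def _send_broker_error(self, reason: str) -> None:
        self.sent.append(json.dumps({
            "jsonrpc": "2.0",
            "id": None,
            "error": {"code": -32001, "message": f"Broker unavailable: {reason}"},
        }))

    async def run(self) -> None:
        try:
            if self._auto_spawn:
                await self._spawn_broker_if_needed()
            await self._connect_with_timeout()
        except Exception as exc:
            await self._send_broker_error(str(exc) or type(exc).__name__)
            return


def run_with_failure(method: str) -> StubProxy:
    """Make one connect-phase coroutine raise TimeoutError, then run the proxy."""
    proxy = StubProxy()
    with patch.object(proxy, method, new=AsyncMock(side_effect=TimeoutError())):
        asyncio.run(proxy.run())  # must return, not raise
    return proxy


def test_connect_timeout_sends_jsonrpc_error() -> None:
    proxy = run_with_failure("_connect_with_timeout")
    error = json.loads(proxy.sent[0])["error"]
    assert error["code"] == -32001
    assert error["message"].startswith("Broker unavailable:")


def test_spawn_failure_sends_jsonrpc_error() -> None:
    proxy = run_with_failure("_spawn_broker_if_needed")
    assert len(proxy.sent) == 1


test_connect_timeout_sends_jsonrpc_error()
test_spawn_failure_sends_jsonrpc_error()
```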

## Acceptance Criteria

- [ ] Connection timeout produces a JSON-RPC `-32001` error response written to stdout
- [ ] Error message includes a human-readable reason (timeout, refused, stale socket)
- [ ] `run()` returns without re-raising, so the client does not hang indefinitely
- [ ] All existing broker tests pass
- [ ] `pytest --cov` coverage ≥ 90%
- [ ] `ruff check src/` passes
- [ ] `ruff format --check src/ tests/` passes

## Dependencies

- None (P2-T2 already handles stale-socket recovery in spawn; this task is a pure error-surface improvement)

## Risk

Low. The change is additive: the existing happy path is unchanged, and the new error path
activates only when the connection has already failed.