Skip to content

Commit 45b106a

Browse files
authored
Merge pull request #127 from SoundBlaster/feature/P2-T4-broker-unavailable-error
P2-T4: Surface broker unavailability as JSON-RPC error instead of silent timeout
2 parents 89bd99c + 727e04b commit 45b106a

9 files changed

Lines changed: 450 additions & 20 deletions

File tree

SPECS/ARCHIVE/INDEX.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,12 @@
11
# mcpbridge-wrapper Tasks Archive
22

3-
**Last Updated:** 2026-03-01 (P2-T3 archived)
3+
**Last Updated:** 2026-03-01 (P2-T4 archived)
44

55
## Archived Tasks
66

77
| Task ID | Folder | Archived | Verdict |
88
|---------|--------|----------|---------|
9+
| P2-T4 | [P2-T4_Surface_broker_unavailability_as_JSONRPC_error/](P2-T4_Surface_broker_unavailability_as_JSONRPC_error/) | 2026-03-01 | PASS |
910
| P2-T3 | [P2-T3_Fix_double_spawn_race_condition/](P2-T3_Fix_double_spawn_race_condition/) | 2026-03-01 | PASS |
1011
| P2-T1 | [P2-T1_Replace_broker_flags_with_single_broker_flag/](P2-T1_Replace_broker_flags_with_single_broker_flag/) | 2026-03-01 | PASS |
1112
| P2-T2 | [P2-T2_Self-healing_stale_socket_and_PID_file_recovery/](P2-T2_Self-healing_stale_socket_and_PID_file_recovery/) | 2026-03-01 | PASS |
@@ -174,6 +175,7 @@
174175

175176
| File | Description |
176177
|------|-------------|
178+
| [REVIEW_P2-T4_broker_unavailable_error.md](_Historical/REVIEW_P2-T4_broker_unavailable_error.md) | Review report for P2-T4 |
177179
| [REVIEW_P2-T3_spawn_lock.md](_Historical/REVIEW_P2-T3_spawn_lock.md) | Review report for P2-T3 |
178180
| [REVIEW_P2-T1_broker_flag.md](_Historical/REVIEW_P2-T1_broker_flag.md) | Review report for P2-T1 |
179181
| [REVIEW_P2-T2_stale_socket_recovery.md](_Historical/REVIEW_P2-T2_stale_socket_recovery.md) | Review report for P2-T2 |
Lines changed: 125 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,125 @@
1+
# PRD: P2-T4 — Surface broker unavailability as JSON-RPC error instead of silent timeout
2+
3+
## Overview
4+
5+
When `BrokerProxy` cannot connect to the broker (timeout, spawn failure, daemon unavailable),
6+
the client currently receives no response and eventually times out — showing "0 tools" or a
7+
generic connection error with no actionable message. This task fixes the proxy to return a
8+
well-formed JSON-RPC error response so MCP clients can surface a meaningful error.
9+
10+
## Problem Statement
11+
12+
`BrokerProxy.run()` calls `_spawn_broker_if_needed()` and then `_connect_with_timeout()`.
13+
Both may raise `TimeoutError` or `OSError`. These exceptions currently propagate uncaught,
14+
causing the proxy process to exit. The client's stdout pipe reaches EOF, but no JSON-RPC
15+
response is ever written — the client hangs indefinitely or shows a confusing "0 tools" state.
16+
17+
## Proposed Solution
18+
19+
Wrap the connect phase in `run()` with a try/except. On any connection failure:
20+
21+
1. Log the error.
22+
2. Write a JSON-RPC 2.0 error response to stdout (before exiting).
23+
3. Return cleanly (no re-raise).
24+
25+
### Error response format
26+
27+
```json
28+
{
29+
"jsonrpc": "2.0",
30+
"id": null,
31+
"error": {
32+
"code": -32001,
33+
"message": "Broker unavailable: <reason>"
34+
}
35+
}
36+
```
37+
38+
`id` is `null` because we cannot reliably read the pending request from stdin during the error
39+
path (the request may not have arrived yet, and reading stdin would block or require an
40+
additional async task). JSON-RPC 2.0 §5 permits `null` for the response id when the request
41+
id cannot be determined.
42+
43+
### Scope boundary
44+
45+
This task covers **connection-phase** failures only (before the bridge starts running). It does
46+
NOT cover mid-session broker crashes (daemon dies while `_run_bridge` is active); that is a
47+
separate concern.
48+
49+
## Deliverables
50+
51+
| File | Change |
52+
|------|--------|
53+
| `src/mcpbridge_wrapper/broker/proxy.py` | Add `_send_broker_error()` helper; wrap connect phase in `run()` with try/except |
54+
| `tests/unit/test_broker_proxy.py` | Add `TestBrokerProxyUnavailableError` with ≥4 tests |
55+
56+
## Implementation Plan
57+
58+
### 1. `proxy.py` — add `_send_broker_error()`
59+
60+
New private async method:
61+
62+
```python
63+
async def _send_broker_error(self, reason: str) -> None:
64+
"""Write a JSON-RPC -32001 error to stdout and flush."""
65+
import json
66+
payload = json.dumps({
67+
"jsonrpc": "2.0",
68+
"id": None,
69+
"error": {"code": -32001, "message": f"Broker unavailable: {reason}"},
70+
}) + "\n"
71+
writer = self._stdout
72+
if writer is None:
73+
writer = await self._make_stdout_writer()
74+
writer.write(payload.encode())
75+
try:
76+
await writer.drain()
77+
except Exception:
78+
pass
79+
```
80+
81+
### 2. `proxy.py` — modify `run()`
82+
83+
Wrap the connect phase:
84+
85+
```python
86+
async def run(self) -> None:
87+
try:
88+
if self._auto_spawn:
89+
await self._spawn_broker_if_needed()
90+
sock_reader, sock_writer = await self._connect_with_timeout()
91+
except Exception as exc:
92+
reason = str(exc)
93+
logger.error("Broker unavailable: %s", reason)
94+
await self._send_broker_error(reason)
95+
return
96+
# ... rest unchanged ...
97+
```
98+
99+
### 3. `test_broker_proxy.py` — add `TestBrokerProxyUnavailableError`
100+
101+
Tests:
102+
- `test_connect_timeout_sends_jsonrpc_error` — TimeoutError from `_connect_with_timeout` → error written to stdout writer
103+
- `test_error_code_is_minus_32001` — error code in payload is -32001
104+
- `test_error_message_includes_reason``"Broker unavailable:"` prefix present in message
105+
- `test_run_does_not_raise_on_connect_failure``run()` returns without re-raising on TimeoutError
106+
- `test_spawn_failure_sends_jsonrpc_error` — TimeoutError from `_spawn_broker_if_needed` → error written
107+
108+
## Acceptance Criteria
109+
110+
- [ ] Connection timeout produces a JSON-RPC `-32001` error response written to stdout
111+
- [ ] Error message includes a human-readable reason (timeout, refused, stale socket)
112+
- [ ] `run()` returns without re-raising — client does not hang indefinitely
113+
- [ ] All existing broker tests pass
114+
- [ ] `pytest --cov` coverage ≥ 90%
115+
- [ ] `ruff check src/` passes
116+
- [ ] `ruff format --check src/ tests/` passes
117+
118+
## Dependencies
119+
120+
- None (P2-T2 already handles stale socket recovery in spawn; this task is a pure error-surface improvement)
121+
122+
## Risk
123+
124+
Low. The change is additive — existing happy path is unchanged. The error path only activates
125+
when connection already fails.
Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
# Validation Report: P2-T4 — Surface broker unavailability as JSON-RPC error
2+
3+
**Date:** 2026-03-01
4+
**Status:** PASS
5+
6+
## Changes Delivered
7+
8+
| File | Change |
9+
|------|--------|
10+
| `src/mcpbridge_wrapper/broker/proxy.py` | Added `import json`; added `_send_broker_error()` helper; wrapped connect phase in `run()` with try/except |
11+
| `tests/unit/test_broker_proxy.py` | Updated `test_returns_with_error_when_no_socket`; added `TestBrokerProxyUnavailableError` (5 tests) |
12+
| `tests/unit/test_broker_stubs.py` | Updated `test_run_raises_timeout_when_no_socket``test_run_writes_error_when_no_socket` to match new behaviour |
13+
14+
## Acceptance Criteria
15+
16+
- [x] Connection timeout produces a JSON-RPC `-32001` error response written to stdout
17+
- [x] Error message includes a human-readable reason (`"Broker unavailable: <reason>"`)
18+
- [x] `run()` returns without re-raising — client does not hang indefinitely
19+
- [x] All existing broker tests pass (updated 2 tests whose behaviour legitimately changed)
20+
- [x] `pytest --cov` coverage ≥ 90% (achieved 91.59%)
21+
- [x] `ruff check src/` passes
22+
- [x] `ruff format --check src/ tests/` passes
23+
24+
## Quality Gates
25+
26+
| Gate | Result |
27+
|------|--------|
28+
| `pytest -q` | 732 passed, 5 skipped |
29+
| `pytest --cov` coverage | 91.59% (≥ 90% required) |
30+
| `ruff check src/` | PASS |
31+
| `ruff format --check src/ tests/` | PASS |
32+
33+
## Implementation Notes
34+
35+
- `_send_broker_error(reason)` constructs a JSON-RPC 2.0 error with code `-32001` and
36+
message `"Broker unavailable: {reason}"`, using `id: null` (permitted by JSON-RPC 2.0 §5
37+
when the request id cannot be determined).
38+
- The connect phase in `run()` is wrapped with `except Exception` to catch `TimeoutError`,
39+
`ConnectionRefusedError`, `FileNotFoundError`, and any `OSError` subclass from spawn.
40+
- `_send_broker_error` guards against `_make_stdout_writer()` failing in non-pipe contexts
41+
(e.g., test environments) with an inner try/except that logs and returns.
42+
- Two existing tests (`test_raises_timeout_when_no_socket` in both proxy and stubs test files)
43+
were updated: they previously expected `run()` to raise `TimeoutError`; the new behaviour is
44+
that `run()` returns cleanly after writing the JSON-RPC error. Both tests now verify the
45+
error payload instead.
Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,82 @@
1+
## REVIEW REPORT — P2-T4: Broker unavailability JSON-RPC error
2+
3+
**Scope:** origin/main..HEAD
4+
**Files:** 3 changed (proxy.py, test_broker_proxy.py, test_broker_stubs.py)
5+
**Date:** 2026-03-01
6+
7+
### Summary Verdict
8+
- [ ] Approve
9+
- [x] Approve with comments
10+
- [ ] Request changes
11+
- [ ] Block
12+
13+
---
14+
15+
### Critical Issues
16+
17+
None.
18+
19+
---
20+
21+
### Secondary Issues
22+
23+
**[Low] `except Exception` is broad in `run()`**
24+
25+
The connect-phase catch uses `except Exception`, which captures all non-`BaseException`
26+
exceptions including unexpected ones (e.g., `MemoryError` is not an issue, but e.g., a
27+
programming error in `_spawn_broker_if_needed` that raises `AttributeError` would be silently
28+
swallowed). The guard prevents the client from hanging, so the trade-off is acceptable, but a
29+
future refinement could narrow the catch to `(TimeoutError, OSError)` — the only exception
30+
types that legitimately arise from the connect path.
31+
32+
**Suggested fix (optional):** Narrow to `except (TimeoutError, OSError)` with a fallback
33+
`logger.exception` for anything else. Not a blocker; the current behaviour is safe.
34+
35+
**[Low] `id: null` in error response may confuse some clients**
36+
37+
JSON-RPC 2.0 §5 permits `null` for error responses when the request id is unknown. However,
38+
some strict clients may log a warning or discard the response entirely if they cannot match it
39+
to an outstanding request. Consider a future enhancement to read the pending request's `id`
40+
from stdin with a short timeout (e.g., 0.5s) and use it in the error response. Not a blocker
41+
for current acceptance criteria.
42+
43+
---
44+
45+
### Architectural Notes
46+
47+
- The `_send_broker_error` method is a clean, focused helper: <30 lines, single responsibility,
48+
reused for both spawn and connect failure paths.
49+
- The inner guard in `_send_broker_error` (catching `_make_stdout_writer()` failure) is the
50+
right defensive posture — in non-pipe test environments the writer setup fails, and the guard
51+
ensures the error path doesn't itself raise.
52+
- Existing tests that previously expected `run()` to raise `TimeoutError` have been correctly
53+
updated to verify the JSON-RPC error payload instead. The intent of those tests is preserved.
54+
55+
---
56+
57+
### Tests
58+
59+
- Added `TestBrokerProxyUnavailableError` (5 tests):
60+
- `test_connect_timeout_sends_jsonrpc_error` — full payload verification
61+
- `test_error_code_is_minus_32001` — code field check
62+
- `test_error_message_includes_reason` — prefix + exception text
63+
- `test_run_does_not_raise_on_connect_failure` — clean return
64+
- `test_spawn_failure_sends_jsonrpc_error` — spawn-phase error path
65+
- Updated 2 existing tests (`test_broker_proxy.py` and `test_broker_stubs.py`) that previously
66+
expected `run()` to raise; now they verify the JSON-RPC error payload.
67+
- Coverage: 91.59% (≥ 90% required). ✅
68+
- All 732 tests pass, 5 skipped. ✅
69+
70+
---
71+
72+
### Next Steps
73+
74+
- **Optional FU:** Narrow `except Exception` to `(TimeoutError, OSError)` in `run()` for
75+
tighter exception scope. Low priority — current behaviour is safe and correct.
76+
- **Optional FU:** Attempt to read the pending request's `id` from stdin with a short timeout,
77+
to use a real request id in the error response instead of `null`. Improves client
78+
compatibility. Low priority — JSON-RPC 2.0 permits `null`.
79+
- No documentation changes required (this is an internal error-handling improvement with no
80+
user-visible configuration surface).
81+
82+
**VERDICT: PASS — no blockers; two low-severity observations documented above.**

SPECS/INPROGRESS/next.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,16 +1,15 @@
11
# No Active Task
22

3-
**Status:** Idle — P2-T3 archived. Select the next task from `SPECS/Workplan.md`.
3+
**Status:** Idle — P2-T4 archived. Select the next task from `SPECS/Workplan.md`.
44

55
## Recently Archived
66

7+
- **P2-T4** — Surface broker unavailability as JSON-RPC error instead of silent timeout (2026-03-01, PASS)
78
- **P2-T3** — Fix double-spawn race condition when MCP client toggles rapidly (2026-03-01, PASS)
89
- **P2-T1** — Replace --broker-spawn/--broker-connect with single --broker flag (2026-03-01, PASS)
910
- **P2-T2** — Self-healing stale socket and PID file recovery (2026-03-01, PASS)
1011
- **BUG-T8** — Fix broker proxy bridge exits after first write due to BaseProtocol missing _drain_helper (2026-03-01, PASS)
11-
- **P1-T3** — Improve MCP settings examples in README to present broker setup first (2026-03-01, PASS)
1212

1313
## Suggested Next Tasks
1414

15-
- **P2-T4** (P1) — Surface broker unavailability as JSON-RPC error instead of silent timeout
1615
- **P2-T5** (P2) — Warn or restart daemon when --web-ui requested but running broker lacks it

SPECS/Workplan.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -99,17 +99,18 @@ Add new tasks using the canonical template in [TASK_TEMPLATE.md](TASK_TEMPLATE.m
9999
- [x] Lock is released on proxy exit (including crash)
100100
- [x] All existing broker tests pass
101101

102-
#### ⬜️ P2-T4: Surface broker unavailability as JSON-RPC error instead of silent timeout
102+
#### ✅ P2-T4: Surface broker unavailability as JSON-RPC error instead of silent timeout
103+
- **Status:** ✅ Completed (2026-03-01)
103104
- **Description:** When the proxy cannot connect to the broker (stale socket, spawn failed, daemon crashed mid-session), the client receives no response and eventually times out — showing "0 tools" or a generic connection error with no actionable message. Instead, the proxy should return a JSON-RPC error response (e.g. code `-32001`, message `"Broker unavailable: <reason>"`) so MCP clients can surface a meaningful error to the user rather than silently hanging.
104105
- **Priority:** P1
105106
- **Dependencies:** none
106107
- **Parallelizable:** yes
107108
- **Outputs/Artifacts:**
108-
- `src/mcpbridge_wrapper/broker/proxy.py`error response on connect failure
109+
- `src/mcpbridge_wrapper/broker/proxy.py``_send_broker_error()` helper; connect phase wrapped in try/except
109110
- **Acceptance Criteria:**
110-
- [ ] Connection timeout produces a JSON-RPC `-32001` error response to the client
111-
- [ ] Error message includes a human-readable reason (timeout, refused, stale socket)
112-
- [ ] Client does not hang indefinitely — error is returned within `connect_timeout` seconds
111+
- [x] Connection timeout produces a JSON-RPC `-32001` error response to the client
112+
- [x] Error message includes a human-readable reason (timeout, refused, stale socket)
113+
- [x] Client does not hang indefinitely — error is returned within `connect_timeout` seconds
113114

114115
#### ⬜️ P2-T5: Warn or restart daemon when --web-ui requested but running broker lacks it
115116
- **Description:** When a user configures `--broker-spawn --web-ui` and a broker daemon is already running without the web UI, the proxy connects silently and the `--web-ui` flag has no effect. The user sees 0 web UI and no explanation. Fix by detecting the mismatch: if the proxy is asked for web UI but the running daemon does not expose a web UI port (detectable via a broker status endpoint or absence of HTTP response on the expected port), emit a clear warning to stderr: `"Warning: broker is running without --web-ui. Restart the broker to enable the dashboard."`.

src/mcpbridge_wrapper/broker/proxy.py

Lines changed: 45 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
import asyncio
1818
import contextlib
1919
import fcntl
20+
import json
2021
import logging
2122
import os
2223
import socket
@@ -79,11 +80,21 @@ async def run(self) -> None:
7980
2. Connect to broker Unix socket (with timeout).
8081
3. Run bidirectional forward until stdin EOF or socket EOF.
8182
4. Close socket gracefully — broker process is **not** signalled.
83+
84+
If the broker is unavailable (timeout, refused, spawn failure), a
85+
JSON-RPC ``-32001`` error response is written to stdout so the MCP
86+
client receives a meaningful error instead of silently hanging.
8287
"""
83-
if self._auto_spawn:
84-
await self._spawn_broker_if_needed()
88+
try:
89+
if self._auto_spawn:
90+
await self._spawn_broker_if_needed()
8591

86-
sock_reader, sock_writer = await self._connect_with_timeout()
92+
sock_reader, sock_writer = await self._connect_with_timeout()
93+
except Exception as exc:
94+
reason = str(exc)
95+
logger.error("Broker unavailable: %s", reason)
96+
await self._send_broker_error(reason)
97+
return
8798

8899
# Set up asyncio stdin/stdout if not injected
89100
stdin_reader = self._stdin
@@ -109,6 +120,37 @@ async def run(self) -> None:
109120
# Internal helpers
110121
# ------------------------------------------------------------------
111122

123+
async def _send_broker_error(self, reason: str) -> None:
124+
"""Write a JSON-RPC -32001 error response to stdout and flush.
125+
126+
Called when the broker is unavailable (connection timeout, spawn
127+
failure, refused). Uses ``id: null`` because the incoming request
128+
id cannot be reliably read during the error path.
129+
"""
130+
payload = (
131+
json.dumps(
132+
{
133+
"jsonrpc": "2.0",
134+
"id": None,
135+
"error": {
136+
"code": -32001,
137+
"message": f"Broker unavailable: {reason}",
138+
},
139+
}
140+
)
141+
+ "\n"
142+
)
143+
writer = self._stdout
144+
if writer is None:
145+
try:
146+
writer = await self._make_stdout_writer()
147+
except Exception as exc:
148+
logger.error("Could not open stdout writer for error response: %s", exc)
149+
return
150+
writer.write(payload.encode())
151+
with contextlib.suppress(Exception):
152+
await writer.drain()
153+
112154
async def _spawn_broker_if_needed(self) -> None:
113155
"""Spawn the broker daemon if not already running.
114156

0 commit comments

Comments
 (0)