Skip to content
4 changes: 3 additions & 1 deletion SPECS/ARCHIVE/INDEX.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
# mcpbridge-wrapper Tasks Archive

**Last Updated:** 2026-03-01 (P2-T3 archived)
**Last Updated:** 2026-03-01 (P2-T4 archived)

## Archived Tasks

| Task ID | Folder | Archived | Verdict |
|---------|--------|----------|---------|
| P2-T4 | [P2-T4_Surface_broker_unavailability_as_JSONRPC_error/](P2-T4_Surface_broker_unavailability_as_JSONRPC_error/) | 2026-03-01 | PASS |
| P2-T3 | [P2-T3_Fix_double_spawn_race_condition/](P2-T3_Fix_double_spawn_race_condition/) | 2026-03-01 | PASS |
| P2-T1 | [P2-T1_Replace_broker_flags_with_single_broker_flag/](P2-T1_Replace_broker_flags_with_single_broker_flag/) | 2026-03-01 | PASS |
| P2-T2 | [P2-T2_Self-healing_stale_socket_and_PID_file_recovery/](P2-T2_Self-healing_stale_socket_and_PID_file_recovery/) | 2026-03-01 | PASS |
Expand Down Expand Up @@ -174,6 +175,7 @@

| File | Description |
|------|-------------|
| [REVIEW_P2-T4_broker_unavailable_error.md](_Historical/REVIEW_P2-T4_broker_unavailable_error.md) | Review report for P2-T4 |
| [REVIEW_P2-T3_spawn_lock.md](_Historical/REVIEW_P2-T3_spawn_lock.md) | Review report for P2-T3 |
| [REVIEW_P2-T1_broker_flag.md](_Historical/REVIEW_P2-T1_broker_flag.md) | Review report for P2-T1 |
| [REVIEW_P2-T2_stale_socket_recovery.md](_Historical/REVIEW_P2-T2_stale_socket_recovery.md) | Review report for P2-T2 |
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,125 @@
# PRD: P2-T4 — Surface broker unavailability as JSON-RPC error instead of silent timeout

## Overview

When `BrokerProxy` cannot connect to the broker (timeout, spawn failure, daemon unavailable),
the client currently receives no response and eventually times out — showing "0 tools" or a
generic connection error with no actionable message. This task fixes the proxy to return a
well-formed JSON-RPC error response so MCP clients can surface a meaningful error.

## Problem Statement

`BrokerProxy.run()` calls `_spawn_broker_if_needed()` and then `_connect_with_timeout()`.
Both may raise `TimeoutError` or `OSError`. These exceptions currently propagate uncaught,
causing the proxy process to exit. The client's stdout pipe reaches EOF, but no JSON-RPC
response is ever written — the client hangs indefinitely or shows a confusing "0 tools" state.

## Proposed Solution

Wrap the connect phase in `run()` with a try/except. On any connection failure:

1. Log the error.
2. Write a JSON-RPC 2.0 error response to stdout (before exiting).
3. Return cleanly (no re-raise).

### Error response format

```json
{
"jsonrpc": "2.0",
"id": null,
"error": {
"code": -32001,
"message": "Broker unavailable: <reason>"
}
}
```

`id` is `null` because we cannot reliably read the pending request from stdin during the error
path (the request may not have arrived yet, and reading stdin would block or require an
additional async task). JSON-RPC 2.0 §5 permits `null` for the response id when the request
id cannot be determined.

### Scope boundary

This task covers **connection-phase** failures only (before the bridge starts running). It does
NOT cover mid-session broker crashes (daemon dies while `_run_bridge` is active); that is a
separate concern.

## Deliverables

| File | Change |
|------|--------|
| `src/mcpbridge_wrapper/broker/proxy.py` | Add `_send_broker_error()` helper; wrap connect phase in `run()` with try/except |
| `tests/unit/test_broker_proxy.py` | Add `TestBrokerProxyUnavailableError` with ≥4 tests |

## Implementation Plan

### 1. `proxy.py` — add `_send_broker_error()`

New private async method:

```python
async def _send_broker_error(self, reason: str) -> None:
"""Write a JSON-RPC -32001 error to stdout and flush."""
import json
payload = json.dumps({
"jsonrpc": "2.0",
"id": None,
"error": {"code": -32001, "message": f"Broker unavailable: {reason}"},
}) + "\n"
writer = self._stdout
if writer is None:
writer = await self._make_stdout_writer()
writer.write(payload.encode())
try:
await writer.drain()
except Exception:
pass
```

### 2. `proxy.py` — modify `run()`

Wrap the connect phase:

```python
async def run(self) -> None:
try:
if self._auto_spawn:
await self._spawn_broker_if_needed()
sock_reader, sock_writer = await self._connect_with_timeout()
except Exception as exc:
reason = str(exc)
logger.error("Broker unavailable: %s", reason)
await self._send_broker_error(reason)
return
# ... rest unchanged ...
```

### 3. `test_broker_proxy.py` — add `TestBrokerProxyUnavailableError`

Tests:
- `test_connect_timeout_sends_jsonrpc_error` — TimeoutError from `_connect_with_timeout` → error written to stdout writer
- `test_error_code_is_minus_32001` — error code in payload is -32001
- `test_error_message_includes_reason` — `"Broker unavailable:"` prefix present in message
- `test_run_does_not_raise_on_connect_failure` — `run()` returns without re-raising on TimeoutError
- `test_spawn_failure_sends_jsonrpc_error` — TimeoutError from `_spawn_broker_if_needed` → error written

## Acceptance Criteria

- [ ] Connection timeout produces a JSON-RPC `-32001` error response written to stdout
- [ ] Error message includes a human-readable reason (timeout, refused, stale socket)
- [ ] `run()` returns without re-raising — client does not hang indefinitely
- [ ] All existing broker tests pass
- [ ] `pytest --cov` coverage ≥ 90%
- [ ] `ruff check src/` passes
- [ ] `ruff format --check src/ tests/` passes

## Dependencies

- None (P2-T2 already handles stale socket recovery in spawn; this task is a pure error-surface improvement)

## Risk

Low. The change is additive — existing happy path is unchanged. The error path only activates
when connection already fails.
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Validation Report: P2-T4 — Surface broker unavailability as JSON-RPC error

**Date:** 2026-03-01
**Status:** PASS

## Changes Delivered

| File | Change |
|------|--------|
| `src/mcpbridge_wrapper/broker/proxy.py` | Added `import json`; added `_send_broker_error()` helper; wrapped connect phase in `run()` with try/except |
| `tests/unit/test_broker_proxy.py` | Updated `test_returns_with_error_when_no_socket`; added `TestBrokerProxyUnavailableError` (5 tests) |
| `tests/unit/test_broker_stubs.py` | Updated `test_run_raises_timeout_when_no_socket` → `test_run_writes_error_when_no_socket` to match new behaviour |

## Acceptance Criteria

- [x] Connection timeout produces a JSON-RPC `-32001` error response written to stdout
- [x] Error message includes a human-readable reason (`"Broker unavailable: <reason>"`)
- [x] `run()` returns without re-raising — client does not hang indefinitely
- [x] All existing broker tests pass (updated 2 tests whose behaviour legitimately changed)
- [x] `pytest --cov` coverage ≥ 90% (achieved 91.59%)
- [x] `ruff check src/` passes
- [x] `ruff format --check src/ tests/` passes

## Quality Gates

| Gate | Result |
|------|--------|
| `pytest -q` | 732 passed, 5 skipped |
| `pytest --cov` coverage | 91.59% (≥ 90% required) |
| `ruff check src/` | PASS |
| `ruff format --check src/ tests/` | PASS |

## Implementation Notes

- `_send_broker_error(reason)` constructs a JSON-RPC 2.0 error with code `-32001` and
message `"Broker unavailable: {reason}"`, using `id: null` (permitted by JSON-RPC 2.0 §5
when the request id cannot be determined).
- The connect phase in `run()` is wrapped with `except Exception` to catch `TimeoutError`,
`ConnectionRefusedError`, `FileNotFoundError`, and any `OSError` subclass from spawn.
- `_send_broker_error` guards against `_make_stdout_writer()` failing in non-pipe contexts
(e.g., test environments) with an inner try/except that logs and returns.
- Two existing tests (`test_raises_timeout_when_no_socket` in both proxy and stubs test files)
were updated: they previously expected `run()` to raise `TimeoutError`; the new behaviour is
that `run()` returns cleanly after writing the JSON-RPC error. Both tests now verify the
error payload instead.
82 changes: 82 additions & 0 deletions SPECS/ARCHIVE/_Historical/REVIEW_P2-T4_broker_unavailable_error.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
## REVIEW REPORT — P2-T4: Broker unavailability JSON-RPC error

**Scope:** origin/main..HEAD
**Files:** 3 changed (proxy.py, test_broker_proxy.py, test_broker_stubs.py)
**Date:** 2026-03-01

### Summary Verdict
- [ ] Approve
- [x] Approve with comments
- [ ] Request changes
- [ ] Block

---

### Critical Issues

None.

---

### Secondary Issues

**[Low] `except Exception` is broad in `run()`**

The connect-phase catch uses `except Exception`, which captures all non-`BaseException`
exceptions including unexpected ones (e.g., `MemoryError` is not an issue, but e.g., a
programming error in `_spawn_broker_if_needed` that raises `AttributeError` would be silently
swallowed). The guard prevents the client from hanging, so the trade-off is acceptable, but a
future refinement could narrow the catch to `(TimeoutError, OSError)` — the only exception
types that legitimately arise from the connect path.

**Suggested fix (optional):** Narrow to `except (TimeoutError, OSError)` with a fallback
`logger.exception` for anything else. Not a blocker; the current behaviour is safe.

**[Low] `id: null` in error response may confuse some clients**

JSON-RPC 2.0 §5 permits `null` for error responses when the request id is unknown. However,
some strict clients may log a warning or discard the response entirely if they cannot match it
to an outstanding request. Consider a future enhancement to read the pending request's `id`
from stdin with a short timeout (e.g., 0.5s) and use it in the error response. Not a blocker
for current acceptance criteria.

---

### Architectural Notes

- The `_send_broker_error` method is a clean, focused helper: <30 lines, single responsibility,
reused for both spawn and connect failure paths.
- The inner guard in `_send_broker_error` (catching `_make_stdout_writer()` failure) is the
right defensive posture — in non-pipe test environments the writer setup fails, and the guard
ensures the error path doesn't itself raise.
- Existing tests that previously expected `run()` to raise `TimeoutError` have been correctly
updated to verify the JSON-RPC error payload instead. The intent of those tests is preserved.

---

### Tests

- Added `TestBrokerProxyUnavailableError` (5 tests):
- `test_connect_timeout_sends_jsonrpc_error` — full payload verification
- `test_error_code_is_minus_32001` — code field check
- `test_error_message_includes_reason` — prefix + exception text
- `test_run_does_not_raise_on_connect_failure` — clean return
- `test_spawn_failure_sends_jsonrpc_error` — spawn-phase error path
- Updated 2 existing tests (`test_broker_proxy.py` and `test_broker_stubs.py`) that previously
expected `run()` to raise; now they verify the JSON-RPC error payload.
- Coverage: 91.59% (≥ 90% required). ✅
- All 732 tests pass, 5 skipped. ✅

---

### Next Steps

- **Optional FU:** Narrow `except Exception` to `(TimeoutError, OSError)` in `run()` for
tighter exception scope. Low priority — current behaviour is safe and correct.
- **Optional FU:** Attempt to read the pending request's `id` from stdin with a short timeout,
to use a real request id in the error response instead of `null`. Improves client
compatibility. Low priority — JSON-RPC 2.0 permits `null`.
- No documentation changes required (this is an internal error-handling improvement with no
user-visible configuration surface).

**VERDICT: PASS — no blockers; two low-severity observations documented above.**
5 changes: 2 additions & 3 deletions SPECS/INPROGRESS/next.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,15 @@
# No Active Task

**Status:** Idle — P2-T3 archived. Select the next task from `SPECS/Workplan.md`.
**Status:** Idle — P2-T4 archived. Select the next task from `SPECS/Workplan.md`.

## Recently Archived

- **P2-T4** — Surface broker unavailability as JSON-RPC error instead of silent timeout (2026-03-01, PASS)
- **P2-T3** — Fix double-spawn race condition when MCP client toggles rapidly (2026-03-01, PASS)
- **P2-T1** — Replace --broker-spawn/--broker-connect with single --broker flag (2026-03-01, PASS)
- **P2-T2** — Self-healing stale socket and PID file recovery (2026-03-01, PASS)
- **BUG-T8** — Fix broker proxy bridge exits after first write due to BaseProtocol missing _drain_helper (2026-03-01, PASS)
- **P1-T3** — Improve MCP settings examples in README to present broker setup first (2026-03-01, PASS)

## Suggested Next Tasks

- **P2-T4** (P1) — Surface broker unavailability as JSON-RPC error instead of silent timeout
- **P2-T5** (P2) — Warn or restart daemon when --web-ui requested but running broker lacks it
11 changes: 6 additions & 5 deletions SPECS/Workplan.md
Original file line number Diff line number Diff line change
Expand Up @@ -99,17 +99,18 @@ Add new tasks using the canonical template in [TASK_TEMPLATE.md](TASK_TEMPLATE.m
- [x] Lock is released on proxy exit (including crash)
- [x] All existing broker tests pass

#### ⬜️ P2-T4: Surface broker unavailability as JSON-RPC error instead of silent timeout
#### ✅ P2-T4: Surface broker unavailability as JSON-RPC error instead of silent timeout
- **Status:** ✅ Completed (2026-03-01)
- **Description:** When the proxy cannot connect to the broker (stale socket, spawn failed, daemon crashed mid-session), the client receives no response and eventually times out — showing "0 tools" or a generic connection error with no actionable message. Instead, the proxy should return a JSON-RPC error response (e.g. code `-32001`, message `"Broker unavailable: <reason>"`) so MCP clients can surface a meaningful error to the user rather than silently hanging.
- **Priority:** P1
- **Dependencies:** none
- **Parallelizable:** yes
- **Outputs/Artifacts:**
- `src/mcpbridge_wrapper/broker/proxy.py` — error response on connect failure
- `src/mcpbridge_wrapper/broker/proxy.py` — `_send_broker_error()` helper; connect phase wrapped in try/except
- **Acceptance Criteria:**
- [ ] Connection timeout produces a JSON-RPC `-32001` error response to the client
- [ ] Error message includes a human-readable reason (timeout, refused, stale socket)
- [ ] Client does not hang indefinitely — error is returned within `connect_timeout` seconds
- [x] Connection timeout produces a JSON-RPC `-32001` error response to the client
- [x] Error message includes a human-readable reason (timeout, refused, stale socket)
- [x] Client does not hang indefinitely — error is returned within `connect_timeout` seconds

#### ⬜️ P2-T5: Warn or restart daemon when --web-ui requested but running broker lacks it
- **Description:** When a user configures `--broker-spawn --web-ui` and a broker daemon is already running without the web UI, the proxy connects silently and the `--web-ui` flag has no effect. The user sees 0 web UI and no explanation. Fix by detecting the mismatch: if the proxy is asked for web UI but the running daemon does not expose a web UI port (detectable via a broker status endpoint or absence of HTTP response on the expected port), emit a clear warning to stderr: `"Warning: broker is running without --web-ui. Restart the broker to enable the dashboard."`.
Expand Down
48 changes: 45 additions & 3 deletions src/mcpbridge_wrapper/broker/proxy.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
import asyncio
import contextlib
import fcntl
import json
import logging
import os
import socket
Expand Down Expand Up @@ -79,11 +80,21 @@ async def run(self) -> None:
2. Connect to broker Unix socket (with timeout).
3. Run bidirectional forward until stdin EOF or socket EOF.
4. Close socket gracefully — broker process is **not** signalled.

If the broker is unavailable (timeout, refused, spawn failure), a
JSON-RPC ``-32001`` error response is written to stdout so the MCP
client receives a meaningful error instead of silently hanging.
"""
if self._auto_spawn:
await self._spawn_broker_if_needed()
try:
if self._auto_spawn:
await self._spawn_broker_if_needed()

sock_reader, sock_writer = await self._connect_with_timeout()
sock_reader, sock_writer = await self._connect_with_timeout()
except Exception as exc:
reason = str(exc)
logger.error("Broker unavailable: %s", reason)
await self._send_broker_error(reason)
return

# Set up asyncio stdin/stdout if not injected
stdin_reader = self._stdin
Expand All @@ -109,6 +120,37 @@ async def run(self) -> None:
# Internal helpers
# ------------------------------------------------------------------

async def _send_broker_error(self, reason: str) -> None:
"""Write a JSON-RPC -32001 error response to stdout and flush.

Called when the broker is unavailable (connection timeout, spawn
failure, refused). Uses ``id: null`` because the incoming request
id cannot be reliably read during the error path.
"""
payload = (
json.dumps(
{
"jsonrpc": "2.0",
"id": None,
"error": {
"code": -32001,
"message": f"Broker unavailable: {reason}",
},
}
)
+ "\n"
)
writer = self._stdout
if writer is None:
try:
writer = await self._make_stdout_writer()
except Exception as exc:
logger.error("Could not open stdout writer for error response: %s", exc)
return
writer.write(payload.encode())
with contextlib.suppress(Exception):
await writer.drain()

async def _spawn_broker_if_needed(self) -> None:
"""Spawn the broker daemon if not already running.

Expand Down
Loading