diff --git a/.gitignore b/.gitignore index ca2b2686..337f8425 100644 --- a/.gitignore +++ b/.gitignore @@ -59,6 +59,12 @@ mcp_config.json tmp/ temp/ +# Ephemeral agent execution reports — must never land in the repo. +# Agents write these as per-run logs; content worth keeping belongs in a +# spec (specs/NNN/) or a design doc (docs/superpowers/specs/). +tmp-agent-report-*.md +/tmp-agent-report-*/ + # Build artifacts build/ dist/ diff --git a/docs/screenshots/tray-macos/01-dashboard.png b/docs/screenshots/tray-macos/01-dashboard.png new file mode 100644 index 00000000..71c8da90 Binary files /dev/null and b/docs/screenshots/tray-macos/01-dashboard.png differ diff --git a/docs/screenshots/tray-macos/02-server-detail-diff.png b/docs/screenshots/tray-macos/02-server-detail-diff.png new file mode 100644 index 00000000..56952049 Binary files /dev/null and b/docs/screenshots/tray-macos/02-server-detail-diff.png differ diff --git a/docs/screenshots/tray-macos/03-server-detail-tools.png b/docs/screenshots/tray-macos/03-server-detail-tools.png new file mode 100644 index 00000000..5818bb3a Binary files /dev/null and b/docs/screenshots/tray-macos/03-server-detail-tools.png differ diff --git a/docs/screenshots/tray-macos/04-activity-log-with-detail.png b/docs/screenshots/tray-macos/04-activity-log-with-detail.png new file mode 100644 index 00000000..8f01dced Binary files /dev/null and b/docs/screenshots/tray-macos/04-activity-log-with-detail.png differ diff --git a/docs/screenshots/web-ui/01-dashboard.jpg b/docs/screenshots/web-ui/01-dashboard.jpg new file mode 100644 index 00000000..b066fe76 Binary files /dev/null and b/docs/screenshots/web-ui/01-dashboard.jpg differ diff --git a/docs/superpowers/specs/2026-04-24-diagnostics-error-taxonomy-design.md b/docs/superpowers/specs/2026-04-24-diagnostics-error-taxonomy-design.md new file mode 100644 index 00000000..5c6fddb8 --- /dev/null +++ b/docs/superpowers/specs/2026-04-24-diagnostics-error-taxonomy-design.md @@ -0,0 +1,156 @@ +# 
Diagnostics & error taxonomy deep-dive + +**Date**: 2026-04-24 +**Status**: Approved for speckit flow (user confirmed scope + autonomous execution) +**Repo**: `mcpproxy-go` (Go core + Vue frontend + macOS tray) + +## 1. Problem + +Of 306 real-user installs with ≥ 1 configured upstream server, only 238 (78 %) have any connected server, so **~22 % of installs never connect a single server**, and among those with servers 30 % show `server_count > connected_server_count` (partial failure). Users hit errors that are cryptic (`oauth_refresh_failed`, `docker_status fail`, `deprecated_configs fail`), and the CLI `doctor` output is pass/fail per category with no user-facing fix guidance. The existing `doctor_checks` and `error_category_counts` fields in v2 telemetry already tell us WHICH checks fail, but there's no actionable user path and no stable error-code catalog. + +The hypothesis: **every "didn't connect" failure is diagnosable and most are fixable in one click or one command.** We need (a) a stable error-code catalog, (b) user-facing explanations and fix steps per code, (c) surfacing in tray + web UI + CLI, (d) telemetry on which codes occur and which fixes succeed. + +## 2. Goals + +1. **Stable error-code catalog** — every recoverable failure path gets a code like `MCPX_OAUTH_REFRESH_403`, `MCPX_DOCKER_DAEMON_DOWN`, `MCPX_STDIO_SPAWN_ENOENT`. Codes are stable across releases; descriptions can evolve. +2. **Per-code "how to fix"** — each code has: (a) one-sentence human explanation, (b) concrete next steps (click / command / link), (c) deep-link to `docs/errors/.md` that expands the explanation. +3. **Surfacing**: + - **Tray** — badge showing "N servers failing"; menu group "Fix issues" with per-server fix button that routes to the right action. + - **Web UI** — per-server error panel with the code + explanation + fix button(s). 
+ - **CLI** — `mcpproxy doctor --server ` shows code + fix steps; `mcpproxy doctor fix --code ` runs the automated fix when available (with `--dry-run`). +4. **Telemetry** — v3 payload adds `diagnostics.error_code_counts_24h` (bucketed), `diagnostics.fix_attempted_24h` (count of fix-button clicks), `diagnostics.fix_succeeded_24h`. Code names are stable strings — safe to aggregate. +5. **Measurable improvement** — after rollout, the % of installs with `server_count == connected_server_count` rises over 30 days. + +## 3. Non-Goals + +- Fully automated auto-remediation. User approves every fix. +- Replacing existing logging. The taxonomy layers *on top of* existing zap logs. +- Fixing every failure mode. Scope is limited to the recurring categories we already see in telemetry + the ones we find during the error-inventory phase. +- Changing MCP protocol behavior. + +## 4. Error taxonomy (initial catalog — to be expanded during implementation) + +Codes follow `MCPX__` with stable identifiers. Each has: `code`, `severity` (info/warn/error), `user_message`, `fix_steps` (ordered list with optional `action` type: `link` / `command` / `button`), `docs_url`. 
+ +### 4.1 Initial domains (drawn from existing `error_category_counts` + common GitHub issues) + +| Domain | Example codes | Typical fix | +|---|---|---| +| `OAUTH` | `MCPX_OAUTH_REFRESH_EXPIRED`, `MCPX_OAUTH_REFRESH_403`, `MCPX_OAUTH_DISCOVERY_FAILED`, `MCPX_OAUTH_CALLBACK_TIMEOUT` | Re-login via tray / web UI button that triggers the OAuth flow | +| `STDIO` | `MCPX_STDIO_SPAWN_ENOENT`, `MCPX_STDIO_EXIT_NONZERO`, `MCPX_STDIO_HANDSHAKE_TIMEOUT` | Install missing tool (npx/uvx guidance), check working_dir, show last N log lines | +| `HTTP` | `MCPX_HTTP_DNS_FAILED`, `MCPX_HTTP_TLS_FAILED`, `MCPX_HTTP_401`, `MCPX_HTTP_404`, `MCPX_HTTP_5XX` | Check URL, provide auth header, TLS debug; one-click edit server config | +| `DOCKER` | `MCPX_DOCKER_DAEMON_DOWN`, `MCPX_DOCKER_IMAGE_PULL_FAILED`, `MCPX_DOCKER_NO_PERMISSION`, `MCPX_DOCKER_SNAP_APPARMOR` | Show install docs for Docker Desktop / colima; snap-docker specific guidance | +| `CONFIG` | `MCPX_CONFIG_DEPRECATED_FIELD`, `MCPX_CONFIG_PARSE_ERROR`, `MCPX_CONFIG_MISSING_SECRET` | Auto-migrate button for deprecated fields; show diff preview | +| `QUARANTINE` | `MCPX_QUARANTINE_PENDING_APPROVAL`, `MCPX_QUARANTINE_TOOL_CHANGED` | Link to quarantine panel with approve button | +| `NETWORK` | `MCPX_NETWORK_PROXY_MISCONFIG`, `MCPX_NETWORK_OFFLINE` | Show detected system proxy; ping test button | + +Exact code list for each domain is produced during the **error-inventory task** (first implementation phase): grep every `zap.Error` call path and every terminal error state in `internal/upstream/*`, `internal/oauth/*`, `internal/server/*`, then map each to a code + message + fix. Spec does not pre-enumerate every code; it mandates the *structure*. + +## 5. 
Implementation structure + +### 5.1 New package `internal/diagnostics` + +``` +internal/diagnostics/ +├── catalog.go // Code, Severity, Message, FixStep, DocsURL types; registry +├── codes.go // All MCPX_* constants — generated / hand-written +├── classifier.go // Takes a raw error (from upstream/oauth/docker) → returns a Code +├── registry.go // In-memory registry; loaded at startup; tests enforce completeness +├── fixers.go // Optional automated-fix handlers per code (dry-run first) +└── codes_test.go // Every code must have a message + at least one fix_step +``` + +### 5.2 Integration points + +- `internal/upstream/manager.go` wraps every connection failure into a `DiagnosticError{Code, Cause, ServerID}`. +- `internal/oauth/*` similarly classifies OAuth-specific failures. +- `internal/runtime/stateview/stateview.go` includes latest `DiagnosticError` per server in the snapshot. +- `/api/v1/servers/{name}/diagnostics` extends today's response with a structured `error_code`, `user_message`, `fix_steps` array (existing consumers keep working; new fields additive). +- `/api/v1/diagnostics/fix` — new endpoint that runs a named fix by code for a server (dry-run by default). Idempotent and rate-limited. +- `internal/runtime/activity_service.go` records each fix attempt + outcome so the activity log has audit trail. + +### 5.3 Frontend (`frontend/src/`) + +- New `components/diagnostics/ErrorPanel.vue` — reusable component rendering `{code, message, fix_steps}`. Steps render as buttons (trigger API) or links (open docs/URL) or inline code (copyable). +- Used from `ServerDetail.vue` (per-server) and from a new `DiagnosticsPage.vue` (aggregate). +- `stores/servers.ts` subscribes to the existing SSE stream; when a server's `error_code` changes, ErrorPanel updates. + +### 5.4 macOS tray (Swift) + +- Status badge: red dot if any server has `severity=error`; orange if only `warn`. +- Menu section "⚠ Fix issues (N)" — collapses to per-server items. 
Clicking opens web UI to the right server's diagnostics panel (single-source-of-truth for fix buttons, avoids duplicating fix logic in Swift). +- On macOS the tray already reads `/api/v1/servers`; extend to read the new `error_code` field. + +### 5.5 CLI (`cmd/mcpproxy`) + +- `mcpproxy doctor` — unchanged default output; add `--server ` filter and `--codes` to print codes instead of categories. +- `mcpproxy doctor fix --server ` — runs the fixer. `--dry-run` by default if the fix is potentially destructive. +- `mcpproxy doctor list-codes` — prints the full catalog (useful for documentation generation and for AI agents). + +### 5.6 Docs + +Each code gets a page at `docs/errors/.md` with: explanation, cause, fix steps, related links. Index at `docs/errors/README.md`. Auto-generated stub from the catalog; hand-written body. Linked from the web UI + tray. + +## 6. Telemetry extensions (ties into spec 1) + +v3 payload gets a new top-level `diagnostics` object (ships with spec 1's schema bump to avoid a second migration): + +```json +{ + "diagnostics": { + "error_code_counts_24h": { + "MCPX_OAUTH_REFRESH_EXPIRED": 3, + "MCPX_DOCKER_DAEMON_DOWN": 1 + }, + "fix_attempted_24h": 2, + "fix_succeeded_24h": 1, + "unique_codes_ever": 7 + } +} +``` + +- `error_code_counts_24h` — capped to top 20 codes per heartbeat to bound payload. +- `fix_attempted_24h` / `fix_succeeded_24h` — how often users click fix buttons. +- `unique_codes_ever` — sanity check for catalog coverage. + +## 7. Ground rules + +- **Backwards compatible** — existing `doctor_checks` v2 field stays; new fields are additive. v2 dashboards keep working. +- **No auto-remediation without user click.** Fixers never run at startup or on a schedule. Every fix is a response to a button press or CLI invocation. +- **Error codes are stable.** Once shipped, a code's `name` never changes; deprecation = mark it hidden + point to new code. 
+- **Docs auto-linked.** Every code surfaces a docs URL, and every docs page is real (CI check: no 404). + +## 8. Verification plan (required per PR) + +1. **Unit tests** — `internal/diagnostics/*_test.go`: every code has a registered message + ≥1 fix step; classifier correctly maps 20+ golden error samples to the right code; fixer dry-runs don't mutate state. +2. **E2E** — extend `./scripts/test-api-e2e.sh`: start mcpproxy, configure a deliberately-broken stdio server (`command: /nonexistent`), hit `/api/v1/servers/.../diagnostics`, assert `error_code == "MCPX_STDIO_SPAWN_ENOENT"`, call `/api/v1/diagnostics/fix` with dry-run and assert expected preview. +3. **curl** — each new endpoint covered by a curl example in `docs/api/rest-api.md` and a smoke test in CI. +4. **Chrome browser** — open the web UI via `claude-in-chrome`, navigate to a broken server's detail page, confirm ErrorPanel renders with code + fix button; click fix (dry-run path), confirm toast. +5. **UI test MCP** — screenshot macOS tray with a broken server, confirm red badge + "Fix issues (N)" menu group; click menu item, assert it opens the web UI to the right URL. +6. **Docs link check** — CI job runs a link checker over `docs/errors/*.md` ensuring each docs page exists for every code registered in the catalog. + +## 9. Sequencing + +1. **Error inventory** — grep existing codebase, enumerate error sites, produce initial code list (PR: catalog-only, no surfacing). +2. **Diagnostics package + classifier + registry tests** — PR adds `internal/diagnostics` + wires one domain (`STDIO` — highest-volume failures). +3. **Expand to remaining domains** — one PR per domain (OAUTH, HTTP, DOCKER, CONFIG, QUARANTINE, NETWORK), parallelizable. +4. **REST API + telemetry** — add `/diagnostics` structured fields + v3 telemetry (merges with spec 1's schema bump). +5. **Web UI ErrorPanel** — add Vue component + wire from ServerDetail. +6. **macOS tray badges** — add menu group + badge. +7. 
**CLI `fix` subcommand** — with `--dry-run` default. +8. **Docs pages** — auto-generate stubs, hand-fill bodies. + +## 10. Success criteria + +1. 100 % of terminal connection errors map to a non-empty `error_code` (no more "unknown failure"). +2. Every code has: message, ≥1 fix step, docs URL (link-checked in CI). +3. Over 30 days post-launch, `connected_server_count / server_count` ratio among real-user installs rises. +4. `fix_succeeded_24h / fix_attempted_24h` ≥ 0.5 (half of user-initiated fixes work). +5. `/api/v1/servers/{name}/diagnostics` 200 latency p95 < 50 ms. + +## 11. Open questions + +- **Automated fix for `MCPX_DOCKER_SNAP_APPARMOR`** — we know (from prior memory) that snap-docker + AppArmor is fundamentally incompatible with our scanner. Should the fix be "disable scanner for this server" or "suggest switching to non-snap Docker"? Decide during domain-3 PR. +- **OAuth re-auth fix UX** — tray button → system browser flow. Needs testing with ≥ 2 providers to ensure the existing flow-coordinator handles concurrent re-auth gracefully. + +Both are acceptable to defer to implementation time — they don't block the catalog structure. diff --git a/docs/superpowers/specs/2026-04-24-retention-telemetry-hygiene-design.md b/docs/superpowers/specs/2026-04-24-retention-telemetry-hygiene-design.md new file mode 100644 index 00000000..1a1ccf78 --- /dev/null +++ b/docs/superpowers/specs/2026-04-24-retention-telemetry-hygiene-design.md @@ -0,0 +1,181 @@ +# Retention telemetry hygiene + activation instrumentation + auto-start defaults + +**Date**: 2026-04-24 +**Status**: Approved for speckit flow (user confirmed scope + autonomous execution) +**Repos touched**: `mcpproxy-go`, `mcpproxy-telemetry`, `mcpproxy-dash` + +## 1. 
Problem + +After excluding CI installs via the dashboard's existing version-rule + GitHub-IP filter, real-user day-1 retention is **38 %** across 337 installs, with sharp OS segmentation: **macOS 54 %** (76/142), **Windows 42 %** (5/12), **Linux 26 %** (48/183). Three gaps block further work: + +1. **CI attribution is heuristic.** The worker classifies CI post-hoc using GitHub Actions CIDRs + "version doesn't start with `v`". Both miss real CI (non-GitHub runners, properly-versioned container images) and occasionally flag real users. There's no ground truth in the payload. +2. **Activation is invisible.** We see `server_count`, `connected_server_count`, and `surface_requests.tray`, but we cannot see: (a) whether an IDE (Claude Code / Cursor / VS Code / Windsurf / Codex CLI / Gemini CLI) ever actually called `/mcp`, (b) whether the user ran `retrieve_tools`, (c) whether auto-start-at-login is configured, (d) how the core was launched (tray / login-item / CLI / installer). +3. **Tray adoption is soft-gated.** macOS retention at 54 % is healthy but we can lift it: ~39 % of macOS v2 installs never recorded a tray request (core running without tray = user quit tray or launched from CLI). Auto-start-at-login is opt-in today. + +## 2. Goals + +1. **Ground-truth CI classification** — mcpproxy itself decides it's running in CI / cloud IDE / container / headless / interactive, and publishes that verdict in the heartbeat. Dashboard filters on the real field; version-rule heuristic becomes fallback for pre-v3 rows only. +2. **Activation funnel visible** — from the dashboard, answer "of today's real-human first-runs, what % connected a server, what % connected at least one IDE, what % called `retrieve_tools`?" for the first time. +3. **Auto-start default ON** on macOS and Windows tray, with telemetry confirming it. Installer opens the tray automatically on its final step. +4. **Safety** — D1 backed up before any migration, zero PII added, existing stored IP/city re-audited. 
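
Goal 1 can be illustrated with a small pure function that follows the ordering of the `env_kind` decision tree in §4.2 of this spec. This is a sketch under assumptions, not the planned `env_kind.go`: the `Probe` struct and its field names are hypothetical, and treating an empty environment variable as "not set" is an assumption the spec does not state.

```go
package main

import "fmt"

// Probe is a hypothetical snapshot of the facts the decision tree needs,
// gathered once at startup so that classification stays a pure function.
type Probe struct {
	GOOS          string            // runtime.GOOS in a real build
	Env           map[string]string // only presence (and CI's value) is inspected
	ContainerFile bool              // /.dockerenv or /run/.containerenv exists
	StdinIsTTY    bool
}

// Variable lists copied from §4.2, steps 1 and 2.
var ciVars = []string{
	"GITHUB_ACTIONS", "GITLAB_CI", "JENKINS_URL", "CIRCLECI", "BUILDKITE",
	"TF_BUILD", "TRAVIS", "DRONE", "BITBUCKET_BUILD_NUMBER",
	"TEAMCITY_VERSION", "APPVEYOR", "GITEA_ACTIONS",
}

var cloudIDEVars = []string{
	"CODESPACES", "GITPOD_WORKSPACE_ID", "REPL_ID",
	"STACKBLITZ_ENV", "DAYTONA_WS_ID", "CODER_AGENT_TOKEN",
}

// anySet reports whether any named variable is non-empty (assumption:
// an empty value counts as unset).
func anySet(env map[string]string, names []string) bool {
	for _, n := range names {
		if env[n] != "" {
			return true
		}
	}
	return false
}

// EnvKind walks the decision tree in order; the first match wins.
func EnvKind(p Probe) string {
	if p.Env["CI"] == "true" || anySet(p.Env, ciVars) {
		return "ci"
	}
	if anySet(p.Env, cloudIDEVars) {
		return "cloud_ide"
	}
	if p.ContainerFile {
		return "container"
	}
	switch p.Env["container"] {
	case "podman", "docker", "oci":
		return "container"
	}
	switch p.GOOS {
	case "darwin", "windows":
		return "interactive"
	case "linux":
		if p.Env["DISPLAY"] != "" || p.Env["WAYLAND_DISPLAY"] != "" || p.StdinIsTTY {
			return "interactive"
		}
		return "headless"
	}
	return "unknown"
}

func main() {
	samples := []Probe{
		{GOOS: "linux", Env: map[string]string{"GITHUB_ACTIONS": "true"}},
		{GOOS: "linux", Env: map[string]string{}, ContainerFile: true},
		{GOOS: "darwin", Env: map[string]string{}},
		{GOOS: "linux", Env: map[string]string{}},
	}
	for _, p := range samples {
		fmt.Println(p.GOOS, "->", EnvKind(p))
	}
}
```

Keeping the probe separate from the classifier means each branch can be unit-tested without touching the real environment, and only the resulting enum value (never a variable's content) needs to reach the payload.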
+ +## 3. Non-Goals + +- Changing the heartbeat cadence (still 24 h + startup-kick). +- Changing `anonymous_id` generation, storage, or lifecycle. +- Adding any identifier that correlates with a human (email, machine name, username, file paths, config contents). +- UI redesign of existing "Connect MCPProxy to AI Agents" dialog — only telemetry hooks. +- Changes to server edition telemetry (we have 0 `edition=server` installs today; defer). +- Removing the dashboard's version-rule CI filter; it stays as fallback for rows where `env_kind` is absent. + +## 4. Payload schema v3 + +Bump `schema_version` from 2 → 3 and extend `payload_json` with five new fields. All other v2 fields unchanged. + +### 4.1 New fields + +| Field | Type | Values | How computed (client-side) | +|---|---|---|---| +| `env_kind` | enum string | `interactive` \| `ci` \| `cloud_ide` \| `container` \| `headless` \| `unknown` | See §4.2 decision tree | +| `launch_source` | enum string | `tray` \| `login_item` \| `cli` \| `installer` \| `unknown` | See §4.3 | +| `autostart_enabled` | bool \| null | true / false / null if unknown on platform | macOS: read launchd plist for login item; Windows: read registry `Run` key; Linux: always null | +| `activation` | object | see §4.4 | Ever-true flags + last-24h counters | +| `env_markers` | object | see §4.5 | **Booleans only** — never the env-var value | + +### 4.2 `env_kind` decision tree (client, in order) + +1. Any of `CI=true`, `GITHUB_ACTIONS`, `GITLAB_CI`, `JENKINS_URL`, `CIRCLECI`, `BUILDKITE`, `TF_BUILD`, `TRAVIS`, `DRONE`, `BITBUCKET_BUILD_NUMBER`, `TEAMCITY_VERSION`, `APPVEYOR`, `GITEA_ACTIONS` set → `ci`. +2. Any of `CODESPACES`, `GITPOD_WORKSPACE_ID`, `REPL_ID`, `STACKBLITZ_ENV`, `DAYTONA_WS_ID`, `CODER_AGENT_TOKEN` set → `cloud_ide`. +3. `/.dockerenv` or `/run/.containerenv` exists, OR `$container` = `podman|docker|oci`, AND none of the above → `container`. +4. OS = `darwin` or `windows` → `interactive` (tray/installer platforms). +5. 
OS = `linux` AND (`$DISPLAY` set OR `$WAYLAND_DISPLAY` set OR stdin is a TTY) AND none of 1-3 → `interactive`. +6. OS = `linux`, none of the above → `headless`. +7. Otherwise → `unknown`. + +Detection runs **once at startup** and is cached for the process lifetime. + +### 4.3 `launch_source` + +- `installer` → set by installer passing `MCPPROXY_LAUNCHED_BY=installer` on first post-install launch (cleared after one heartbeat). +- `login_item` → set when the OS launched the binary as a login item (Mac: LSBackgroundOnly + parent is `launchd`; Windows: process tree rooted at `explorer.exe` Run key launcher). +- `tray` → set when core was started via tray socket handshake (core already receives this; surface it). +- `cli` → default when interactive TTY + no parent-of-launchd. +- `unknown` → anything else. + +### 4.4 `activation` object + +```json +{ + "first_connected_server_ever": true, + "first_mcp_client_ever": true, + "first_retrieve_tools_call_ever": false, + "mcp_clients_seen_ever": ["claude-code", "cursor", "unknown"], + "retrieve_tools_calls_24h": 12, + "estimated_tokens_saved_24h_bucket": "100_1k", + "configured_ide_count": 2 +} +``` + +- `first_*_ever` — monotonic booleans persisted in BBolt under a new `activation` bucket. Once true, stays true. +- `mcp_clients_seen_ever` — set of User-Agent / client-name fingerprints observed on `/mcp` (client identifies itself via `initialize` params.clientInfo.name per MCP spec). Deduped. Capped at 16 entries to bound payload size. Unknown clients logged as `"unknown"`. +- `retrieve_tools_calls_24h` — sliding-window counter (increment on each `retrieve_tools` builtin call, decay on 24h heartbeat). +- `estimated_tokens_saved_24h_bucket` — bucketed to prevent cardinality: `"0"`, `"1_100"`, `"100_1k"`, `"1k_10k"`, `"10k_100k"`, `"100k_plus"`. Estimate = Σ (tools_not_exposed_to_client * avg_tokens_per_tool_schema). Computed at heartbeat time. 
+- `configured_ide_count` — count of IDE config files touched by the "Connect MCPProxy to AI Agents" UI, read from the existing config-write tracker. + +### 4.5 `env_markers` (booleans only — no values) + +```json +{ + "has_ci_env": false, + "has_cloud_ide_env": false, + "is_container": false, + "has_tty": true, + "has_display": true +} +``` + +Used for dashboard deep-drill and for sanity-checking `env_kind`. **Never** store the env var value itself. + +### 4.6 Removed from payload + +`ip_address`, `city`, `country`, `region` **are not transmitted** — they're derived by the worker from Cloudflare request headers. That's already true today; this spec makes it explicit. + +## 5. Worker changes (`mcpproxy-telemetry`) + +1. **D1 backup before migration** — required first step: + ```bash + npx wrangler d1 export mcpproxy-telemetry --remote \ + --output=backups/mcpproxy-telemetry-$(date +%Y%m%d-%H%M%S).sql + ``` + Committed to a private-repo backup branch, not the public one. +2. **Schema migration** — add `env_kind TEXT` + `launch_source TEXT` + `autostart_enabled INTEGER` columns, indexed on `env_kind`. Write in `migrations/` with up/down SQL. New v3 rows populate them from the parsed JSON; v2 rows leave them NULL. +3. **Validation** — reject payloads where `env_kind` is not in the allowed enum, and where `env_markers` contains any non-boolean value (defense in depth against client bugs leaking values). +4. **Backfill job** — one-off `scripts/backfill-envkind.ts` that reads existing 2,615 rows and computes a heuristic `env_kind` with a `_inferred` suffix (e.g. `ci_inferred`, `interactive_inferred`). Uses: version-rule, GH Actions IP rule, country × OS × uptime patterns. Stored in same column so dashboard code doesn't care. Document in spec that `_inferred` values are heuristic. +5. **PII audit** — review `ip_address`, `city`, `region` storage. Current retention is indefinite. Propose: truncate IP to /24 (IPv4) or /48 (IPv6) on ingest, drop exact IP after 30 days. 
Applies to all schema versions. + +## 6. Dashboard changes (`mcpproxy-dash`) + +1. Prefer `env_kind` over version-rule when present (non-NULL, non-`unknown`). Fall back to existing version-rule + GH IP classifier for NULL rows. +2. New pages / sections: + - **Activation funnel** on `/` overview: first-run → server configured → server connected → IDE connected → `retrieve_tools` called. Counts + conversion %. + - **Launch source mix** on `/` overview: stacked bar of `launch_source` for last 30 days. + - **CI transparency** panel on `/ci` (exists): show both classifications (ground-truth + heuristic), a confusion matrix, and residual "unknown" count. +3. `ci=exclude` default becomes `env_kind NOT IN ('ci', 'ci_inferred', 'cloud_ide', 'cloud_ide_inferred')` for v3+ rows. Dashboards must make this transparent (small badge: "real humans only"). + +## 7. mcpproxy-go changes + +1. **`internal/telemetry`** + - New file `env_kind.go` — detection logic per §4.2, cached at startup. + - New file `activation.go` — BBolt-backed monotonic flags + rolling counters + token estimator. + - Extend `payload_v2.go` → `payload_v3.go` (copy + add fields; keep v2 builder for tests); bump `schema_version` constant. + - Extend existing `surface_requests` tracker to also increment the new `retrieve_tools_calls_24h` counter when the builtin tool fires. +2. **`internal/server/mcp.go`** (or wherever MCP `initialize` is handled) — record observed `clientInfo.name` into `activation.mcp_clients_seen_ever` via the telemetry service. +3. **`cmd/mcpproxy-tray`** (Swift) + - On install / first launch: if macOS tray's login-item is not set, **set it ON by default**. Show a first-run dialog: "Launch at login" checked by default with a clear opt-out link. + - Expose the current login-item state via the socket so core can include it in `autostart_enabled`. +4. **Installer changes** + - macOS DMG: post-install script sets `MCPPROXY_LAUNCHED_BY=installer` and launches the tray once. 
+ - Windows Inno Setup / WiX: final-step checkbox "Launch MCPProxy now" (default checked) that launches `mcpproxy-tray.exe` with the same env var. + - Linux `.deb`: no change (no tray today; opt-in systemd user unit already documented). + +## 8. Ground rules (non-negotiable) + +- **Anonymity preserved** — `anonymous_id` is still a random UUID generated locally, unchanged. +- **No PII added** — env-var detection is boolean-only (`has_ci_env`, not the value). No env var value, file path, username, hostname, or email ever leaves the client. IDE fingerprints use MCP `clientInfo.name` (a short enum-like string: `"claude-code"`, `"cursor"`, etc.), never user paths. +- **D1 backup mandatory** before any `ALTER TABLE` or backfill. Backup file is committed to a private repo. +- **PII audit of existing data** is part of the same spec — propose IP truncation + retention policy. +- **Telemetry remains opt-out** — no change to existing `MCPPROXY_TELEMETRY=false` escape hatch or the `telemetry disable` CLI. + +## 9. Success criteria + +1. Dashboard's "real human installs" count changes by ≤ 5 % when flipping from version-rule to `env_kind` filter (i.e. heuristic and ground truth agree). Larger deltas are acceptable only if explainable. +2. Activation funnel page is live and shows non-zero values for each step on v0.25+ rows within 7 days of release. +3. macOS tray installs on v0.25+ show `autostart_enabled=true` for ≥ 90 % of first heartbeats. +4. `launch_source=installer` appears on ≥ 50 % of new macOS installs' first heartbeat. +5. Zero validation-rejection errors on the worker over a 7-day window after v0.25 is the default. +6. Day-2 retention on macOS v0.25+ ≥ macOS v0.24 (78 %): the release must not regress retention and should ideally lift it. + +## 10. Verification plan (required for every PR) + +1. **Unit tests (Go)** — for every detection branch in `env_kind.go`, every activation-flag transition, every payload-v3 field. Run `go test -race ./internal/telemetry/...`. +2. 
**E2E (existing)** — `./scripts/test-api-e2e.sh` must pass. Add a new check: `mcpproxy code exec` or a direct heartbeat-builder test that asserts the v3 payload shape. +3. **curl** — start mcpproxy, inspect `/api/v1/status` and any debug endpoint that surfaces the payload; confirm the v3 fields render correctly. +4. **Chrome (`claude-in-chrome`)** — open the dashboard locally (`mcpproxy-dash`) after deploying the worker to preview, confirm new panels render, CI toggle behaves correctly. +5. **UI test MCP (`mcpproxy-ui-test`)** — screenshot the macOS tray after first-run to confirm the auto-start dialog renders with the box checked by default; confirm tray menu shows state. +6. **Worker tests** — `vitest` must cover the new validation rules + backfill classifier. +7. **D1 restore drill** — take the backup, restore to a staging D1, verify row count matches. + +## 11. Sequencing + +Items 1-3 below are in dependency order; item 4 has no blockers and can run in parallel with item 1. Each can be its own PR. + +| # | Repo | PR | Blocked by | +|---|---|---|---| +| 1 | `mcpproxy-telemetry` | D1 backup + schema migration + worker validation + backfill script | — | +| 2 | `mcpproxy-go` | Payload v3 + env_kind + activation bucket + tray auto-start default + installer launch-step | 1 | +| 3 | `mcpproxy-dash` | Use env_kind, add activation funnel, add launch-source mix | 1 + 2 (so v3 rows exist to render) | +| 4 | `mcpproxy-telemetry` | PII audit follow-through — IP truncation, retention job | — (parallel with 1) | + +## 12. Open questions (answered inline — none remaining) + +All major design decisions were confirmed in the 2026-04-24 brainstorm. diff --git a/specs/README.md b/specs/README.md new file mode 100644 index 00000000..97d1f94d --- /dev/null +++ b/specs/README.md @@ -0,0 +1,82 @@ +# Specs Index + +Every numbered directory under `specs/` is a feature specification produced with [GitHub spec-kit](https://github.com/github/spec-kit). 
This page is the canonical list; badges reflect `tasks.md` checklist progress and are a quick heuristic — not a guarantee. When ambiguous, cross-check `git log --grep=''` and the spec's `plan.md`. + +**Status legend** + +- `shipped` — ≥ 95 % of `tasks.md` items checked +- `in-flight` — 1–94 % checked +- `drafted` — spec/plan written, `tasks.md` empty or unchecked +- `—` — no `tasks.md` in the directory (doc-only spec or pre-speckit draft) + +## Related design docs + +Brainstormed design docs that feed future specs live under [`docs/superpowers/specs/`](../docs/superpowers/specs/): + +- [`2026-03-23-telemetry-and-feedback-design.md`](../docs/superpowers/specs/2026-03-23-telemetry-and-feedback-design.md) — MCPProxy Telemetry & Feedback — Design Spec +- [`2026-03-30-ci-swift-tray-build-design.md`](../docs/superpowers/specs/2026-03-30-ci-swift-tray-build-design.md) — Design: CI Build for Swift macOS Tray App + Installer Updates +- [`2026-04-24-diagnostics-error-taxonomy-design.md`](../docs/superpowers/specs/2026-04-24-diagnostics-error-taxonomy-design.md) — Diagnostics & error taxonomy deep-dive +- [`2026-04-24-retention-telemetry-hygiene-design.md`](../docs/superpowers/specs/2026-04-24-retention-telemetry-hygiene-design.md) — Retention telemetry hygiene + activation instrumentation + auto-start defaults +- [`macos-design-guide.md`](../docs/superpowers/specs/macos-design-guide.md) — MCPProxy macOS App Design Guide + +## Numbered specs + +| # | Title | Status | Progress | +| --- | --- | --- | --- | +| [001-code-execution](./001-code-execution/) | JavaScript Code Execution Tool for MCP Tool Composition | `drafted` | 0/127 (0%) | +| [001-fix-skipped-auth-tests](./001-fix-skipped-auth-tests/) | Fix Skipped API Key Authentication Tests | — | — | +| [001-oas-endpoint-documentation](./001-oas-endpoint-documentation/) | Complete OpenAPI Documentation for REST API Endpoints | `in-flight` | 49/69 (71%) | +| [001-oauth-scope-discovery](./001-oauth-scope-discovery/) | OAuth Scope 
Auto-Discovery | — | — | +| [001-update-version-display](./001-update-version-display/) | Update Check Enhancement & Version Display | `in-flight` | 11/58 (19%) | +| [002-windows-installer](./002-windows-installer/) | Windows Installer for MCPProxy | `in-flight` | 25/60 (42%) | +| [003-tool-annotations-webui](./003-tool-annotations-webui/) | Tool Annotations & MCP Sessions in WebUI | `in-flight` | 10/64 (16%) | +| [004-management-health-refactor](./004-management-health-refactor/) | Management Service Refactoring & OpenAPI Generation | `in-flight` | 45/101 (45%) | +| [005-rest-management-integration](./005-rest-management-integration/) | REST Endpoint Management Service Integration | `shipped` | 45/45 (100%) | +| [006-oauth-extra-params](./006-oauth-extra-params/) | OAuth Extra Parameters Support | `in-flight` | 31/65 (48%) | +| [007-oauth-e2e-testing](./007-oauth-e2e-testing/) | OAuth E2E Testing & Observability | `in-flight` | 88/103 (85%) | +| [008-oauth-token-refresh](./008-oauth-token-refresh/) | OAuth Token Refresh Bug Fixes and Logging Improvements | `in-flight` | 57/64 (89%) | +| [009-proactive-oauth-refresh](./009-proactive-oauth-refresh/) | Proactive OAuth Token Refresh & UX Improvements | `drafted` | 0/87 (0%) | +| [010-release-notes-generator](./010-release-notes-generator/) | Release Notes Generator | `in-flight` | 24/36 (67%) | +| [011-resource-auto-detect](./011-resource-auto-detect/) | Auto-Detect RFC 8707 Resource Parameter for OAuth Flows | `shipped` | 39/39 (100%) | +| [012-docusaurus-docs-site](./012-docusaurus-docs-site/) | Docusaurus Documentation Site | `in-flight` | 74/89 (83%) | +| [012-unified-health-status](./012-unified-health-status/) | Unified Health Status | `shipped` | 44/44 (100%) | +| [013-structured-server-state](./013-structured-server-state/) | Structured Server State | `shipped` | 46/46 (100%) | +| [013-tool-change-notifications](./013-tool-change-notifications/) | Subscribe to notifications/tools/list_changed for Automatic 
Tool Re-indexing | `in-flight` | 26/45 (58%) | +| [014-cli-output-formatting](./014-cli-output-formatting/) | CLI Output Formatting System | `shipped` | 65/66 (98%) | +| [015-server-management-cli](./015-server-management-cli/) | Server Management CLI | `shipped` | 50/50 (100%) | +| [016-activity-log-backend](./016-activity-log-backend/) | Activity Log Backend | `drafted` | 0/50 (0%) | +| [017-activity-cli-commands](./017-activity-cli-commands/) | Activity CLI Commands | `drafted` | 0/60 (0%) | +| [018-intent-declaration](./018-intent-declaration/) | Intent Declaration with Tool Split | `shipped` | 69/69 (100%) | +| [019-activity-webui](./019-activity-webui/) | Activity Log Web UI | `shipped` | 73/73 (100%) | +| [020-oauth-login-feedback](./020-oauth-login-feedback/) | OAuth Login Error Feedback | — | — | +| [021-request-id-logging](./021-request-id-logging/) | Request ID Logging | `in-flight` | 20/42 (48%) | +| [022-oauth-redirect-uri-persistence](./022-oauth-redirect-uri-persistence/) | OAuth Redirect URI Port Persistence | `shipped` | 24/25 (96%) | +| [023-oauth-state-persistence](./023-oauth-state-persistence/) | OAuth Token Refresh Reliability | `shipped` | 38/39 (97%) | +| [023-smart-config-patch](./023-smart-config-patch/) | Smart Config Patching | `shipped` | 52/53 (98%) | +| [024-expand-activity-log](./024-expand-activity-log/) | Expand Activity Log | `shipped` | 63/66 (95%) | +| [026-pii-detection](./026-pii-detection/) | Sensitive Data Detection | `shipped` | 130/130 (100%) | +| [027-status-command](./027-status-command/) | Status Command | `shipped` | 25/25 (100%) | +| [028-agent-tokens](./028-agent-tokens/) | Agent Tokens | `drafted` | 0/43 (0%) | +| [029-mcpproxy-teams](./029-mcpproxy-teams/) | MCPProxy Teams | `shipped` | 29/29 (100%) | +| [033-typescript-code-execution](./033-typescript-code-execution/) | TypeScript Code Execution Support | `drafted` | 0/19 (0%) | +| [034-expand-secret-refs](./034-expand-secret-refs/) | Expand Secret/Env Refs in All 
Config String Fields | `shipped` | 17/17 (100%) | +| [035-enhanced-annotations](./035-enhanced-annotations/) | Enhanced Tool Annotations Intelligence | — | — | +| [037-macos-swift-tray](./037-macos-swift-tray/) | Native macOS Swift Tray App (Spec A) | — | — | +| [038-mcp-accessibility-server](./038-mcp-accessibility-server/) | MCP Accessibility Testing Server (Spec C) | — | — | +| [039-connect-and-dashboard](./039-connect-and-dashboard/) | Connect Clients & Dashboard Visual Redesign | — | — | +| [039-scanner-qa-audit](./039-scanner-qa-audit/) | Security Scanner QA Audit & Fix | — | — | +| [039-security-scanner-plugins](./039-security-scanner-plugins/) | Security Scanner Plugin System | — | — | +| [040-server-ux](./040-server-ux/) | Add/Edit Server UX Improvements | `drafted` | 0/35 (0%) | +| [041-quarantine-invariants](./041-quarantine-invariants/) | Quarantine State Machine Invariants & Property Tests | — | — | +| [042-telemetry-tier2](./042-telemetry-tier2/) | Telemetry Tier 2 — Privacy-Respecting Usage Signals | `drafted` | 0/91 (0%) | +| [043-linux-package-repos](./043-linux-package-repos/) | Linux Package Repositories (apt/yum) | `shipped` | 39/41 (95%) | + +## Updating this index + +The index is not auto-generated. Refresh the table when you: + +- add a new numbered spec directory under `specs/` +- ship or abandon an existing spec (adjust the badge) +- add a design doc under `docs/superpowers/specs/` + +Future-you will thank present-you for a short PR update when the status actually changes, so the badges stay honest.
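
When refreshing the table by hand, the status legend near the top of this index can be applied mechanically. The helper below is a hypothetical sketch, not a tool that exists in this repo: the function name and the `no-tasks` return value (rendered as a dash in the table) are assumptions, as is treating any non-zero progress below 95 % as `in-flight`.

```go
package main

import "fmt"

// Status maps tasks.md checklist progress to a badge, following the legend:
// shipped at 95 % or more, in-flight for partial progress, drafted when
// nothing is checked, and no badge when the directory has no tasks.md.
func Status(hasTasks bool, checked, total int) string {
	switch {
	case !hasTasks:
		return "no-tasks" // shown as a dash in the index table
	case total == 0 || checked == 0:
		return "drafted"
	case checked*100 >= 95*total: // integer math avoids rounding surprises
		return "shipped"
	default:
		return "in-flight"
	}
}

func main() {
	fmt.Println("024-expand-activity-log:", Status(true, 63, 66))
	fmt.Println("008-oauth-token-refresh:", Status(true, 57, 64))
	fmt.Println("001-code-execution:", Status(true, 0, 127))
	fmt.Println("020-oauth-login-feedback:", Status(false, 0, 0))
}
```

Comparing `checked*100` against `95*total` keeps the threshold exact, so a spec shown as "95%" after rounding is classified by its true ratio and not by the rounded figure.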