From 9d19eeecb31a2a0a5a5d48e850b36f1963581c65 Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Fri, 15 May 2026 16:23:48 -0700 Subject: [PATCH 1/2] Claude agent host: roadmap status sync + Phase 8.5 (rich tool-call rendering) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Mark Phases 5, 6, 7, 8 as DONE (implementations have shipped; just catching the headings up to reality). - Insert Phase 8.5 — Rich tool-call rendering parity with Copilot. Today the Claude permission card for Bash reads 'Run shell command' with no command shown; Bash/Grep/Glob rows render in the generic renderer instead of the dedicated terminal/search renderers. This phase ports Copilot's getInvocationMessage / getPastTenseMessage / getToolKind / getShellLanguage / getToolInputString shape into claudeToolDisplay.ts and wires them through the permission, mapper, and replay paths. Phase 6.5 (Fork) intentionally stays Deferred. --- .../platform/agentHost/node/claude/roadmap.md | 104 +++++++++++++++++- 1 file changed, 100 insertions(+), 4 deletions(-) diff --git a/src/vs/platform/agentHost/node/claude/roadmap.md b/src/vs/platform/agentHost/node/claude/roadmap.md index 07dc22f68144a..4b3e9e448c063 100644 --- a/src/vs/platform/agentHost/node/claude/roadmap.md +++ b/src/vs/platform/agentHost/node/claude/roadmap.md @@ -444,7 +444,7 @@ round-trip correctly. Exit criteria: a workbench client sees the Claude provider listed, can pick a Claude model, but can't yet send a message. -### Phase 5 — Session lifecycle: create / dispose / list / shutdown +### Phase 5 — Session lifecycle: create / dispose / list / shutdown ✅ **DONE** Implement the lifecycle methods that don't require live LLM traffic. **Provisional / materialize is the load-bearing model in this phase** @@ -520,7 +520,7 @@ restarts find materialised sessions; externally-created Claude Code sessions appear; agent host can shut down cleanly. Fork is deferred to Phase 6.5. -### Phase 6 — `sendMessage` + streaming progress events (single-turn, no tools) +### Phase 6 — `sendMessage` + streaming progress events (single-turn, no tools) ✅ **DONE** Wire the proxy + SDK from Phase 3 into a real session. **Port the lifecycle machinery from `claudeCodeAgent.ts`:** @@ -714,7 +714,7 @@ with restored sessions, and honors the workbench's "keep `[0..N]` INCLUSIVE" semantic. The reverted heuristic is **not** retained behind a flag. -### Phase 7 — Tool calls + permission + user input +### Phase 7 — Tool calls + permission + user input ✅ **DONE** Wire the SDK's tool-use loop through to the agent host's tool infrastructure. **Transcript-only in this phase** — file edit tracking is Phase 8. @@ -777,7 +777,7 @@ Exit criteria: a real "read this file" prompt completes end-to-end. — see roadmap.md` marker at the call site so the upgrade path stays discoverable. Implement when Phase N introduces multi-action plan UX. -### Phase 8 — File edit tracking +### Phase 8 — File edit tracking ✅ **DONE** Build the Claude analog of `fileEditTracker.ts` from `node/copilot/`. @@ -803,6 +803,102 @@ client-side accept of one and reject of the other behaves correctly. Exit criteria: file diffs render in the workbench; per-file accept/reject works. +### Phase 8.5 — Rich tool-call rendering parity with Copilot + +Claude's tool-call cards today only carry the static display name from +[`claudeToolDisplay.ts`](./claudeToolDisplay.ts) (`"Run shell command"`, +`"Find files"`, ...). Copilot's [`copilotToolDisplay.ts`](../copilot/copilotToolDisplay.ts) +formats the actual `tool_use.input` into the row title and tags the row +with a `toolKind` so the workbench renders terminal / search / +subagent specially. Phase 12 already laid the `_meta.toolKind: +'subagent'` half down; this phase finishes the parity for the rest of +the SDK's built-in tools. + +Gap surfaced live: a `Bash` permission card reads *"Run shell command"* +with no command attached, and `Bash` / `Grep` / `Glob` rows render in +the generic tool renderer instead of the dedicated terminal / search +renderers. + +Scope: + +- **Port the Copilot helper shape** into + [`claudeToolDisplay.ts`](./claudeToolDisplay.ts), keyed off the SDK's + `tool_use.input` schemas: + - `getClaudeInvocationMessage(toolName, displayName, input)` → + markdown that includes the actual params (`` Running `git status` ``, + `Reading [src/foo.ts](src/foo.ts)`, `` Searching for `pattern` ``, + `Fetching [https://...](https://...)`). + - `getClaudePastTenseMessage(toolName, displayName, input, success)` → + success/failure-aware past-tense (`` Ran `git status` ``, + `Read foo.ts`, `Searched for ...`); replaces the + `" finished"` hardcode at + [`claudeMapSessionEvents.ts:332`](./claudeMapSessionEvents.ts#L332). + - `getClaudeToolKind(toolName)` → `'terminal' | 'subagent' | + 'search' | undefined`. `Bash` / `BashOutput` / `KillBash` → + `'terminal'`; `Grep` / `Glob` → `'search'`; `Task` → + `'subagent'` (Phase 12 already does this; consolidate the call + site). + - `getClaudeShellLanguage(toolName)` → `'bash'` for the shell tools + (drives terminal renderer's syntax highlighting). + - `getClaudeToolInputString(toolName, input)` → the canonical + "input as code" string used for the code block under the row + (e.g. the multi-line `command` for `Bash`, the formatted + arguments for the rest). + - Per-tool input typings live alongside the helpers + (`IClaudeBashInput`, `IClaudeGrepInput`, ...), validated + defensively (Claude's input can be malformed across SDK + versions — fall back to the static display name on shape + mismatch). +- **Wire the helpers through both code paths**: + - [`claudeCanUseTool.ts`](./claudeCanUseTool.ts) — set + `invocationMessage` on `pending_confirmation` from the rich + helper so the **permission card shows the actual command / + file / pattern**, not just the display name. Add `toolKind` and + `language` to the signal so the card uses the terminal renderer + when relevant. + - [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts) — + set `invocationMessage` on `SessionToolCallReady` for the + non-interactive (auto-approved) path, set `pastTenseMessage` on + `SessionToolCallComplete`, and emit `_meta.toolKind` / + `_meta.language` on the `tool_use` block alongside the existing + `_meta.toolKind: 'subagent'` (single canonical path; Phase 12's + spawn helpers consume the same field). + - **Replay path** — `claudeReplayMapper.ts` writes the same + `_meta.toolKind` / `_meta.language` and rich + invocation/past-tense on historical `tool_use` blocks so + restored sessions render identically to live ones. +- **Snapshot test** in `claudeToolDisplay.test.ts` covering each tool + row × `{ invocation, pastTense, toolKind, language, inputString }`. + Mirrors the existing display-name snapshot so any new SDK tool + added to the `TOOL_ROWS` table forces a snapshot update. + +Tests: + +- Unit: snapshot table covers every tool; `getClaudeInvocationMessage` + defends against malformed input shapes and falls back cleanly. +- Integration: an interactive `Bash` request → the + `pending_confirmation` signal carries the command in + `invocationMessage` and `_meta.toolKind: 'terminal'`; the same flow + on completion emits a past-tense message that includes the command. + +Manual E2E: + +- Live: ask the Claude agent to run a shell command. The permission + card should render in the **terminal** style with the command + highlighted; the card should read *Running `git status`* (or + similar) instead of *Run shell command*. After approval the row + collapses to *Ran `git status`*. Same for `Grep` / `Glob` + rendering in the search style. +- Replay: open a historical Claude session that contains shell and + search tool calls. The historical rows should render in the same + terminal / search style as the live rows. + +Exit criteria: Claude tool-call cards (live and replayed) match +Copilot's rendering quality — permission cards show the actual +invocation, terminal tools render in the terminal renderer, search +tools render in the search renderer. Adding a new SDK tool means +adding one row to `TOOL_ROWS` and updating the snapshot test. + ### Phase 9 — Abort + steering + model change + shutdown polish ✅ **DONE** Implementation contract: [phase9-plan.md](./phase9-plan.md). Unit tests From a6bc47cc8f2d67b75d233397e08b39c28522d9a7 Mon Sep 17 00:00:00 2001 From: Tyler Leonhardt Date: Sat, 16 May 2026 00:15:58 -0700 Subject: [PATCH 2/2] Claude agent host: Phase 8.5 plan (super-planner + grilling) Adds phase8.5-plan.md alongside roadmap.md's Phase 8.5 section. Synthesized from a 3-model council (GPT-5.5, Claude Opus 4.6, GPT-5.3-Codex) and refined through a grill-with-docs session. Locked decisions: - D1: getClaudeToolKind is a TOOL_ROWS column (single source of truth). - D2: add Agent row to TOOL_ROWS; delete SUBAGENT_TOOL_NAMES. - D3: defensive Record access; no per-tool exported types. - D4: pastTenseMessage is success-aware (tool_result.is_error). - D5: live mapper mirrors Copilot's stash-on-start/reuse-on-complete pattern, with state encapsulated in a new ClaudeToolCallRegistry class (replaces today's bare maps on ClaudeMapperState). - D6: _meta single-write on Start; reducer carries to Complete; replay emits on its single terminal action (asymmetry by design). - D7: MCP tools get toolKind: undefined. - D8: one big per-tool snapshot table covering all 5 helpers. - D9: getClaudeInvocationMessage('Task', ...) owns the Task description fallback; Phase 12's site reduces to a plain helper call. - D10: _meta stays flat (no per-kind namespacing). Steps cover claudeToolDisplay (helpers + columns), claudeCanUseTool (permission card rich invocation + _meta), sessionPermissions (forward _meta through pending->ready), claudeMapSessionEvents (registry migration + _meta on Start + success-aware past tense), claudeReplayMapper (parity), claudeSubagentSignals (inner tool parity), and the snapshot + behavior tests. --- .../agentHost/node/claude/phase8.5-plan.md | 617 ++++++++++++++++++ 1 file changed, 617 insertions(+) create mode 100644 src/vs/platform/agentHost/node/claude/phase8.5-plan.md diff --git a/src/vs/platform/agentHost/node/claude/phase8.5-plan.md b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md new file mode 100644 index 0000000000000..fd55cd3b59750 --- /dev/null +++ b/src/vs/platform/agentHost/node/claude/phase8.5-plan.md @@ -0,0 +1,617 @@ +# Phase 8.5 — Rich tool-call rendering parity with Copilot + +> Generated by super-planner. Source: [roadmap.md](./roadmap.md) (Phase 8.5). +> Last updated: 2026-05-15 after 3-model council (GPT-5.5, Claude Opus 4.6, +> GPT-5.3-Codex). Pre-grilling draft. + +**Status:** ready (pending grill-with-docs pass) + +## Goal + +Bring Claude tool-call rendering up to the quality bar Copilot already +sets: permission cards and tool rows show the **actual** invocation +parameters (the command, the file, the search pattern), and shell / +search tools render in the workbench's dedicated terminal / search +renderers instead of the generic tool renderer. Both live and replayed +sessions must produce identical signals. + +## Scope + +**In scope** + +- Five new exported helpers in `claudeToolDisplay.ts` mirroring + Copilot's shape: `getClaudeInvocationMessage`, + `getClaudePastTenseMessage`, `getClaudeToolKind`, + `getClaudeShellLanguage`, `getClaudeToolInputString`. +- Two new columns on the existing `TOOL_ROWS` table: `toolKind`, + `language`. Source-of-truth lookup for the kind/language helpers. +- One small action-meta helper, `buildClaudeToolMeta(toolName)` → + `{ toolKind?, language? }` (omits keys when undefined), called at + every tool-open seam. +- Wire the helpers through: + - `claudeCanUseTool.ts` — rich `invocationMessage`, rich + `toolInput`, `_meta` on `ToolCallPendingConfirmationState`. + - `claudeMapSessionEvents.ts` — `_meta` on `SessionToolCallStart`, + success-aware `pastTenseMessage` and `_meta` on + `SessionToolCallComplete`. Adds tool-input accumulation to + `ClaudeMapperState` (see Decision D5). + - `claudeReplayMapper.ts` — same helpers in `_openToolUse` / + `_attachToolResult`. Subsumes the existing Phase 12 + subagent-only `_meta` write. + - `claudeSubagentSignals.ts` — same `_meta` on inner subagent + tool-call start / ready (currently bare). + - `sessionPermissions.ts` — copy `state._meta` into the emitted + `SessionToolCallReady` action so `_meta` survives the + pending_confirmation → ready transition. +- Snapshot test in `claudeToolDisplay.test.ts` covering every tool + row × `{ invocation, pastTenseSuccess, pastTenseFailure, toolKind, + language, inputString }`. +- Live + replay assertions extended in + `claudeMapSessionEvents.test.ts` and `claudeReplayMapper.test.ts`. + +**Out of scope** + +- New SDK tools beyond what `TOOL_ROWS` already covers — adding tools + is a one-row update, not a phase. +- Workbench-side renderers — terminal/search rendering already exists + and is driven by the `_meta.toolKind` field Copilot already sets. + Phase 8.5 only emits the meta; rendering is the workbench's job. +- Bash-tool file-edit tracking — owned by Phase 8 (already documented + as a known gap). +- Reworking the localization story — strings remain `nls.localize()` + with `{0}` placeholders. + +## Prerequisites + +- Phase 7 (tool calls + permission + user input) — landed. Provides + `claudeCanUseTool.ts`, `INTERACTIVE_CLAUDE_TOOLS`, and the + pending-confirmation state shape. +- Phase 8 (file edit tracking) — landed. Stable `TOOL_ROWS` table and + `isClaudeFileEditTool` helper. +- Phase 12 (subagents) — landed. Established the `_meta.toolKind: + 'subagent'` precedent that Phase 8.5 generalizes. +- Phase 13 (session restoration) — landed. Provides the replay path + (`claudeReplayMapper.ts`). + +## Approach + +Add `toolKind` and `language` columns to the existing `TOOL_ROWS` +table in `claudeToolDisplay.ts` (single source of truth), then add +five helper functions that mirror Copilot's `copilotToolDisplay.ts` +shape but read Claude SDK `tool_use.input` schemas defensively. A +single tiny `buildClaudeToolMeta(toolName)` helper produces the +`_meta` bag stamped at every tool-open seam — live mapper, permission +path, replay mapper, and inner subagent path — so all three drivers +emit identical signals. The `pastTenseMessage` helper is +success-aware (`tool_result.is_error`). Inputs flow into the live +mapper via accumulation of `input_json_delta` chunks (see Decision +D5). + +## Steps + +1. **Extend `TOOL_ROWS` and add helpers in `claudeToolDisplay.ts`.** + - Add `toolKind?: 'terminal' | 'subagent' | 'search'` and + `language?: string` columns to `ClaudeToolRow`. Populate: + - `Bash`, `BashOutput`, `KillBash` → `toolKind: 'terminal'`, + `language: 'bash'`. + - `Grep`, `Glob` → `toolKind: 'search'`. + - `Task` → `toolKind: 'subagent'`. + - All others → omit (keeps `undefined`). + - Export `getClaudeToolKind(name)`, `getClaudeShellLanguage(name)` + as one-liner table lookups. + - Export `getClaudeInvocationMessage(name, displayName, input)` + and `getClaudePastTenseMessage(name, displayName, input, + success)` returning `StringOrMarkdown`. Mirror Copilot's + branches in `copilotToolDisplay.ts` lines 473–760, keyed off + Claude tool names: `Bash` (`command`), `Read`/`Write`/`Edit`/ + `MultiEdit`/`NotebookRead`/`NotebookEdit` (`file_path` / + `notebook_path`), `Grep`/`Glob` (`pattern`), `WebFetch` + (`url`), `Task` (`description`), MCP (`server: tool`). + - Export `getClaudeToolInputString(name, input)` returning the + "code under the row" string: shell tools → `command`; others → + `JSON.stringify(input, null, 2)`. + - Export `buildClaudeToolMeta(name)` returning + `{ toolKind?, language? }` with undefined keys omitted. + - All accessors validate field types defensively (e.g. `typeof + command === 'string'`); on mismatch fall back to the static + display name. **No `as` casts that escape the type checker.** + - Files: [`claudeToolDisplay.ts`](./claudeToolDisplay.ts). + - Depends on: none. + - Done when: snapshot test (Step 6) lists every tool row × all + five helper outputs. + +2. **Wire `claudeCanUseTool.ts` (live permission path).** + - In `dispatchCanUseTool` (lines 105–135), replace + `invocationMessage: displayName` with + `getClaudeInvocationMessage(toolName, displayName, input)`. + - Replace the raw stringify of `input` (used for `toolInput`) + with `getClaudeToolInputString(toolName, input) ?? + `. + - Add `_meta: buildClaudeToolMeta(toolName)` on the + `ToolCallPendingConfirmationState`. + - Files: [`claudeCanUseTool.ts`](./claudeCanUseTool.ts). + - Depends on: Step 1. + - Done when: Bash permission card carries the actual command + (verified by integration test added in Step 7). + +3. **Patch `sessionPermissions.ts` to forward `_meta` into Ready.** + - In `createToolReadyAction` (lines ~175–201), copy `state._meta` + onto the emitted `SessionToolCallReady` action so the + `pending_confirmation → ready` transition preserves + `toolKind` / `language`. + - This file is **outside** `node/claude/`; the change is + agnostic and will help any future agent that puts `_meta` on a + pending-confirmation state. + - Files: + [`sessionPermissions.ts`](../sessionPermissions.ts). + - Depends on: Step 1 (so the type of `_meta` is settled). + - Done when: a permission card flipped to ready keeps its + `_meta` (asserted in + [`sessionPermissions.test.ts`](../../test/node/sessionPermissions.test.ts) + if a fixture exists; otherwise add a focused test). + +4. **Wire `claudeMapSessionEvents.ts` (live event path) and extract + `ClaudeToolCallRegistry`.** + - Create new file + [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) + containing the `ClaudeToolCallRegistry` class described in + Decision D5. It owns per-tool-call state for the live mapper + (`turnId`, `toolName`, delta buffer, parsed input, computed + `IClaudeToolStartInfo` bag). Public API: + `beginToolCall`, `appendInputDelta`, `completeToolCall`, + `lookupToolCall`, `forgetToolCall`, `drainOnTurnEnd`. + - Delete the bare `_toolCallTurnIds` / `_toolCallNames` / + `_activeToolBlocks` fields on `ClaudeMapperState` and migrate + their callers to the registry. The mapper stops holding + cross-message state directly. + - `ClaudeAgentSession` constructs and owns one + `ClaudeToolCallRegistry` per session (registered via + `_register`), and passes it into the mapper as a constructor + argument. + - In `mapStreamEvent`: + - `content_block_start` for `tool_use` (~line 416) → + `registry.beginToolCall(toolUseId, toolName, turnId)`. + - `input_json_delta` (~line 182's comment block describes the + block-index correlation) → + `registry.appendInputDelta(toolUseId, delta.partial_json)`. + - `content_block_stop` for the same block → + `const info = registry.completeToolCall(toolUseId)`, then + emit `SessionToolCallStart` with + `info.invocationMessage`, `info.toolInput`, + `_meta: buildClaudeToolMeta(info.toolName)`. + - In `mapUserMessage` for `tool_result` (~lines 277–291): + - `const info = registry.lookupToolCall(toolUseId)`. + - Emit `SessionToolCallComplete` with + `pastTenseMessage: getClaudePastTenseMessage(info.toolName, + info.displayName, info.parsedInput, !isError)`. No `_meta` + — the reducer carries `_meta` forward from Start (D6). + - Call `registry.forgetToolCall(toolUseId)` after emission. + - On `info === undefined` (defense-in-depth): fall back to + the existing static `${displayName} finished` and no + `_meta`; log a trace warning. + - In `markTurnComplete` (~lines 140–157) → delegate to + `registry.drainOnTurnEnd(turnId, logService)`. + - Files: + [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) (new), + [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts), + [`claudeAgentSession.ts`](./claudeAgentSession.ts) (construct + and inject the registry). + - Depends on: Step 1. + - Done when: + - A live `Bash` turn emits `pastTenseMessage` containing the + command (verified in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts)). + - `_meta.toolKind: 'terminal'` appears on both + `SessionToolCallStart` and `SessionToolCallComplete`. + - New file + `claudeToolCallRegistry.test.ts` covers begin/append/complete/ + lookup/forget/drain in isolation (mirrors + `claudeSubagentRegistry.test.ts`). + +5. **Wire `claudeReplayMapper.ts` (replay path).** + - In `_openToolUse` (lines ~257–281): + - Parse `block.input` (already a `Record`). + - Replace `invocationMessage: displayName` with + `getClaudeInvocationMessage(toolName, displayName, input)`. + - Replace the existing `_meta: { toolKind: 'subagent' }` + (set only when `SUBAGENT_TOOL_NAMES.has(toolName)`) with + `_meta: buildClaudeToolMeta(toolName)` — this subsumes the + Phase 12 path (since `getClaudeToolKind('Task') === + 'subagent'`). + - Stash the parsed input in `ReplayBuilder._toolCallInputs: + Map>`. + - In `_attachToolResult` (lines ~282–320): + - Use `getClaudePastTenseMessage(toolName, displayName, + stashedInput, !isError)` instead of + ``${displayName} finished``. + - The existing `..._meta` carry-forward keeps the new full + `_meta` intact. + - Verify `Agent` (legacy SDK alias for Task subagent spawning, + per `SUBAGENT_TOOL_NAMES` at lines ~131–137) is handled. Two + options: + - **Option A**: add `Agent` row to `TOOL_ROWS` with + `toolKind: 'subagent'` (keeps replay regressions away). + - **Option B**: leave `SUBAGENT_TOOL_NAMES` as the + replay-only fallback; document it. + **Decision D2**: Option A — single source of truth. + - Files: + [`claudeReplayMapper.ts`](./claudeReplayMapper.ts). + - Depends on: Step 1. + - Done when: a replayed `Bash` row in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + shows the same rich invocation / pastTense / `_meta` as the + live row from Step 4. + +6. **Wire `claudeSubagentSignals.ts` (inner subagent path).** + - In `emitInnerAssistantSignals` (lines ~250–270), the inner + `tool_use` branch: + - Stamp `_meta: buildClaudeToolMeta(block.name)` on the + inner `SessionToolCallStart`. + - Replace `invocationMessage: displayName` on the synthetic + inner Ready with + `getClaudeInvocationMessage(block.name, displayName, + block.input)`. + - In `buildTopLevelSubagentReadyAction` (lines ~144–170), the + existing `_meta: { toolKind: 'subagent', ... }` already + produces the right kind. Re-express via + `{ ...buildClaudeToolMeta('Task'), + subagentDescription: ..., subagentAgentName: ... }` to use + the same helper (no behavioral change). + - Files: + [`claudeSubagentSignals.ts`](./claudeSubagentSignals.ts). + - Depends on: Step 1. + - Done when: an inner Glob/Read row inside a subagent renders + with `toolKind: 'search'` / no `toolKind` (verified in + [`claudeSubagentSignals.test.ts`](../../test/node/claudeSubagentSignals.test.ts)). + +7. **Snapshot + behavior tests.** + - Extend + [`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) + with one snapshot test mapping every tool row to + `[invocationMessage, pastTenseSuccess, pastTenseFailure, + toolKind, language, inputString]` given a representative + input bag. Mirrors the existing + `[permissionKind, displayName]` snapshot at lines ~31–50. + - Add focused defense tests: malformed input falls back; MCP + tool gets `undefined` toolKind + JSON input string fallback. + - Add live assertions in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts): + `Bash` start carries `_meta.toolKind: 'terminal'` / + `language: 'bash'`; complete carries success-aware past tense + containing the command. + - Add replay assertions in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + for the same. + - Add Ready-action `_meta` forwarding test in + `sessionPermissions.test.ts` (or add the file if absent). + - Files: see above. + - Depends on: Steps 1–6. + - Done when: `npm run compile-check-ts-native` clean, + `valid-layers-check` clean, all agentHost tests green. + +## Files to Modify or Create + +| Path | Change | Notes | +|------|--------|-------| +| [`claudeToolDisplay.ts`](./claudeToolDisplay.ts) | modify | Add `toolKind`/`language` cols; add 5 helpers + `buildClaudeToolMeta`. | +| [`claudeCanUseTool.ts`](./claudeCanUseTool.ts) | modify | Use rich helpers in `dispatchCanUseTool`. | +| [`claudeMapSessionEvents.ts`](./claudeMapSessionEvents.ts) | modify | Migrate cross-message state to `ClaudeToolCallRegistry`; emit `_meta` on Start; success-aware past tense + `_meta` on Complete. | +| [`claudeToolCallRegistry.ts`](./claudeToolCallRegistry.ts) | create | New class owning per-tool-call state for the live mapper. | +| [`claudeAgentSession.ts`](./claudeAgentSession.ts) | modify | Construct one `ClaudeToolCallRegistry` per session; inject into the mapper. | +| [`claudeReplayMapper.ts`](./claudeReplayMapper.ts) | modify | Use rich helpers in `_openToolUse` / `_attachToolResult`; stash parsed input. | +| [`claudeSubagentSignals.ts`](./claudeSubagentSignals.ts) | modify | Same helpers on inner subagent path. | +| [`../sessionPermissions.ts`](../sessionPermissions.ts) | modify | Copy `state._meta` into emitted `SessionToolCallReady`. | +| [`../../test/node/claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) | modify | Per-tool composite snapshot covering all 5 helpers. | +| `../../test/node/claudeToolCallRegistry.test.ts` | create | Unit tests for begin/append/complete/lookup/forget/drain. | +| [`../../test/node/claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts) | modify | Live `_meta` + rich past-tense assertions. | +| [`../../test/node/claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) | modify | Replay parity assertions. | +| `../../test/node/sessionPermissions.test.ts` | modify or create | Assert `_meta` survives pending → ready. | + +## Decisions + +- **D1 — `getClaudeToolKind` is a `TOOL_ROWS` column, not a switch.** + The table is already the single source of truth for per-tool data + (`isClaudeFileEditTool`, `INTERACTIVE_CLAUDE_TOOLS`, + `getClaudePermissionKind` all read it). Adding a switch would + diverge on new tools and the snapshot test wouldn't catch the + drift. + +- **D2 — Add `Agent` to `TOOL_ROWS` (subagent-spawning, same as + `Task`).** SDK ships **two** subagent-spawning tools: `Task` + (built-in subagents, `sdk.d.ts:95`) and `Agent` (custom + subagents, `sdk.d.ts:36`). Replay already recognises both via + the local `SUBAGENT_TOOL_NAMES` set in + [`claudeReplayMapper.ts:137`](./claudeReplayMapper.ts#L137); + Phase 13's plan documented this with SDK citations. Add an + `Agent` row to `TOOL_ROWS` with + `permissionKind: 'custom-tool'`, `toolKind: 'subagent'`, then + delete `SUBAGENT_TOOL_NAMES` and switch the replay call site at + [`claudeReplayMapper.ts:261`](./claudeReplayMapper.ts#L261) to + `getClaudeToolKind(name) === 'subagent'`. The live path picks + this up automatically because `getClaudeToolKind` is the single + source of truth (D1). + +- **D3 — Defensive `Record` access; no per-tool + exported types.** The SDK boundary is `Record` / + `unknown` (verified at `sdk.d.ts:146` and `sdk.d.ts:3059`). + Per-tool interfaces would be documentation-only and risk + encouraging unsafe `as` casts. Helpers narrow each field with + `typeof` checks and fall back on mismatch. Local-only readability + interfaces are allowed inside `claudeToolDisplay.ts` (not + exported) if they make the helper bodies easier to read. + +- **D4 — `pastTenseMessage` is success-aware.** Signature is + `(name, displayName, input, success)` matching Copilot. Source of + the success bit: `tool_result.is_error === true` → + `success = false`. Already computed at + `claudeMapSessionEvents.ts:267` and `claudeReplayMapper.ts:295`. + +- **D5 — Live mapper mirrors Copilot's "stash on start, reuse on + complete" pattern (Option D), with state encapsulated in a small + class — not as bare maps on `ClaudeMapperState`.** CopilotAgent + computes the full `IToolStartInfo` at `tool.execution_start` (where + the Copilot SDK hands over complete args) and stashes it in + `builder.pendingTools: Map`; at + `tool.execution_complete` it looks the info back up and calls + `getPastTenseMessage(info.toolName, info.displayName, + info.parameters, success)` + ([`mapSessionEvents.ts:135–202`](../copilot/mapSessionEvents.ts#L135-L202), + [`mapSessionEvents.ts:622`](../copilot/mapSessionEvents.ts#L622)). + + Claude's analog: the SDK delivers tool input as `input_json_delta` + chunks across `content_block_start` → `delta*` → + `content_block_stop`. **`content_block_stop` for a `tool_use` block + IS Claude's "execution_start equivalent"** — by then the buffer is + complete and parseable. + + Introduce a new class `ClaudeToolCallRegistry` in + `claudeToolCallRegistry.ts` that owns all per-tool-call state for + the live mapper. It replaces today's two bare maps + (`_toolCallTurnIds`, `_toolCallNames`) and adds the new + input-accumulation state. Public API: + + - `beginToolCall(toolUseId, toolName, turnId)` — called from + `content_block_start`. Allocates the delta buffer. + - `appendInputDelta(toolUseId, partialJson)` — called from + `input_json_delta`. + - `completeToolCall(toolUseId) → IClaudeToolStartInfo` — called + from `content_block_stop`. Parses the buffer (try/catch), runs + the display helpers once, stashes the resulting info bag, + drops the buffer. Returns the bag so the mapper can use it on + the matching Start emission. + - `lookupToolCall(toolUseId) → IClaudeToolStartInfo | undefined` + — called from `mapUserMessage` when `tool_result` lands. + Returns the stashed info for past-tense / `_meta` use. + - `forgetToolCall(toolUseId)` — called after `tool_result` + consumes the entry, to bound memory. + - `drainOnTurnEnd(turnId, logService)` — called from + `markTurnComplete`, replaces today's inline cleanup at + `claudeMapSessionEvents.ts:140–157`, warns on orphan + `tool_use` blocks. + + The registry is owned by `ClaudeAgentSession` (one per session, + registered via `_register` so disposal flows through the + session). The mapper takes it as a constructor argument instead + of carrying state itself, so the mapper stays a pure + event-translation function and the registry can be unit-tested + in isolation (mirrors how Phase 12 split `SubagentRegistry` out + of the mapper). + + Replay is unaffected — `block.input` is already complete in the + replay path. `ClaudeReplayMapper` keeps its own + `Map>` stash internally; it + does not share the live registry because the lifetimes and + failure modes are entirely different (one-shot reconstruction + vs. streaming). + +- **D6 — `_meta` write asymmetry between live and replay is by + design.** Both paths end up producing the same final reducer + state; they differ only in the event shape that gets there. + + **Live (streaming, two-action shape):** the SDK emits + `content_block_start` → `input_json_delta*` → `content_block_stop` + → (user approves) → `tool_result` over time. The host translates + into two actions: `SessionToolCallStart` (stamped with + `_meta: buildClaudeToolMeta(toolName)` once) and + `SessionToolCallComplete` (no `_meta`). The reducer at + [`state/protocol/reducers.ts:30, 406`](../../common/state/protocol/reducers.ts) + carries `_meta` forward when merging Complete into the existing + per-`toolCallId` state. Mirrors Copilot's single-write at + [`mapSessionEvents.ts:197`](../copilot/mapSessionEvents.ts#L197). + + **Replay (flattened, single-action shape):** there is no event + stream — `claudeReplayMapper.ts` walks the on-disk transcript + and synthesizes one terminal `ToolCallState` per `(tool_use, + tool_result)` pair (per CONTEXT M7 "no live lifecycle states"). + `_meta` is stamped on that single action because there is no + prior state for the reducer to carry it from. + + **Permission cards (Step 3):** the one place this falls down + today is `sessionPermissions.createToolReadyAction` at + [`sessionPermissions.ts:175–201`](../sessionPermissions.ts#L175-L201), + which does NOT copy `_meta` from the pending-confirmation state + into the emitted `SessionToolCallReady`. Step 3 patches that + one site so the live single-write keeps working through the + `pending_confirmation → ready` transition. + + Future readers: do not "fix" the asymmetry by emitting `_meta` + on live Complete or by splitting replay into two actions. The + final reducer state is identical in both paths; only the event + shape differs. + +- **D10 — `_meta` stays flat (no per-kind namespacing).** Phase + 12's existing `_meta: { toolKind: 'subagent', + subagentDescription, subagentAgentName }` shape is the + precedent, and Copilot's `_meta` is also flat. New keys + (`language` for terminal, future keys for other kinds) sit as + optional siblings on the same object. Workbench readers + consume the keys they know about and ignore the rest. + Namespacing later (`_meta.terminal = { language }`) would + require migrating on-disk replay shape and a double-shape + workbench reader; the tidiness win does not justify either + cost. If `_meta` ever genuinely outgrows flat, it is its own + roadmap-level refactor. + +- **D9 — `getClaudeInvocationMessage('Task', …)` is the single + source of truth for the parent Task row's invocation message.** + The helper reads `input.description` (the model-authored + one-liner for the subagent task) and falls back to the static + display name. Phase 12's call site in + [`claudeSubagentSignals.ts:163`](./claudeSubagentSignals.ts#L163) + — currently `description ?? getClaudeToolDisplayName(block.name)` + — is replaced with a plain + `getClaudeInvocationMessage('Task', displayName, block.input)` + call. One code path; the snapshot test covers the + description-present and description-absent shapes. + +- **D7 — MCP tools get `toolKind: undefined`.** Mirrors Copilot's + behavior (`copilotToolDisplay.ts:807` returns one of the three + kinds or undefined). MCP input string defaults to JSON + pretty-print; invocation message strips the `mcp__` prefix and + shows `server: tool`. + +- **D8 — Snapshot test is one big per-tool table.** One + `assert.deepStrictEqual` mapping every tool name to a tuple of + all five helper outputs. Mirrors the existing snapshot pattern + at `claudeToolDisplay.test.ts:31–50`. New tools force one row + edit + one snapshot update — easy to review. + +## Risks + +- **R1 — `_meta` lost in pending → ready transition.** Mitigation: + Step 3 patches `sessionPermissions.ts`; covered by a focused + test. Without this fix, terminal/search rendering on permission + cards would silently break even though everything else looks + right. + +- **R2 — Live input accumulation race.** `input_json_delta` chunks + may arrive out of order with the canonical `assistant` envelope + for fast tools. Mitigation: D5's "first writer wins" rule plus + `try/catch` fallback to the static past tense. + +- **R3 — `StringOrMarkdown` downstream serialization.** + `invocationMessage` accepts `StringOrMarkdown`, but if a path + stringifies it expecting plain `string` we'd render + `[object Object]`. Mitigation: audit + `stateToProgressAdapter.ts` and `sessionPermissions.ts` arms + during Step 3; the reducer types should already enforce the + union correctly. + +- **R4 — Backwards compatibility on replayed sessions.** Existing + on-disk transcripts have `_meta: undefined` or `_meta: { toolKind: + 'subagent' }`. After this change they'll get richer `_meta`. This + is purely additive (no fields removed, no semantics changed) so + the workbench should accept it. Verified in Step 7's replay + assertions. + +- **R5 — Localization.** All new user-visible strings (invocation + + past-tense markdown) must use `nls.localize()` with `{0}` + placeholders, not concatenation. Coding guidelines enforced; + Copilot's helper is the template. + +## Verification + +### Unit / Integration + +- Unit: snapshot test in + [`claudeToolDisplay.test.ts`](../../test/node/claudeToolDisplay.test.ts) + covers every tool row × all five helpers. Defense tests cover + malformed input + MCP fallback. +- Live: assertions in + [`claudeMapSessionEvents.test.ts`](../../test/node/claudeMapSessionEvents.test.ts) + for `_meta` on Start, success-aware past tense + `_meta` on + Complete. +- Replay: assertions in + [`claudeReplayMapper.test.ts`](../../test/node/claudeReplayMapper.test.ts) + for parity with the live path. +- Subagent inner: assertions in + [`claudeSubagentSignals.test.ts`](../../test/node/claudeSubagentSignals.test.ts) + for `_meta` on inner tool calls. +- Permission forwarding: assertion in + `sessionPermissions.test.ts` (add file if absent) that + `state._meta` is copied into `SessionToolCallReady`. +- Run with: `runTests` (preferred) or `scripts/test.sh --grep + "agentHost"`. +- TS validity: `npm run compile-check-ts-native` and + `npm run valid-layers-check`. + +### E2E + +Available skills in this workspace: + +- **launch** — Playwright/CDP-driven Code OSS automation. Used + for the Phase 12 live E2E captured in [`phase12-plan.md`](./phase12-plan.md) + (Step 14). +- **code-oss-logs** — find and read dev-build renderer / agentHost + logs. Used to assert no `[Claude]` warn-logs during the + scenario. + +**Scenario**: + +1. Use `launch` to start the Agents window + (`./scripts/code.sh --agents`), switch the agent picker to + **Claude**, and send: *"Run `git status` and tell me what's + dirty."* +2. Verify the **Bash permission card** renders in the terminal + style (monospace block, syntax-highlighted as `bash`) and + reads *Running `git status`* — **not** *Run shell command*. +3. Approve. Verify the row collapses to *Ran `git status`* + (success-aware past tense from D4). +4. Send: *"Search for `IClaudeAgentSession` in this repo."* Verify + the `Grep` row renders in the **search** style (workbench's + search renderer, not generic) and reads *Searching for + `IClaudeAgentSession`*. After completion the row reads + *Searched for `IClaudeAgentSession`*. +5. Use `code-oss-logs` to read the agentHost log; assert no + `[Claude]` warn-logs for the resolved tool calls. +6. Click an older Claude session in the sidebar that contains + shell + search tool calls. Verify the historical rows render + in the same terminal / search style as the live ones (replay + parity from Step 5). + +### Manual fallback + +If the launch skill cannot be used, the same scenario is +achievable by hand: open Agents window, send the same prompts, +inspect the chat UI, then `tail -f` the agentHost log under +`~/.config/Code - Insiders/logs/.../agentHost.log` (path varies +by OS; `code-oss-logs` resolves it). + +## Open Questions + +*(Pre-grilling. Will be resolved during the grill-with-docs +session and folded into Decisions above.)* + +- **Q1 — Replay completed-state inputs for past tense.** Replay's + `_attachToolResult` needs the parsed input that was set in + `_openToolUse`. The simplest path is a per-builder + `Map>`. Is that map's + lifetime correctly bounded by the builder, or does it need + explicit cleanup on unmatched orphans? (Owner: implementer, + resolve during Step 5.) +- **Q2 — Should `getClaudeShellLanguage` ever return non-`'bash'`?** + Today the only shell tools are `Bash` family. If/when the SDK + adds `Pwsh` etc., does `language` become a column on the row, or + derived? (Owner: roadmap, resolve when a non-bash shell tool + appears.) + +## References + +- Roadmap: [`./roadmap.md`](./roadmap.md) (Phase 8.5). +- Reference implementation: + [`copilotToolDisplay.ts`](../copilot/copilotToolDisplay.ts), + [`mapSessionEvents.ts`](../copilot/mapSessionEvents.ts). +- Phase 12 closeout (subagent `_meta` precedent): + [`phase12-plan.md`](./phase12-plan.md). +- Glossary: [`CONTEXT.md`](./CONTEXT.md). +- E2E skills used: `launch`, `code-oss-logs`. + +## Self-check + +- [x] Status set. +- [x] Every step has files, dependencies, and "done when". +- [x] Decisions captured. +- [x] Out of scope explicit. +- [x] Verification names concrete commands and skills. +- [x] E2E references real skills (`launch`, `code-oss-logs`). +- [x] An agent reading only this file can implement.