fix(skill): ban untracked deferrals in /review skill (#568)

carlos-alm · web-flow · commit f6c80acf50e0 · 2026-03-23T23:24:55.000-06:00
* feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep Four recurring maintenance routines as Claude Code skills: - /deps-audit: vulnerability scanning, staleness, unused deps, license checks - /bench-check: benchmark regression detection against saved baselines - /test-health: flaky test detection, dead tests, coverage gap analysis - /housekeep: clean worktrees, dirt files, sync main, prune branches * fix(bench-check): capture stderr, guard division-by-zero, commit baseline - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression) - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule * fix(deps-audit): run npm ci after revert, document tokenizer skip reason - After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code * fix(housekeep): guard Phase 5 in source repo, fix stale-worktree criterion - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>` * fix: address Round 3 Greptile review feedback * fix: move deps-audit stash to Phase 0, before npm commands modify manifests * fix: capture flaky-detection loop output to per-run files for comparison * fix: always require confirmation for stale worktree removal * fix: use parsed threshold in baseline.json, guard --compare-only on first run * fix(deps-audit): track stash creation to avoid operating on wrong entry When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists. * fix(test-health): use mktemp for flaky-run directory to avoid concurrent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis. * fix(bench-check): add save-baseline verdict path, fix em-dash, use explicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention. * docs(roadmap): update Phase 5 TypeScript migration with accurate progress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion. * fix: deps-audit success path should keep npm changes, not revert (#565) When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix. * fix: bench-check use git add + diff --cached to detect new files (#565) git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached. * fix: housekeep require confirmation before branch deletion (#565) Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c. * fix: scope git diff --cached to bench-check files only (#565) * fix: use json-summary reporter to match coverage-summary.json output (#565) * fix: capture stash ref by name to avoid position-based targeting (#565) * fix: remove unreachable Phase 5 subphases since source-repo guard always skips (#565) * fix: use dynamic threshold variable in bench-check Phase 6 report template (#565) * fix: address open review items in maintenance skills (#565) - bench-check: add timeout 300 wrappers to all 4 benchmark invocations with exit code 124 check for timeout detection - bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry - housekeep: fix grep portability — use grep -cE instead of GNU \| syntax - test-health: add timeout 180 wrapper in flaky detection loop - test-health: fix find command -o precedence with grouping parentheses * fix: add COVERAGE_ONLY guards to Phase 2 and Phase 4 in test-health * fix: add regression skip guard to bench-check Phase 5, expand deps-audit search dirs * fix: add empty-string guard for stat size check in housekeep (#565) When both stat variants (GNU and BSD) fail, $size is empty and the arithmetic comparison errors out. Add a [ -z "$size" ] && continue guard so the loop skips files whose size cannot be determined. * fix: add BASELINE SAVED verdict path and clarify if/else-if in bench-check (#565) Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a shortened report with "Verdict: BASELINE SAVED" instead of the full comparison report. Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with explicit "If timeout / Else if non-zero" so the two conditions are clearly mutually exclusive. * docs(roadmap): mark Phase 4 complete, update Phase 5 progress (5 of 7) Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5 merged via PRs #553, #554, #555, #566 but was listed with stale counts. Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7 with 76 of 283 modules migrated (~27%). * docs(roadmap): correct Phase 5 progress — 5.3/5.4/5.5 still in progress Previous commit incorrectly marked 5.3-5.5 as complete. In reality 76 of 283 src files are .ts (~27%) while 207 remain .js (~73%). PRs #553, #554, #555, #566 migrated a first wave but left substantial work in each step: 4 leaf files, 39 core files, 159 orchestration files. Updated each step with accurate migrated/remaining counts. * fix(skill): ban untracked deferrals in /review skill The /review skill allowed replying "acknowledged as follow-up" to reviewer comments without tracking them anywhere. These deferrals get lost — nobody revisits PR comment threads after merge. Now: if a fix is genuinely out of scope, the skill must create a GitHub issue with the follow-up label before replying. The reply must include the issue link. A matching rule in the Rules section reinforces the ban. * fix(skill): add --repo flag, multi-endpoint reply for deferrals, and scope guidance (#568) * fix(skill): guard follow-up label creation before gh issue create (#568) gh issue create --label "follow-up" fails if the label doesn't exist in the repo. Add a gh label create guard step that is safe to re-run. * feat(skill): parallelize /review with one subagent per PR * fix: correct heredoc terminator indentation in review skill (#568) * fix(skill): capture gh issue create output before referencing issue number gh issue create prints the new issue URL to stdout — capture it and extract the number so reply templates can reference it unambiguously. * fix(skill): surface follow-up issues in review result format and summary table Add "Issues Created" field to the subagent result format and an "Issues" column to the Step 3 summary table, so deferred out-of-scope items are visible in the final report. * fix(skill): require verbatim rule propagation to subagents * fix(skill): align Issues field name between 2i result and Step 3 table (#568)
diff --git a/.claude/skills/review/SKILL.md b/.claude/skills/review/SKILL.md
@@ -26,21 +26,30 @@ Record each PR's number, branch, base, merge status, and CI state.
 
 ---
 
-## Step 2: Process Each PR
+## Step 2: Launch Parallel Subagents
 
-For **each** open PR, perform the following steps in order. Process PRs one at a time to avoid cross-contamination.
+Each PR is independent work — **launch one Agent subagent per PR, all in parallel.** Use `isolation: "worktree"` so each agent gets its own copy of the repo with no cross-PR contamination.
 
-### 2a. Switch to the PR branch
+Pass each agent the full PR processing instructions (Steps 2a–2i below) along with the PR number, branch, base, and current state from Step 1. The agent prompt must include **all** the rules from the Rules section at the bottom of this skill — copy them **verbatim**, do not paraphrase or summarize.
 
-Ensure the working tree is clean before switching to avoid cross-PR contamination:
-
-```bash
-if [ -n "$(git status --porcelain)" ]; then
-  git stash push -m "pre-checkout stash"
-fi
 ```
+For each PR, launch an Agent with:
+- description: "Review PR #<number>"
+- isolation: "worktree"
+- prompt: <the full PR processing instructions below, with PR details filled in>
+```
+
+Launch **all** PR agents in a single message (one tool call per PR) so they run concurrently. Do NOT wait for one to finish before starting the next.
+
+Each agent will return a result summary. Collect all results for the final summary table in Step 3.
+
+---
+
+## PR Processing Instructions (for each subagent)
+
+The following steps are executed by each subagent for its assigned PR.
 
-Then check out the PR branch:
+### 2a. Check out the PR branch
 
 ```bash
 gh pr checkout <number>
@@ -120,27 +129,36 @@ For **each** review comment — including minor suggestions, nits, style feedbac
 2. **Read the relevant code** at the file and line referenced.
 3. **Make the change.** Even if the comment is marked as "nit" or "suggestion" or "minor" — address it. The goal is zero outstanding comments.
 4. **If you disagree** with a suggestion (e.g., it would introduce a bug or contradicts project conventions), do NOT silently ignore it. Reply to the comment explaining why you chose a different approach.
-5. **If the fix is genuinely out of scope** for this PR (e.g., it affects a different module not touched by this PR, or requires a design decision beyond the PR's purpose), you MUST create a GitHub issue to track it before replying. Never reply with "acknowledged as follow-up" or "noted for later" without a tracked issue — untracked deferrals get lost and nobody will ever revisit them.
+5. **If the fix is genuinely out of scope** for this PR, you MUST create a GitHub issue to track it before replying. Never reply with "acknowledged as follow-up" or "noted for later" without a tracked issue — untracked deferrals get lost and nobody will ever revisit them. "Genuinely out of scope" means the fix touches a different module not in the PR's diff, requires an architectural decision beyond the PR's mandate, or would introduce unrelated risk. Fixing a variable name, adding a null check, or adjusting a string in a file already in the diff is NOT out of scope — just do it.
 
    ```bash
-   # Create a tracking issue for the deferred item
-   gh issue create \
+   # Ensure the follow-up label exists (safe to re-run)
+   gh label create "follow-up" --color "0e8a16" --description "Deferred from PR review" --repo optave/codegraph 2>/dev/null || true
+
+   # Create a tracking issue for the deferred item and capture the issue number
+   issue_url=$(gh issue create \
+     --repo optave/codegraph \
      --title "follow-up: <concise description of what needs to be done>" \
-     --body "$(cat <<'EOF'
-   Deferred from PR #<number> review.
+     --body "$(cat <<-'EOF'
+	Deferred from PR #<number> review.
 
-   **Original reviewer comment:** https://github.com/optave/codegraph/pull/<number>#discussion_r<comment-id>
+	**Original reviewer comment:** <use the correct permalink format for the comment type: inline review comment → `https://github.com/optave/codegraph/pull/<number>#discussion_r<comment-id>`, top-level review body → `https://github.com/optave/codegraph/pull/<number>#pullrequestreview-<review-id>`, issue-style comment → `https://github.com/optave/codegraph/issues/<number>#issuecomment-<comment-id>`>
 
-   **Context:** <why this is out of scope for the current PR and what the fix entails>
-   EOF
+	**Context:** <why this is out of scope for the current PR and what the fix entails>
+	EOF
    )" \
-     --label "follow-up"
+     --label "follow-up")
+   issue_number=$(echo "$issue_url" | grep -oE '[0-9]+$')
    ```
 
-   Then reply to the reviewer comment referencing the issue:
+   Then reply to the reviewer comment referencing the issue (using `$issue_number` captured above). Use the same reply mechanism as step 6 below — inline PR review comments use `/pulls/<number>/comments/<comment-id>/replies`, top-level review bodies and issue-style comments use `/issues/<number>/comments`:
    ```bash
+   # For inline PR review comments:
    gh api repos/optave/codegraph/pulls/<number>/comments/<comment-id>/replies \
-     -f body="Out of scope for this PR — tracked in #<issue-number>"
+     -f body="Out of scope for this PR — tracked in #$issue_number"
+   # For top-level review bodies or issue-style comments:
+   gh api repos/optave/codegraph/issues/<number>/comments \
+     -f body="Out of scope for this PR — tracked in #$issue_number"
    ```
 6. **Reply to each comment** explaining what you did. The reply mechanism depends on where the comment lives:
 
@@ -183,7 +201,7 @@ After addressing all comments for a PR:
 
 ### 2g. Re-trigger reviewers
 
-**Greptile:** Before re-triggering, check if your last reply to Greptile already has a positive emoji reaction (👍, ✅, 🎉, etc.) from `greptileai`. A positive reaction means Greptile is satisfied with your fix — do NOT re-trigger in that case, move on. Only re-trigger if there is no positive reaction on your last comment:
+**Greptile:** Before re-triggering, check if your last reply to Greptile already has a positive emoji reaction (thumbs up, check, party, etc.) from `greptileai`. A positive reaction means Greptile is satisfied with your fix — do NOT re-trigger in that case, move on. Only re-trigger if there is no positive reaction on your last comment:
 
 ```bash
 # Check reactions on your most recent comment to see if Greptile already approved
@@ -211,21 +229,39 @@ After re-triggering:
 1. Wait for the new reviews to come in (check after a reasonable interval).
 2. Fetch new comments again (repeat Step 2d).
 3. If there are **new** comments from Greptile or Claude, go back to Step 2e and address them.
-4. **Repeat this loop for a maximum of 3 rounds.** If after 3 rounds there are still actionable comments, mark the PR as "needs human review" in the summary table and move to the next PR.
+4. **Repeat this loop for a maximum of 3 rounds.** If after 3 rounds there are still actionable comments, mark the PR as "needs human review" in the result.
 5. Verify CI is still green after all changes.
 
+### 2i. Return result
+
+At the end of processing, the subagent MUST return a structured result with these fields so the main agent can build the summary table:
+
+```
+PR: #<number>
+Branch: <branch-name>
+Conflicts: resolved | none
+CI: green | red | pending
+Comments Addressed: <count>
+Issues: <comma-separated list of #<n> follow-up issues, or "none">
+Reviewers Re-triggered: <list>
+Status: ready | needs-work | needs-human-review | skipped
+Notes: <any issues encountered>
+```
+
 ---
 
-## Step 3: Summary
+## Step 3: Collect Results and Summarize
 
-After processing all PRs, output a summary table:
+After **all** subagents complete, collect their results and output a summary table:
 
 ```
-| PR | Branch | Conflicts | CI | Comments Addressed | Reviewers Re-triggered | Status |
-|----|--------|-----------|----|--------------------|----------------------|--------|
-| #N | branch | resolved/none | green/red | N comments | greptile, claude | ready/needs-work |
+| PR | Branch | Conflicts | CI | Comments Addressed | Issues | Reviewers Re-triggered | Status |
+|----|--------|-----------|----|--------------------|--------|----------------------|--------|
+| #N | branch | resolved/none | green/red | N comments | #X, #Y or none | greptile, claude | ready/needs-work |
 ```
 
+If any subagent failed or returned an error, note it in the Status column as `agent-error` with the failure reason.
+
 ---
 
 ## Rules
@@ -234,7 +270,7 @@ After processing all PRs, output a summary table:
 - **Never force-push** unless fixing a commit message that fails commitlint. Amend + force-push is the only way to fix a pushed commit title (messages are part of the SHA). This is safe on feature branches. For all other problems, fix with a new commit.
 - **Address ALL comments from ALL reviewers** (Claude, Greptile, and humans), even minor/nit/optional ones. Leave zero unaddressed. Do not only respond to one reviewer and skip another.
 - **Always reply to comments** explaining what was done. Don't just fix silently. Every reviewer must see a reply on their feedback.
-- **Don't re-trigger Greptile if already approved.** If your last reply to a Greptile comment has a positive emoji reaction (👍, ✅, 🎉) from `greptileai`, it's already satisfied — skip re-triggering.
+- **Don't re-trigger Greptile if already approved.** If your last reply to a Greptile comment has a positive emoji reaction from `greptileai`, it's already satisfied — skip re-triggering.
 - **Only re-trigger Claude** if you addressed Claude's feedback specifically.
 - **No co-author lines** in commit messages.
 - **No Claude Code references** in commit messages or comments.