feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep #565
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -153,6 +153,13 @@ For each metric in the current run: | |||||||||
|
|
||||||||||
| ## Phase 4 — Verdict | ||||||||||
|
|
||||||||||
| ### Pre-condition check | ||||||||||
| Before evaluating verdicts, verify that at least one benchmark produced valid numeric results. | ||||||||||
| If `metrics` is empty (all suites produced `"error"` or `"timeout"` records): | ||||||||||
| - Print: `BENCH-CHECK ABORTED — no valid benchmark results (all suites failed or timed out)` | ||||||||||
| - Do NOT proceed to Phase 5 — the baseline must not be overwritten with empty data | ||||||||||
| - Stop here and skip to Phase 6 to write the report with verdict `ABORTED`. | ||||||||||
|
|
||||||||||
| Based on comparison results: | ||||||||||
|
|
||||||||||
| ### No regressions found | ||||||||||
|
|
@@ -169,7 +176,7 @@ Based on comparison results: | |||||||||
| - Re-run individual benchmarks to confirm (not flaky) | ||||||||||
|
|
||||||||||
| ### First run (no baseline) | ||||||||||
| - If `COMPARE_ONLY` is set: print a warning that no baseline exists and exit without saving | ||||||||||
| - If `COMPARE_ONLY` is set: print a warning that no baseline exists and **stop here — do not proceed to Phase 5 or Phase 6**. No baseline is saved and no report is written. | ||||||||||
| - Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline | ||||||||||
|
Comment on lines
+178
to
+180
Contributor
Phase 4's first-run path says:
The word "exit" here is ambiguous — an agent may interpret it as "exit this verdict path and continue to Phase 5/6/7" rather than "terminate the entire skill." If Phase 6 is reached in this scenario, its branch condition reads:
Disambiguate Phase 4 by making the early exit explicit:
### First run (no baseline)
- If `COMPARE_ONLY` is set: print a warning (`BENCH-CHECK: no baseline exists — nothing to compare against`) and **exit the skill without writing a report or committing anything**.
- Otherwise: print `BENCH-CHECK — initial baseline saved` and proceed to Phase 5.
And add a corresponding guard to Phase 6 so the "BASELINE SAVED" verdict path is explicitly excluded for
Contributor
Author
Fixed — added explicit 'stop here' early-exit wording in Phase 4's first-run path and a guard in Phase 6 that skips the BASELINE SAVED report when --compare-only was set with no baseline.
||||||||||
|
|
||||||||||
| ### Save-baseline with existing baseline (`--save-baseline`) | ||||||||||
|
|
@@ -180,6 +187,7 @@ Based on comparison results: | |||||||||
|
|
||||||||||
| **Skip this phase if `COMPARE_ONLY` is set.** Compare-only mode never writes or commits baselines. | ||||||||||
| **Skip this phase if regressions were detected in Phase 4.** The baseline is only updated on a clean run. | ||||||||||
| **Skip this phase if the ABORTED pre-condition was triggered in Phase 4.** The baseline must not be overwritten with empty data. | ||||||||||
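The three Phase 5 skip guards can be read as a single predicate. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: `COMPARE_ONLY` comes from the skill text, while the function name `should_save_baseline` and the `regressions` and `aborted` arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: succeeds (exit 0) only when Phase 5 may write the baseline.
# Arguments: COMPARE_ONLY (true/false), regression count, ABORTED (true/false).
should_save_baseline() {
  compare_only=$1
  regressions=$2
  aborted=$3
  [ "$compare_only" = "true" ] && return 1   # compare-only never writes baselines
  [ "$regressions" -gt 0 ] && return 1       # baseline is only updated on a clean run
  [ "$aborted" = "true" ] && return 1        # never overwrite the baseline with empty data
  return 0
}

if should_save_baseline false 0 false; then
  echo "Phase 5: save baseline"
else
  echo "Phase 5: skipped"
fi
```

Keeping the guards in one place makes it harder for a later edit to drop one of them while leaving the others intact.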
|
|
||||||||||
| When saving (initial run, `--save-baseline`, or passed comparison): | ||||||||||
|
|
||||||||||
|
|
@@ -211,8 +219,26 @@ git diff --cached --quiet -- generated/bench-check/baseline.json generated/bench | |||||||||
|
|
||||||||||
| ## Phase 6 — Report | ||||||||||
|
|
||||||||||
| **Skip this phase (write no report) if `COMPARE_ONLY` was set and no baseline existed, AND the ABORTED pre-condition was not triggered.** That early-exit case was already handled in Phase 4 — writing a "BASELINE SAVED" report here would be misleading since no baseline was saved. When ABORTED, always write the ABORTED report regardless of other flags. | ||||||||||
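The ordering of the Phase 6 branches matters, so a short sketch may help. It assumes four true/false inputs; `COMPARE_ONLY` and `SAVE_ONLY` appear in the skill text, whereas `select_report`, the `aborted` and `has_baseline` arguments, and the branch labels it prints are hypothetical.

```shell
# Hypothetical helper: prints which Phase 6 report branch applies.
# Arguments: ABORTED, COMPARE_ONLY, SAVE_ONLY, HAS_BASELINE (each true/false).
select_report() {
  aborted=$1
  compare_only=$2
  save_only=$3
  has_baseline=$4
  if [ "$aborted" = "true" ]; then
    echo "aborted"          # checked first: takes priority over every other flag
  elif [ "$compare_only" = "true" ] && [ "$has_baseline" = "false" ]; then
    echo "none"             # early exit already handled in Phase 4, no report
  elif [ "$save_only" = "true" ] || [ "$has_baseline" = "false" ]; then
    echo "baseline-saved"   # shortened report without comparison sections
  else
    echo "comparison"       # full report with comparison and regressions
  fi
}

select_report true true false false    # prints "aborted"
select_report false true false false   # prints "none"
```

Testing the ABORTED case before anything else mirrors the requirement that an aborted run always produces the ABORTED report regardless of other flags.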
|
|
||||||||||
| Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`. | ||||||||||
|
|
||||||||||
| **If the ABORTED pre-condition was triggered (no valid benchmark results):** write a minimal report — this check MUST come before the SAVE_ONLY/first-run check, because when all benchmarks fail on a `--save-baseline` or first run, SAVE_ONLY would also be true but no baseline was actually saved: | ||||||||||
|
|
||||||||||
| ```markdown | ||||||||||
| # Benchmark Report — <date> | ||||||||||
|
|
||||||||||
| **Version:** X.Y.Z | **Git ref:** abc1234 | **Threshold:** $THRESHOLD% | ||||||||||
|
|
||||||||||
| ## Verdict: ABORTED — no valid benchmark results | ||||||||||
|
|
||||||||||
| All benchmark suites failed or timed out. See Phase 1 error records for details. | ||||||||||
|
|
||||||||||
| ## Raw Results | ||||||||||
|
|
||||||||||
| <!-- Error/timeout records from each suite --> | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| **If `SAVE_ONLY` is set or no prior baseline existed (first run):** write a shortened report — omit the "Comparison vs Baseline" and "Regressions" sections since no comparison was performed: | ||||||||||
|
|
||||||||||
| ```markdown | ||||||||||
|
|
@@ -257,7 +283,7 @@ Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`. | |||||||||
|
|
||||||||||
| 1. If report was written, print its path | ||||||||||
| 2. If baseline was updated, print confirmation | ||||||||||
| 3. Print one-line summary: `PASSED (0 regressions) | FAILED (N regressions) | BASELINE SAVED` | ||||||||||
| 3. Print one-line summary: `PASSED (0 regressions) | FAILED (N regressions) | BASELINE SAVED | ABORTED (all suites failed)` | ||||||||||
|
|
||||||||||
| ## Rules | ||||||||||
|
|
||||||||||
|
|
@@ -266,6 +292,6 @@ Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`. | |||||||||
| - **Don't update baseline on regression** — the user must investigate first | ||||||||||
| - **Recall/quality metrics are inverted** — a decrease is a regression | ||||||||||
| - **Count metrics are informational** — graph growing isn't a regression | ||||||||||
| - **The baseline file is committed to git** — it's a shared reference point; Phase 5 always commits it | ||||||||||
| - **The baseline file is committed to git** — it's a shared reference point; Phase 5 commits it on clean (non-regressed) runs where COMPARE_ONLY is not set | ||||||||||
| - **history.ndjson is append-only** — never truncate or rewrite it | ||||||||||
|
Comment on lines
+294
to
+296
Contributor
After the multiple rounds of fixes, Phase 5 now has two explicit skip guards:
The rule at line 269 still reads: "The baseline file is committed to git — it's a shared reference point; Phase 5 always commits it." This is factually incorrect and could mislead a future editor into removing the skip guards, believing they contradict the rule.
Suggested change
Contributor
Author
Fixed — updated the rule to: 'Phase 5 commits it on clean (non-regressed) runs where COMPARE_ONLY is not set.'
||||||||||
| - Generated files go in `generated/bench-check/` — create the directory if needed | ||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,14 +22,10 @@ Audit the project's dependency tree for security vulnerabilities, outdated packa | |
| 5. **If `AUTO_FIX` is set:** Save the original manifests now, before any npm commands run, so pre-existing unstaged changes are preserved: | ||
| ```bash | ||
| git stash push -m "deps-audit-backup" -- package.json package-lock.json | ||
| STASH_CREATED=$? | ||
| ``` | ||
| Track `STASH_CREATED` — when `0`, a stash entry was actually created; when `1`, the files had no changes so nothing was stashed. | ||
| If `STASH_CREATED` is `0`, immediately capture the stash ref for later use: | ||
| ```bash | ||
| STASH_REF=$(git stash list --format='%gd %s' | grep 'deps-audit-backup' | head -1 | awk '{print $1}') | ||
| ``` | ||
| Use `$STASH_REF` (not `stash@{0}`) in all later stash drop/pop commands to avoid targeting the wrong entry if other stashes are pushed in the interim. | ||
| `STASH_REF` is non-empty if and only if a stash entry was actually created. Do **not** use `$?` — modern git (2.16+) returns 0 even when nothing was stashed. | ||
| Use `[ -n "$STASH_REF" ]` (stash created) / `[ -z "$STASH_REF" ]` (nothing stashed) for all branching. Use `$STASH_REF` (not `stash@{0}`) in all later stash drop/pop commands to avoid targeting the wrong entry. | ||
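The ref-capture and branching described above can be sketched without touching a real repository. This is an illustrative sketch only: it assumes `git stash list --format='%gd %s'` prints one `<ref> <subject>` pair per line, the helper name `extract_stash_ref` is hypothetical, and the sample text stands in for real git output.

```shell
# Hypothetical helper: reads "<ref> <subject>" lines on stdin and prints the
# ref of the first (most recent) entry whose subject contains the marker.
extract_stash_ref() {
  awk -v marker="$1" 'index($0, marker) { print $1; exit }'
}

# Sample input standing in for `git stash list --format='%gd %s'` output.
sample='stash@{0} On main: unrelated work
stash@{1} On main: deps-audit-backup'

STASH_REF=$(printf '%s\n' "$sample" | extract_stash_ref "deps-audit-backup")

# Branch on whether a ref was captured, not on the exit status of git stash.
if [ -n "$STASH_REF" ]; then
  echo "stash created: $STASH_REF"
else
  echo "nothing stashed"
fi
```

Because the helper stops at the first match, it picks the newest entry carrying the marker, which is the same choice `head -1` makes in the original pipeline.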
|
|
||
| ## Phase 1 — Security Vulnerabilities | ||
|
|
||
|
|
@@ -165,13 +161,25 @@ If `AUTO_FIX` was set: | |
| Summarize all changes made: | ||
| 1. List each package updated/fixed | ||
| 2. Run `npm test` to verify nothing broke | ||
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop $STASH_REF`) — the npm changes are good, no rollback needed | ||
| If tests pass and `STASH_CREATED` is `1`: no action needed — the npm changes are good and no stash entry exists to clean up | ||
| 4. If tests fail and `STASH_CREATED` is `0`: | ||
| - Restore the saved manifests: `git stash pop $STASH_REF` | ||
| 3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`) — this restores any pre-existing uncommitted changes alongside the npm fix results. Note: the step 2 test run validated the npm changes alone; step 3b below is the authoritative test of the final merged state. | ||
| - If the pop applies cleanly: | ||
| a. Run `npm install` to re-sync `node_modules/` with the merged manifest. | ||
| b. Re-run `npm test` to confirm the merged state is consistent (this is the authoritative check — step 2 only validated the npm changes in isolation). | ||
| c. If tests still pass: confirm the project is consistent. | ||
| d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes. | ||
|
Comment on lines
+165
to
+169
Contributor
When the stash pops cleanly and
Without a recovery path, the user is left with a mixed, broken state and must manually reconstruct which changes to keep. Add explicit recovery guidance:
d. If tests now fail: warn the user — the pre-existing manifest changes conflict with the audit fixes.
Recovery options:
- To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci`
- To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci`
- To keep only the pre-existing changes and discard the audit fixes: re-run `/deps-audit` without `--fix`
Contributor
Author
Fixed — step 169d now lists three explicit recovery options: undo all changes (git checkout + npm ci), keep only audit fixes (manual edit + npm ci), or keep only pre-existing changes (re-run without --fix).
||
| Recovery options: | ||
| - To undo **all** manifest changes (both audit fixes and pre-existing): `git checkout -- package.json package-lock.json && npm ci` | ||
| - To keep only the audit fixes and discard pre-existing changes: manually edit `package.json`/`package-lock.json` to remove the pre-existing delta, then `npm ci` | ||
| - To keep only the pre-existing changes and discard the audit fixes: `git checkout HEAD -- package.json package-lock.json && npm ci` to revert manifests to their clean state, then manually re-apply only your pre-existing changes | ||
| - If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing." | ||
| - For conflicts in other files, resolve them by keeping both the npm fixes and the pre-existing changes. | ||
| If tests pass and `STASH_REF` is empty: no action needed — the npm changes are good and no stash entry exists to clean up | ||
|
Comment on lines
163
to
+176
Contributor
When
There is no equivalent instruction for the clean-pop path. If the stashed pre-existing changes include a dependency addition or version pin (e.g., the user was mid-way through adding a new package), the merged
Suggest adding a re-sync step after a clean pop:
3. If tests pass and `STASH_REF` is non-empty: pop and merge the saved state (`git stash pop $STASH_REF`).
- If the pop applies cleanly: run `npm install` to re-sync `node_modules/` with the merged manifest, then confirm the project is consistent.
- If the pop causes conflicts in `package.json`/`package-lock.json`: warn the user, leave conflict markers for manual resolution, and instruct: "After you resolve the conflicts, run `npm install` to re-sync `node_modules/` with the resolved lock file before committing."
The
Contributor
Author
Fixed — the clean-pop success path now runs npm install to re-sync node_modules with the merged manifest before confirming consistency. This is a no-op if the pre-existing changes did not affect installed packages.
||
| 4. If tests fail and `STASH_REF` is non-empty: | ||
| - Reset manifests to HEAD first (undoes npm changes): `git checkout HEAD -- package.json package-lock.json` | ||
| - Then re-apply the pre-existing changes cleanly: `git stash pop $STASH_REF` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed | ||
|
Comment on lines
+177
to
181
Contributor
When tests fail and
By this point in step 4, the working tree contains:
The correct two-step restore is:
At step 2 the working tree matches HEAD, so the stash applies exactly as it was originally created — no conflicts. Suggested replacement for step 4:
4. If tests fail and `STASH_REF` is non-empty:
- Reset manifests to HEAD first (undoes npm changes):
`git checkout HEAD -- package.json package-lock.json`
- Then re-apply the pre-existing changes cleanly:
`git stash pop $STASH_REF`
- Restore `node_modules/` to match the reverted lock file: `npm ci`
- Report what failed
Note that the success path (step 3) intentionally does a merge (
Contributor
Author
Fixed — the failure path now resets manifests to HEAD first (
Contributor
Author
Fixed in prior commit 0b08a2b — the failure path already resets manifests to HEAD first (
||
| 5. If tests fail and `STASH_CREATED` is `1`: | ||
| 5. If tests fail and `STASH_REF` is empty: | ||
| - Discard manifest changes: `git checkout -- package.json package-lock.json` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed | ||
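Steps 4 and 5 differ only in whether a stash entry exists, which a small dry-run helper can make visible. This is a sketch under stated assumptions: `rollback_plan` is a hypothetical name, and it prints the commands from the steps above instead of executing them so that it has no side effects.

```shell
# Hypothetical helper: prints the rollback commands for a failed test run.
# Argument: the captured stash ref, or an empty string if nothing was stashed.
rollback_plan() {
  stash_ref=$1
  if [ -n "$stash_ref" ]; then
    # Step 4: reset to HEAD first so the stash applies without conflicts.
    echo "git checkout HEAD -- package.json package-lock.json"
    echo "git stash pop $stash_ref"
  else
    # Step 5: nothing was stashed, so discarding the npm changes is enough.
    echo "git checkout -- package.json package-lock.json"
  fi
  # Both paths re-sync node_modules/ with the reverted lock file.
  echo "npm ci"
}

rollback_plan 'stash@{0}'
```

Printing the plan first lets a reviewer confirm the order (reset, then pop, then `npm ci`) before anything modifies the working tree.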
|
|
@@ -181,6 +189,6 @@ Summarize all changes made: | |
| - **Never run `npm audit fix --force`** — breaking changes need human review | ||
| - **Never remove a dependency** without asking the user, even if it appears unused — flag it in the report instead | ||
| - **Always run tests** after any auto-fix changes | ||
| - **If `--fix` causes test failures**, restore manifests from the saved state (`git stash pop $STASH_REF` if `STASH_CREATED=0`, or `git checkout` if stash was a no-op) then run `npm ci` to resync `node_modules/`, and report the failure | ||
| - **If `--fix` causes test failures**, first reset manifests to HEAD (`git checkout HEAD -- package.json package-lock.json`), then re-apply pre-existing changes (`git stash pop $STASH_REF` if `STASH_REF` is non-empty, or no-op if nothing was stashed), then run `npm ci` to resync `node_modules/`, and report the failure | ||
| - Treat `optionalDependencies` separately — they're expected to fail on some platforms | ||
| - The report goes in `generated/deps-audit/` — create the directory if it doesn't exist | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -66,28 +66,65 @@ For stale worktrees with merged branches: | |||||||||||||||||
|
|
||||||||||||||||||
| ## Phase 2 — Delete Dirt Files | ||||||||||||||||||
|
|
||||||||||||||||||
| Remove temporary and generated files that accumulate over time: | ||||||||||||||||||
| Remove temporary and generated files that accumulate over time. There are two distinct categories of dirt that require different discovery commands: | ||||||||||||||||||
|
|
||||||||||||||||||
| - **Gitignored dirt** (files matching `.gitignore` patterns — e.g. `coverage/`, `.DS_Store`, `*.log`, `.codegraph/graph.db-journal`): use `git clean -fdX --dry-run` to list them. `git ls-files --others --exclude-standard` silently omits these because `--exclude-standard` suppresses gitignored entries. | ||||||||||||||||||
| - **Untracked non-ignored files** (stray files not in `.gitignore` — e.g. `*.tmp.*`, `*.bak`, `*.orig`): use `git ls-files --others --exclude-standard` to list them. | ||||||||||||||||||
|
|
||||||||||||||||||
| Run both commands and union the results to get the full set of candidate dirt files. | ||||||||||||||||||
|
|
||||||||||||||||||
| ### 2a. Known dirt patterns | ||||||||||||||||||
|
|
||||||||||||||||||
| Search for and remove: | ||||||||||||||||||
| Search for and remove files found by the two discovery commands above (never touch tracked files): | ||||||||||||||||||
| - `*.tmp.*`, `*.bak`, `*.orig` files in the repo (but NOT in `node_modules/`) | ||||||||||||||||||
| - `.DS_Store` files | ||||||||||||||||||
| - `*.log` files in repo root (not in `node_modules/`) | ||||||||||||||||||
| - Empty directories (except `.codegraph/`, `.claude/`, `node_modules/`) | ||||||||||||||||||
| - `coverage/` directory (regenerated by `npm run test:coverage`) | ||||||||||||||||||
| - `.codegraph/graph.db-journal` (SQLite WAL leftovers) | ||||||||||||||||||
| - Stale lock files: `.codegraph/*.lock` older than 1 hour | ||||||||||||||||||
|
|
||||||||||||||||||
| **Stale lock files** (`.codegraph/*.lock` older than 1 hour): Before removing, first check if `lsof` is available (`command -v lsof`). If `lsof` is **not installed** (common in Docker/CI minimal containers where it exits 127), **skip lock file removal entirely** and print a warning: `"lsof not available — skipping lock file cleanup (cannot verify no process holds the file)"`. When `lsof` IS available, use `lsof "$f"` to verify no process holds the file. If the file is held, **skip it** and warn — concurrent Claude Code sessions may hold legitimate long-lived locks. | ||||||||||||||||||
|
|
||||||||||||||||||
| ```bash | ||||||||||||||||||
| if ! command -v lsof > /dev/null 2>&1; then | ||||||||||||||||||
| echo "lsof not available — skipping lock file cleanup (cannot verify no process holds the file)" | ||||||||||||||||||
| else | ||||||||||||||||||
| for f in .codegraph/*.lock; do | ||||||||||||||||||
| [ -f "$f" ] || continue | ||||||||||||||||||
| age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) )) | ||||||||||||||||||
| [ -z "$age" ] && continue | ||||||||||||||||||
| if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then | ||||||||||||||||||
| if [ "$DRY_RUN" = "true" ]; then | ||||||||||||||||||
| echo "[DRY RUN] Would remove stale lock: $f" | ||||||||||||||||||
| else | ||||||||||||||||||
| echo "Removing stale lock: $f" | ||||||||||||||||||
| rm "$f" | ||||||||||||||||||
| fi | ||||||||||||||||||
| elif [ "$age" -gt 3600 ]; then | ||||||||||||||||||
| echo "Lock file $f is old but still held by a process — ask user before removing" | ||||||||||||||||||
| fi | ||||||||||||||||||
| done | ||||||||||||||||||
| fi | ||||||||||||||||||
| ``` | ||||||||||||||||||
|
Comment on lines
+88
to
+108
Contributor
The provided code block in Phase 2a runs An agent following the skill sequentially will execute Phase 2a's code block (including Add a
for f in .codegraph/*.lock; do
[ -f "$f" ] || continue
age=$(( $(date +%s) - $(stat --format='%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) ))
[ -z "$age" ] && continue
if [ "$age" -gt 3600 ] && ! lsof "$f" > /dev/null 2>&1; then
if [ "$DRY_RUN" = "true" ]; then
echo "[DRY RUN] Would remove stale lock: $f"
else
echo "Removing stale lock: $f"
rm "$f"
fi
elif [ "$age" -gt 3600 ]; then
echo "Lock file $f is old but still held by a process — ask user before removing"
fi
done
Contributor
Author
Fixed — added a DRY_RUN guard inside the lock file removal loop. When DRY_RUN=true, it prints what would be removed instead of calling rm.
||||||||||||||||||
|
|
||||||||||||||||||
| ### 2b. Large untracked files | ||||||||||||||||||
|
|
||||||||||||||||||
| Find untracked files larger than 1MB: | ||||||||||||||||||
| Find untracked files (both gitignored and non-ignored) larger than 1MB. Use both discovery commands and union the paths: | ||||||||||||||||||
| ```bash | ||||||||||||||||||
| # Non-ignored untracked files | ||||||||||||||||||
| git ls-files --others --exclude-standard | while read f; do | ||||||||||||||||||
| size=$(stat --format='%s' "$f" 2>/dev/null || stat -f '%z' "$f" 2>/dev/null) | ||||||||||||||||||
| [ -z "$size" ] && continue | ||||||||||||||||||
| if [ "$size" -gt 1048576 ]; then echo "$f ($size bytes)"; fi | ||||||||||||||||||
| done | ||||||||||||||||||
| # Gitignored files (strip the leading "Would remove " prefix from dry-run output) | ||||||||||||||||||
| git clean -fdX --dry-run | sed 's/^Would remove //' | while read f; do | ||||||||||||||||||
| # Skip directory entries — stat returns inode size, not content size | ||||||||||||||||||
| [ -d "$f" ] && continue | ||||||||||||||||||
| size=$(stat --format='%s' "$f" 2>/dev/null || stat -f '%z' "$f" 2>/dev/null) | ||||||||||||||||||
| [ -z "$size" ] && continue | ||||||||||||||||||
| if [ "$size" -gt 1048576 ]; then echo "$f ($size bytes) [gitignored]"; fi | ||||||||||||||||||
| done | ||||||||||||||||||
|
Comment on lines
+121
to
+127
Contributor
Replace the raw
# Gitignored files (strip the leading "Would remove " prefix from dry-run output)
git clean -fdX --dry-run | sed 's/^Would remove //' | while read f; do
# Skip directory entries — their size can't be reliably measured with stat
[ -d "$f" ] && continue
size=$(stat --format='%s' "$f" 2>/dev/null || stat -f '%z' "$f" 2>/dev/null)
[ -z "$size" ] && continue
if [ "$size" -gt 1048576 ]; then echo "$f ($size bytes) [gitignored]"; fi
done
Alternatively, if flagging large gitignored directories matters (e.g. a 500 MB accidentally-untracked dataset), use
Contributor
Author
Fixed — added
||||||||||||||||||
| ``` | ||||||||||||||||||
|
Comment on lines
+115
to
+128
Contributor
size=$(stat --format='%s' "$f" 2>/dev/null || stat -f '%z' "$f" 2>/dev/null)
if [ "$size" -gt 1048576 ]; then echo "$f ($size bytes)"; fi
If both This terminates the entire Add an empty-string guard before the comparison:
Suggested change
Contributor
Author
Fixed — added a guard before the arithmetic comparison. If both stat variants (GNU and BSD) fail, the loop now skips the file instead of erroring on the empty-string comparison.
||||||||||||||||||
|
|
||||||||||||||||||
| Flag these for user review — they might be accidentally untracked binaries. | ||||||||||||||||||
|
|
@@ -117,7 +154,7 @@ git log HEAD..origin/main --oneline | |||||||||||||||||
| ``` | ||||||||||||||||||
|
|
||||||||||||||||||
| If main has new commits: | ||||||||||||||||||
| - If on main: `git pull origin main` | ||||||||||||||||||
| - If on main: `git pull --no-rebase origin main` | ||||||||||||||||||
| - If on a feature branch: inform the user how many commits behind main they are | ||||||||||||||||||
| - Suggest: `git merge origin/main` (never rebase — per project rules) | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -41,8 +41,14 @@ Run the full test suite `FLAKY_RUNS` times and track per-test pass/fail: | |||||||||||||||||||||||
| RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX) | ||||||||||||||||||||||||
| for i in $(seq 1 $FLAKY_RUNS); do | ||||||||||||||||||||||||
| timeout 180 npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err" | ||||||||||||||||||||||||
| if [ $? -eq 124 ]; then | ||||||||||||||||||||||||
| exit_code=$? | ||||||||||||||||||||||||
| if [ $exit_code -eq 124 ]; then | ||||||||||||||||||||||||
| echo '{"timeout":true}' > "$RUN_DIR/run-$i.json" | ||||||||||||||||||||||||
| elif [ $exit_code -ne 0 ] && [ $exit_code -ne 1 ]; then | ||||||||||||||||||||||||
| # Use jq to safely JSON-escape stderr content (may contain quotes, newlines, backslashes) | ||||||||||||||||||||||||
| stderr_content=$(cat "$RUN_DIR/run-$i.err") | ||||||||||||||||||||||||
| jq -n --argjson code "$exit_code" --arg stderr "$stderr_content" \ | ||||||||||||||||||||||||
| '{"error":true,"exit_code":$code,"stderr":$stderr}' > "$RUN_DIR/run-$i.json" | ||||||||||||||||||||||||
| fi | ||||||||||||||||||||||||
|
Comment on lines
+43
to
+52
Contributor
The loop only guards for exit code Add an explicit fallback for other non-zero exit codes:
timeout 180 npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
run_exit=$?
if [ $run_exit -eq 124 ]; then
echo '{"timeout":true}' > "$RUN_DIR/run-$i.json"
elif [ $run_exit -ne 0 ] && [ $run_exit -ne 1 ]; then
# Exit 1 = test failures (valid JSON written); other codes = vitest crash
echo '{"error":true,"code":'$run_exit'}' > "$RUN_DIR/run-$i.json"
fi
(vitest exits
Contributor
Author
Fixed — the flaky-detection loop now captures the exit code and handles three cases: 124 (timeout marker), 0 or 1 (normal vitest pass/fail), and anything else (error marker with exit code and stderr). This prevents corrupt/empty run files from misclassifying crashing tests as flaky.
Comment on lines
+47
to
+52
Contributor
The Replace with a properly escaped construction using
elif [ $exit_code -ne 0 ] && [ $exit_code -ne 1 ]; then
# Exit 1 = test failures (valid JSON written); other codes = vitest crash
stderr_content=$(cat "$RUN_DIR/run-$i.err")
jq -n --argjson code "$exit_code" --arg stderr "$stderr_content" \
'{"error":true,"exit_code":$code,"stderr":$stderr}' > "$RUN_DIR/run-$i.json"
fi
Contributor
Author
Fixed — replaced raw echo with jq -n --arg for proper JSON escaping of stderr content. Quotes, backslashes, and newlines are now safely handled.
||||||||||||||||||||||||
| done | ||||||||||||||||||||||||
|
Comment on lines
+41
to
+53
Contributor
Line 66 states: "Each full suite run gets 3 minutes. If it times out, record partial results and continue." However, the loop invocation has no
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
done
If any vitest run hangs (e.g., a test opens a port and never closes it), the entire Phase 1 blocks indefinitely. With the default This is exactly the same enforcement gap that was identified for bench-check's benchmark invocations. Apply the same fix here:
Suggested change
Exit code
Contributor
Author
Fixed — flaky detection loop now uses
||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
@@ -56,7 +62,9 @@ rm -rf "$RUN_DIR" | |||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ### Analysis | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| A test is **flaky** if it passes in some runs and fails in others. | ||||||||||||||||||||||||
| Before analyzing, **exclude invalid runs**: skip any run file containing `{"timeout":true}` or `{"error":true}` — these runs produced no reliable per-test data and must not be counted as "all tests failed." Require a **minimum of 2 valid runs** (runs with parseable vitest JSON output) for flaky detection to be conclusive. If fewer than 2 valid runs remain, report that flaky detection was inconclusive due to too many errored/timed-out runs. | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| A test is **flaky** if it passes in some valid runs and fails in others. | ||||||||||||||||||||||||
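The exclusion rule and the two-run minimum can be sketched as a pre-filter over the run files. This is a heuristic sketch, not the skill's actual analysis: the helper names `is_valid_run` and `count_valid_runs` are hypothetical, the marker is only looked for at the start of the first two lines (so it tolerates both the compact `{"timeout":true}` form and pretty-printed `jq` output), and the sample file contents are made up.

```shell
# Hypothetical helper: succeeds if the run file is non-empty and does not
# start with a timeout/error marker object.
is_valid_run() {
  file=$1
  [ -s "$file" ] || return 1
  if head -n 2 "$file" | grep -Eq '^[[:space:]]*[{]?[[:space:]]*"(timeout|error)"[[:space:]]*:[[:space:]]*true'; then
    return 1
  fi
  return 0
}

# Hypothetical helper: prints how many run-*.json files in a directory are valid.
count_valid_runs() {
  dir=$1
  n=0
  for run_file in "$dir"/run-*.json; do
    [ -f "$run_file" ] || continue
    if is_valid_run "$run_file"; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Demo with made-up files: one parseable run and one timed-out run.
DEMO_DIR=$(mktemp -d)
printf '%s\n' '{"numTotalTests":3}' > "$DEMO_DIR/run-1.json"
printf '%s\n' '{"timeout":true}' > "$DEMO_DIR/run-2.json"
valid=$(count_valid_runs "$DEMO_DIR")
if [ "$valid" -lt 2 ]; then
  echo "flaky detection inconclusive: only $valid valid run(s)"
fi
rm -rf "$DEMO_DIR"
```

A stricter version could parse each file with `jq` instead of pattern matching, at the cost of requiring `jq` during analysis as well.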
|
|
||||||||||||||||||||||||
| For each flaky test found: | ||||||||||||||||||||||||
| 1. Record: test file, test name, pass count, fail count, failure messages | ||||||||||||||||||||||||
|
|
@@ -139,7 +147,7 @@ Find files with < 50% line coverage. For each: | |||||||||||||||||||||||
| Compare against `main` branch to find recently changed files: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| ```bash | ||||||||||||||||||||||||
| git diff --name-only main...HEAD -- src/ | ||||||||||||||||||||||||
| git diff --name-only origin/main...HEAD -- src/ | ||||||||||||||||||||||||
| ``` | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
| For each changed source file, check if: | ||||||||||||||||||||||||
|
|
||||||||||||||||||||||||
The pre-condition check in Phase 4 correctly says "Stop here and skip to Phase 6 to write the report with verdict `ABORTED`." However, Phase 6 only defines two report branches:
- `SAVE_ONLY` or first-run → shortened "BASELINE SAVED" report

An agent following an ABORTED verdict would fall through to the "Otherwise" branch (there was a prior baseline, no SAVE_ONLY) and produce a report using the "PASSED / FAILED" template, complete with empty "Comparison vs Baseline" and "Regressions" sections. This is misleading — no comparison was performed because all benchmarks failed.
Similarly, Phase 7 line 269's summary only lists `PASSED (0 regressions) | FAILED (N regressions) | BASELINE SAVED`, with no `ABORTED` option.
Add a dedicated ABORTED branch to Phase 6, for example:
And update Phase 7's summary line to include `| ABORTED (all suites failed)`.
Fixed — added a dedicated ABORTED branch to Phase 6 with a minimal report template (version, git ref, verdict, raw error records). Also added ABORTED to the Phase 7 summary line.