fix(pr-review): eliminate stale legacy worktree + signal HEAD to resumed judges#176
Conversation
…med judges Symptom: cube panel kept posting findings against pre-rewrite code on aetheronhq/aetheron-connect-v2#1266 even after PR #175's --force fetch landed. Same 10+ findings against lines/tables that no longer exist. Two more layers of the bug surfaced: (1) Duplicate worktrees, only one synced. PR reviews historically used `~/.cube/worktrees/<project>/pr-<n>/` (legacy, created by `_prefetch_worktrees`). PR #172 added a second one at `pr-review-<n>/` for the new file-routing flow. Both worktrees existed in parallel; only the new one had the --force fetch + HEAD verification from #175. The other stayed pinned to whatever it was last synced to. Worse: `build_peer_review_prompt` told judges to Read from the LEGACY path, while `_apply_file_scope` told them to Read from the NEW path. Same prompt, two locations. Confirmed live: both worktrees existed on PR #1266; the legacy one was stuck at fda636a3 (6 commits behind 62e9fbd1). (2) Resumed judges had stale session memory. Sessions persist across panel runs. On the next run, the LLM continues its prior conversation — including its prior reasoning about file contents, line numbers, function shapes. Even when the resumed judge Reads fresh files, its reasoning anchors on the old memory. The previous "RE-REVIEW" hint was too soft to displace it. Fixes: * Skip `_prefetch_worktrees` in PR-review mode when a dedicated `review_worktree` was passed. The new sync is authoritative; the legacy sync just made a second stale copy. * Pipe `review_worktree` through `build_peer_review_prompt` so the prompt's worktree path matches the file-scope block. One location. * Replace the meek "verify commit" hint with a hard "CURRENT HEAD" callout that names the SHA, says "your prior reasoning may be out of date", and lists 4 explicit steps: Re-Read fresh, verify each prior finding still applies, add new ones, drop the rest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Caution Review failedPull request was closed or merged during review No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details🔇 Additional comments (1)
WalkthroughThe PR updates judge-panel PR-review logic to use a caller-provided dedicated review worktree instead of relying on potentially stale synced duplicates. It derives the PR HEAD SHA from that worktree directly, standardises the per-PR worktree path, and routes the path into peer-review prompts which instruct judges to re-read all code at the current HEAD before reviewing. ChangesDedicated review worktree integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
python/cube/automation/judge_panel.py (1)
518-518: ⚡ Quick winMove
subprocessimport to module level.Importing inside a conditional block is non-idiomatic Python. Module-level imports improve readability and make dependencies easier to track.
♻️ Proposed refactor
At the top of the file, after the existing imports (around line 3):
import asyncio +import subprocess from datetime import datetimeThen remove line 518:
if is_pr_review_with_dedicated_worktree: # Compute SHA from the worktree the caller synced. - import subprocess sha_result = subprocess.run(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cube/automation/judge_panel.py` at line 518, Move the local "import subprocess" out of the conditional and add it to the module-level imports in python/cube/automation/judge_panel.py alongside the other top imports, then delete the in-function/conditional import site (the inline "import subprocess" currently present) so the module consistently uses the top-level subprocess symbol; run linting/tests to ensure no circular import or side-effect issues after the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cube/automation/judge_panel.py`:
- Around line 520-527: The code silently sets worktree_head_sha to None when the
git call fails; change this to fail-fast or at least emit a clear warning: after
subprocess.run(...) check sha_result.returncode and if non-zero either raise an
exception (so the caller of the function in judge_panel.py fails fast) or call
the module logger to log an explicit error/warning including sha_result.stderr
and the review_worktree path, instead of continuing with worktree_head_sha=None;
update any callers that expect a string accordingly so the HEAD verification
block (which uses worktree_head_sha) always receives a valid SHA or the flow is
aborted with an explanatory error.
---
Nitpick comments:
In `@python/cube/automation/judge_panel.py`:
- Line 518: Move the local "import subprocess" out of the conditional and add it
to the module-level imports in python/cube/automation/judge_panel.py alongside
the other top imports, then delete the in-function/conditional import site (the
inline "import subprocess" currently present) so the module consistently uses
the top-level subprocess symbol; run linting/tests to ensure no circular import
or side-effect issues after the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7e2077f8-bd70-4b16-a06e-1bf46588afd4
📒 Files selected for processing (2)
python/cube/automation/judge_panel.pypython/cube/automation/prompts.py
📜 Review details
🔇 Additional comments (5)
python/cube/automation/prompts.py (4)
4-4: LGTM!
386-397: LGTM!
398-410: LGTM!
413-433: LGTM!python/cube/automation/judge_panel.py (1)
601-608: LGTM!
| sha_result = subprocess.run( | ||
| ["git", "rev-parse", "HEAD"], | ||
| cwd=review_worktree, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=10, | ||
| ) | ||
| worktree_head_sha = sha_result.stdout.strip() if sha_result.returncode == 0 else None |
There was a problem hiding this comment.
Silent failure defeats the HEAD verification fix.
If git rev-parse HEAD fails (e.g., corrupted worktree, not a git repository), worktree_head_sha is set to None, converted to "" at line 606, and the entire verification block is skipped in the prompt. Judges don't receive the explicit "CURRENT HEAD" instructions to re-read all files, allowing stale session memory to persist — one of the two bugs this PR aims to fix.
Raise an error when SHA extraction fails, or at minimum log a warning.
🛡️ Proposed fix to fail fast when SHA extraction fails
sha_result = subprocess.run(
["git", "rev-parse", "HEAD"],
cwd=review_worktree,
capture_output=True,
text=True,
timeout=10,
)
- worktree_head_sha = sha_result.stdout.strip() if sha_result.returncode == 0 else None
+ if sha_result.returncode != 0:
+ raise RuntimeError(
+ f"Failed to extract HEAD SHA from review worktree {review_worktree}. "
+ f"Cannot verify code freshness. stderr: {sha_result.stderr}"
+ )
+ worktree_head_sha = sha_result.stdout.strip()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cube/automation/judge_panel.py` around lines 520 - 527, The code
silently sets worktree_head_sha to None when the git call fails; change this to
fail-fast or at least emit a clear warning: after subprocess.run(...) check
sha_result.returncode and if non-zero either raise an exception (so the caller
of the function in judge_panel.py fails fast) or call the module logger to log
an explicit error/warning including sha_result.stderr and the review_worktree
path, instead of continuing with worktree_head_sha=None; update any callers that
expect a string accordingly so the HEAD verification block (which uses
worktree_head_sha) always receives a valid SHA or the flow is aborted with an
explanatory error.
…<n>/ PR #172 invented `pr-review-<n>/` as a parallel worktree to hold the file-routing flow's check-out. The original `pr-<n>/` (managed by `_prefetch_worktrees` for the cube panel + `_get_cli_review_worktrees` for CodeRabbit) kept existing alongside it. Result: two worktrees per PR, only one kept synced by the new code, prompts pointing at the wrong one, judges Reading stale content. Reverting the path so the file-routing flow reuses `pr-<n>/`. Combined with the #176 skip-`_prefetch_worktrees`-when-already-synced logic, this gives one worktree per PR with the new --force fetch + HEAD verification applied to it. CodeRabbit and cube judges both read from the same place. `pr-review-<n>/` orphan dirs deleted from the local cache; `git worktree prune` cleaned the stale refs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Symptom
Even after #175 landed (--force fetch), cube kept posting the same 10+ findings against pre-rewrite code on aetheronhq/aetheron-connect-v2#1266. Roy's diagnosis: cube is anchored on diff line numbers from an earlier commit.
Two more bugs surfaced
1. Duplicate worktrees, only one synced
PR reviews historically used `~/.cube/worktrees//pr-/` (legacy, created by `_prefetch_worktrees`). PR #172 added a second one at `pr-review-/` for the new file-routing flow. Both existed in parallel; only the new one had #175's --force + HEAD verification.
Worse: `build_peer_review_prompt` told judges to Read from the LEGACY path while `_apply_file_scope` pointed them at the NEW path. Same prompt, two locations, judges sometimes picked the stale one.
Confirmed live: both worktrees existed on PR #1266; the legacy one was at fda636a3 (6 commits behind the actual head 62e9fbd1).
2. Resumed judges had stale session memory
Sessions persist across panel runs (token-efficient). On the next run, the LLM continues its prior conversation including its prior reasoning about file contents and line numbers. Even when the resumed judge Reads fresh files, its reasoning anchors on the old memory. The previous "RE-REVIEW" directive was too soft to displace stale memory.
Fixes
Test plan
🤖 Generated with Claude Code
Context
Judges posted stale findings after PRs were force-pushed. Two root causes: (1) duplicate worktrees — legacy pr-/ worktrees could remain unsynced while pr-review-/ (the new path) received --force updates, and prompts pointed judges to the stale legacy path; (2) resumed judges retained stale session memory across panel runs so prior reasoning could override fresh reads.
Changes
python/cube/automation/judge_panel.py
git rev-parse HEAD) and pass review_worktree into prompt construction so prompt paths match the file-scope used by_apply_file_scope.python/cube/automation/prompts.py
worktree_path_override: Optional[Path] = Nonetobuild_peer_review_prompt(...).worktree_path_overrideis given, instruct judges to read from that exact path.python/cube/commands/peer_review.py
~/.cube/worktrees/<repo>/pr-<n>) instead ofpr-review-<n>/, delete orphanpr-review-<n>/directories and rungit worktree prunein commits to ensure one canonical worktree per PR.Impact
Test plan
cube pr-review <n>(without--fresh): judges should see the HEAD callout and only re-raise findings that still apply._apply_file_scope's path.