
fix(pr-review): eliminate stale legacy worktree + signal HEAD to resumed judges#176

Merged
jacsamell merged 2 commits into main from fix/pr-review-stale-worktree-and-head-signal
May 19, 2026

@jacsamell jacsamell commented May 19, 2026

Symptom

Even after #175 landed (--force fetch), cube kept posting the same 10+ findings against pre-rewrite code on aetheronhq/aetheron-connect-v2#1266. Roy's diagnosis: cube is anchored on diff line numbers from an earlier commit.

Two more bugs surfaced

1. Duplicate worktrees, only one synced

PR reviews historically used `~/.cube/worktrees/<project>/pr-<n>/` (legacy, created by `_prefetch_worktrees`). PR #172 added a second one at `pr-review-<n>/` for the new file-routing flow. Both existed in parallel; only the new one had #175's --force + HEAD verification.

Worse: `build_peer_review_prompt` told judges to Read from the LEGACY path while `_apply_file_scope` pointed them at the NEW path. Same prompt, two locations, judges sometimes picked the stale one.

Confirmed live: both worktrees existed on PR #1266; the legacy one was at fda636a3 (6 commits behind the actual head 62e9fbd1).
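
A minimal sketch of how this kind of staleness could be checked by hand, assuming only `git rev-parse HEAD` and `git rev-list --count`. The helper names (`same_commit`, `worktree_head`, `commits_behind`) are hypothetical and are not part of cube; `expected_head` must already have been fetched into the repository for the count to work.

```python
import subprocess
from pathlib import Path


def same_commit(sha_a: str, sha_b: str) -> bool:
    """Compare two SHAs that may be abbreviated to different lengths."""
    a, b = sha_a.strip().lower(), sha_b.strip().lower()
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)


def worktree_head(worktree: Path) -> str:
    """Return the full HEAD SHA of a worktree; raises if git fails."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=worktree, capture_output=True, text=True, timeout=10, check=True,
    )
    return result.stdout.strip()


def commits_behind(worktree: Path, expected_head: str) -> int:
    """Count commits reachable from expected_head but not from the worktree's HEAD."""
    result = subprocess.run(
        ["git", "rev-list", "--count", f"HEAD..{expected_head}"],
        cwd=worktree, capture_output=True, text=True, timeout=10, check=True,
    )
    return int(result.stdout.strip())
```

A worktree would count as stale when `same_commit(worktree_head(path), expected_head)` is false.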

2. Resumed judges had stale session memory

Sessions persist across panel runs (token-efficient). On the next run, the LLM continues its prior conversation including its prior reasoning about file contents and line numbers. Even when the resumed judge Reads fresh files, its reasoning anchors on the old memory. The previous "RE-REVIEW" directive was too soft to displace stale memory.

Fixes

  • Skip `_prefetch_worktrees` in PR-review mode when a dedicated `review_worktree` was passed. The new sync is authoritative.
  • Pipe `review_worktree` through `build_peer_review_prompt` so the prompt's worktree path matches the file-scope block. One location.
  • Replace the meek "verify commit" hint with a hard "CURRENT HEAD" callout: names the SHA, says "your prior reasoning may be out of date", lists 4 explicit steps (re-Read fresh, verify each prior finding still applies, add new ones, drop the rest).

Test plan

  • Force-push a PR, re-run `cube pr-review <n>` (no `--fresh`): judges receive the HEAD callout and re-raise findings only if they're still real
  • Only one worktree exists per PR review (no legacy `pr-<n>/` duplicate created)
  • Prompt's "Code Location" path matches `_apply_file_scope`'s path

🤖 Generated with Claude Code

Context

Judges posted stale findings after PRs were force-pushed. Two root causes: (1) duplicate worktrees: legacy pr-<n>/ worktrees could remain unsynced while pr-review-<n>/ (the new path) received --force updates, and prompts pointed judges to the stale legacy path; (2) resumed judges retained stale session memory across panel runs, so prior reasoning could override fresh reads.

Changes

python/cube/automation/judge_panel.py

  • When a dedicated review_worktree is provided for PR-review runs (with a LOCAL: winner), skip the legacy _prefetch_worktrees sync path.
  • Compute HEAD SHA from the provided worktree (git rev-parse HEAD) and pass review_worktree into prompt construction so prompt paths match the file-scope used by _apply_file_scope.
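
The skip condition described above could be sketched as a small predicate. This is an illustration only: the function name and parameter names are hypothetical, and treating a "LOCAL: winner" as a string that starts with `LOCAL:` is an assumption drawn from the description, not from the actual code.

```python
from pathlib import Path
from typing import Optional


def use_dedicated_worktree(
    review_type: str,
    review_worktree: Optional[Path],
    winner: str,
) -> bool:
    """True when the caller-synced worktree is authoritative and the
    legacy prefetch sync should be skipped (assumed conditions)."""
    return (
        review_type != "ui-review"
        and review_worktree is not None
        and winner.startswith("LOCAL:")  # assumption: LOCAL: is a prefix
    )
```

When the predicate is true, the HEAD SHA would be read from `review_worktree` itself rather than from a second, separately synced copy.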

python/cube/automation/prompts.py

  • Add worktree_path_override: Optional[Path] = None to build_peer_review_prompt(...).
  • Replace the previous soft "VERIFY COMMIT" hint with a hard "CURRENT HEAD" callout that names the SHA, warns prior reasoning may be out of date, and lists four explicit steps: re-read fresh files at HEAD, verify each prior finding still applies, add new findings, and drop invalid ones.
  • When worktree_path_override is given, instruct judges to read from that exact path.
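
To make the shape of the callout concrete, here is a rough sketch of a builder for such a block. The function name and the exact wording are illustrative assumptions; the real prompt text lives in `build_peer_review_prompt` and is not reproduced here. Only the four steps and the "prior reasoning may be out of date" warning are taken from the description.

```python
from pathlib import Path
from typing import Optional


def current_head_callout(head_sha: str, worktree_path: Optional[Path] = None) -> str:
    """Build an illustrative CURRENT HEAD block; returns "" when no SHA is known."""
    if not head_sha:
        # With no SHA there is nothing to anchor the callout on.
        return ""
    lines = [
        f"CURRENT HEAD: {head_sha}",
        "Your prior reasoning may be out of date.",
        "1. Re-read the files fresh at this HEAD.",
        "2. Verify each prior finding still applies.",
        "3. Add new findings.",
        "4. Drop findings that no longer apply.",
    ]
    if worktree_path is not None:
        # Hypothetical wording for the worktree_path_override case.
        lines.append(f"Read code only from: {worktree_path}")
    return "\n".join(lines)
```

Note that an empty SHA yields no callout at all in this sketch, so a failed SHA lookup would silently remove the instruction that resumed judges most need.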

python/cube/commands/peer_review.py

  • Restore the single per-PR worktree path (~/.cube/worktrees/<repo>/pr-<n>) instead of pr-review-<n>/. Orphan pr-review-<n>/ directories were deleted from the local cache and git worktree prune cleaned the stale refs, leaving one canonical worktree per PR.

Impact

  • Eliminates stale duplicate-worktree behaviour by ensuring a single, caller-synced worktree receives the --force fetch and HEAD verification.
  • Prompts now point judges at the same worktree path used for file scoping.
  • Resumed judges are explicitly forced to re-evaluate prior findings against the current HEAD, reducing false positives from stale session memory.

Test plan

  • Force-push a PR and run cube pr-review <n> (without --fresh): judges should see the HEAD callout and only re-raise findings that still apply.
  • Confirm only one worktree exists per PR review (no legacy duplicate).
  • Confirm prompt "Code Location" path matches _apply_file_scope's path.
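
The "only one worktree per PR" check could be automated along these lines. This is a sketch under assumptions: it parses the output of `git worktree list --porcelain` (each entry begins with a `worktree <path>` line) and treats directories named `pr-<n>` or `pr-review-<n>` as belonging to PR `<n>`. Both helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

_PR_DIR = re.compile(r"^pr-(?:review-)?(\d+)$")


def worktree_paths(porcelain: str) -> List[str]:
    """Extract worktree paths from `git worktree list --porcelain` output."""
    prefix = "worktree "
    return [
        line[len(prefix):]
        for line in porcelain.splitlines()
        if line.startswith(prefix)
    ]


def duplicate_pr_worktrees(paths: List[str]) -> Dict[str, List[str]]:
    """Map PR number -> paths, keeping only PRs with more than one worktree."""
    by_pr: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        match = _PR_DIR.match(PurePosixPath(path).name)
        if match:
            by_pr[match.group(1)].append(path)
    return {pr: found for pr, found in by_pr.items() if len(found) > 1}
```

An empty result from `duplicate_pr_worktrees` would indicate that no PR has both a legacy and a review worktree registered.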


…med judges

Symptom: cube panel kept posting findings against pre-rewrite code on
aetheronhq/aetheron-connect-v2#1266 even after PR #175's --force fetch
landed. Same 10+ findings against lines/tables that no longer exist.

Two more layers of the bug surfaced:

(1) Duplicate worktrees, only one synced.
    PR reviews historically used `~/.cube/worktrees/<project>/pr-<n>/`
    (legacy, created by `_prefetch_worktrees`). PR #172 added a
    second one at `pr-review-<n>/` for the new file-routing flow.
    Both worktrees existed in parallel; only the new one had the
    --force fetch + HEAD verification from #175. The other stayed
    pinned to whatever it was last synced to.

    Worse: `build_peer_review_prompt` told judges to Read from the
    LEGACY path, while `_apply_file_scope` told them to Read from the
    NEW path. Same prompt, two locations.

    Confirmed live: both worktrees existed on PR #1266; the legacy
    one was stuck at fda636a3 (6 commits behind 62e9fbd1).

(2) Resumed judges had stale session memory.
    Sessions persist across panel runs. On the next run, the LLM
    continues its prior conversation — including its prior reasoning
    about file contents, line numbers, function shapes. Even when the
    resumed judge Reads fresh files, its reasoning anchors on the old
    memory. The previous "RE-REVIEW" hint was too soft to displace it.

Fixes:
  * Skip `_prefetch_worktrees` in PR-review mode when a dedicated
    `review_worktree` was passed. The new sync is authoritative; the
    legacy sync just made a second stale copy.
  * Pipe `review_worktree` through `build_peer_review_prompt` so the
    prompt's worktree path matches the file-scope block. One location.
  * Replace the meek "verify commit" hint with a hard "CURRENT HEAD"
    callout that names the SHA, says "your prior reasoning may be out
    of date", and lists 4 explicit steps: Re-Read fresh, verify each
    prior finding still applies, add new ones, drop the rest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai Bot commented May 19, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 664d17dd-5365-4a5e-b984-fa3dc3dd6fb1

📥 Commits

Reviewing files that changed from the base of the PR and between 621edf2 and 5845ca1.

📒 Files selected for processing (1)
  • python/cube/commands/peer_review.py
📜 Recent review details
🔇 Additional comments (1)
python/cube/commands/peer_review.py (1)

230-235: LGTM!


Walkthrough

The PR updates judge-panel PR-review logic to use a caller-provided dedicated review worktree instead of relying on potentially stale synced duplicates. It derives the PR HEAD SHA from that worktree directly, standardises the per-PR worktree path, and routes the path into peer-review prompts which instruct judges to re-read all code at the current HEAD before reviewing.

Changes

Dedicated review worktree integration

| Layer / File(s) | Summary |
|---|---|
| Prompt parameter and current HEAD verification: `python/cube/automation/prompts.py` | `build_peer_review_prompt` signature extended with optional `worktree_path_override` parameter; docstring updated to describe how the override guides judges to the correct worktree path; prompt text replaces prior commit-verification block with expanded "CURRENT HEAD" section that instructs re-reading all files at current HEAD, removing invalidated findings, and running a `git rev-parse HEAD` sanity check. |
| Worktree resolution and prompt wiring: `python/cube/automation/judge_panel.py`, `python/cube/commands/peer_review.py` | `launch_judge_panel` now conditionally handles PR review: when `review_type != "ui-review"` and a `review_worktree` is provided alongside a LOCAL: form winner, it derives the PR HEAD SHA by running `git rev-parse HEAD` in that worktree, bypassing the legacy `_prefetch_worktrees` path; `_ensure_pr_review_worktree` now reuses the unified `pr-<pr_number>` per-repo worktree directory. The derived worktree path is routed to `build_peer_review_prompt` via the new `worktree_path_override` argument, ensuring judges use the caller-provided worktree. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • aetheronhq/agent-cube#175: Both PRs modify PR peer-review worktree setup and address stale worktree HEAD correctness.
  • aetheronhq/agent-cube#172: Both PRs change judge-panel and prompt flows to accept a caller-provided review_worktree and adjust prompt instructions accordingly.
  • aetheronhq/agent-cube#155: Related changes to peer/judge review prompt verification discipline and reading code facts from the correct HEAD.

Poem

🐰 I hopped to the worktree that you gave,
I sniffed the HEAD, I read each brave save,
"Re-read the files!" I softly plea,
Fresh code, fresh findings — that's the key.
🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly addresses the primary fix: eliminating stale legacy worktrees and adding HEAD signalling to judges, matching the core objectives of consolidating duplicate worktrees and improving judge context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
python/cube/automation/judge_panel.py (1)

518-518: ⚡ Quick win

Move subprocess import to module level.

Importing inside a conditional block is non-idiomatic Python. Module-level imports improve readability and make dependencies easier to track.

♻️ Proposed refactor

At the top of the file, after the existing imports (around line 3):

 import asyncio
+import subprocess
 from datetime import datetime

Then remove line 518:

         if is_pr_review_with_dedicated_worktree:
             # Compute SHA from the worktree the caller synced.
-            import subprocess

             sha_result = subprocess.run(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/automation/judge_panel.py` at line 518, Move the local "import
subprocess" out of the conditional and add it to the module-level imports in
python/cube/automation/judge_panel.py alongside the other top imports, then
delete the in-function/conditional import site (the inline "import subprocess"
currently present) so the module consistently uses the top-level subprocess
symbol; run linting/tests to ensure no circular import or side-effect issues
after the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cube/automation/judge_panel.py`:
- Around line 520-527: The code silently sets worktree_head_sha to None when the
git call fails; change this to fail-fast or at least emit a clear warning: after
subprocess.run(...) check sha_result.returncode and if non-zero either raise an
exception (so the caller of the function in judge_panel.py fails fast) or call
the module logger to log an explicit error/warning including sha_result.stderr
and the review_worktree path, instead of continuing with worktree_head_sha=None;
update any callers that expect a string accordingly so the HEAD verification
block (which uses worktree_head_sha) always receives a valid SHA or the flow is
aborted with an explanatory error.

---

Nitpick comments:
In `@python/cube/automation/judge_panel.py`:
- Line 518: Move the local "import subprocess" out of the conditional and add it
to the module-level imports in python/cube/automation/judge_panel.py alongside
the other top imports, then delete the in-function/conditional import site (the
inline "import subprocess" currently present) so the module consistently uses
the top-level subprocess symbol; run linting/tests to ensure no circular import
or side-effect issues after the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e2077f8-bd70-4b16-a06e-1bf46588afd4

📥 Commits

Reviewing files that changed from the base of the PR and between 09bf3d4 and 621edf2.

📒 Files selected for processing (2)
  • python/cube/automation/judge_panel.py
  • python/cube/automation/prompts.py
📜 Review details
🔇 Additional comments (5)
python/cube/automation/prompts.py (4)

4-4: LGTM!


386-397: LGTM!


398-410: LGTM!


413-433: LGTM!

python/cube/automation/judge_panel.py (1)

601-608: LGTM!

Comment on lines +520 to +527
    sha_result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=review_worktree,
        capture_output=True,
        text=True,
        timeout=10,
    )
    worktree_head_sha = sha_result.stdout.strip() if sha_result.returncode == 0 else None

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Silent failure defeats the HEAD verification fix.

If git rev-parse HEAD fails (e.g., corrupted worktree, not a git repository), worktree_head_sha is set to None, converted to "" at line 606, and the entire verification block is skipped in the prompt. Judges don't receive the explicit "CURRENT HEAD" instructions to re-read all files, allowing stale session memory to persist — one of the two bugs this PR aims to fix.

Raise an error when SHA extraction fails, or at minimum log a warning.

🛡️ Proposed fix to fail fast when SHA extraction fails
         sha_result = subprocess.run(
             ["git", "rev-parse", "HEAD"],
             cwd=review_worktree,
             capture_output=True,
             text=True,
             timeout=10,
         )
-        worktree_head_sha = sha_result.stdout.strip() if sha_result.returncode == 0 else None
+        if sha_result.returncode != 0:
+            raise RuntimeError(
+                f"Failed to extract HEAD SHA from review worktree {review_worktree}. "
+                f"Cannot verify code freshness. stderr: {sha_result.stderr}"
+            )
+        worktree_head_sha = sha_result.stdout.strip()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/automation/judge_panel.py` around lines 520 - 527, The code
silently sets worktree_head_sha to None when the git call fails; change this to
fail-fast or at least emit a clear warning: after subprocess.run(...) check
sha_result.returncode and if non-zero either raise an exception (so the caller
of the function in judge_panel.py fails fast) or call the module logger to log
an explicit error/warning including sha_result.stderr and the review_worktree
path, instead of continuing with worktree_head_sha=None; update any callers that
expect a string accordingly so the HEAD verification block (which uses
worktree_head_sha) always receives a valid SHA or the flow is aborted with an
explanatory error.

…<n>/

PR #172 invented `pr-review-<n>/` as a parallel worktree to hold the
file-routing flow's check-out. The original `pr-<n>/` (managed by
`_prefetch_worktrees` for the cube panel + `_get_cli_review_worktrees`
for CodeRabbit) kept existing alongside it. Result: two worktrees per
PR, only one kept synced by the new code, prompts pointing at the
wrong one, judges Reading stale content.

Reverting the path so the file-routing flow reuses `pr-<n>/`. Combined
with the #176 skip-`_prefetch_worktrees`-when-already-synced logic,
this gives one worktree per PR with the new --force fetch + HEAD
verification applied to it. CodeRabbit and cube judges both read
from the same place.

`pr-review-<n>/` orphan dirs deleted from the local cache; `git
worktree prune` cleaned the stale refs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
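
The manual cleanup described in this commit message could be scripted roughly as follows. This is a sketch, not something the PR ships: the helper names are hypothetical, the layout `<cache_root>/<repo>/pr-review-<n>/` is assumed from the paths quoted above, and the destructive step is guarded behind `dry_run`.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def find_orphan_review_dirs(cache_root: Path) -> List[Path]:
    """List pr-review-* directories one level below each repo directory."""
    return sorted(p for p in cache_root.glob("*/pr-review-*") if p.is_dir())


def remove_orphans(cache_root: Path, repo: Path, dry_run: bool = True) -> List[Path]:
    """Delete orphan pr-review-* directories, then prune stale worktree refs.

    With dry_run=True (the default) nothing is deleted and git is not invoked;
    the would-be-removed paths are only returned.
    """
    orphans = find_orphan_review_dirs(cache_root)
    if not dry_run:
        for path in orphans:
            shutil.rmtree(path)
        # `git worktree prune` drops admin entries whose directories are gone.
        subprocess.run(["git", "worktree", "prune"], cwd=repo, check=True)
    return orphans
```

Running it once with the default `dry_run=True` and inspecting the returned paths before deleting anything is the safer order of operations.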
@jacsamell jacsamell merged commit 7740fa4 into main May 19, 2026
0 of 2 checks passed
@jacsamell jacsamell deleted the fix/pr-review-stale-worktree-and-head-signal branch May 19, 2026 13:20