fix(panel): judge cwd = PR review worktree, not operator's CWD #182

Merged: jacsamell merged 1 commit into main from fix/judge-cwd-uses-review-worktree on May 19, 2026
@jacsamell jacsamell commented May 19, 2026

Symptom

An operator running cube from a Claude Code session worktree (e.g. `.claude/worktrees/cool-satoshi-bd8d21`) saw judges report findings against code that did not exist on the actual PR. Investigation showed the judges were Reading from the operator's session worktree, which was dozens of commits behind and on an unrelated branch, rather than from the cube-synced PR review worktree.

Specifically: judges flagged `apps/api/.../line:728` as buggy. Line 728 on the PR head was a totally different function from line 728 on the operator's session branch.

Root cause

`run_dir = WORKTREE_BASE.parent if cli_name == "gemini" else PROJECT_ROOT`

For all non-gemini CLIs, the judge inherits cwd = PROJECT_ROOT (the operator's repo checkout). Relative-path Reads resolve there, regardless of which worktree the prompt told the judge to use.

Fix

Use `judge_info.review_worktree` as the cwd when it is set (PR #172 already populates it from the synced PR worktree at `~/.cube/worktrees//pr-/`). Fall back to the legacy paths when no review worktree was wired (writer reviews, gemini, etc.).
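
The cwd selection described in this fix can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `resolve_judge_cwd` and the `JudgeInfo` shape are hypothetical, the two path constants are placeholders for cube's real `PROJECT_ROOT` and `WORKTREE_BASE`, and the sketch assumes that a set `review_worktree` takes precedence for every CLI.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Placeholder values for illustration; cube defines the real constants elsewhere.
PROJECT_ROOT = Path("/repo")
WORKTREE_BASE = Path("/home/user/.cube/worktrees")


@dataclass
class JudgeInfo:
    """Hypothetical stand-in carrying only the field this sketch needs."""

    review_worktree: Optional[Path] = None


def resolve_judge_cwd(judge_info: JudgeInfo, cli_name: str) -> Path:
    """Pick the directory a judge process should start in."""
    # Prefer the synced PR review worktree so relative-path Reads hit PR code.
    if judge_info.review_worktree is not None:
        return judge_info.review_worktree
    # Legacy behaviour, matching the root-cause line quoted above.
    return WORKTREE_BASE.parent if cli_name == "gemini" else PROJECT_ROOT
```

With this shape, a judge launched for a PR review starts in the synced worktree, while writer reviews (no `review_worktree`) keep the previous behaviour.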

Test plan

  • Run `cube prv ` from a Claude Code worktree on an unrelated branch — judges Read from the synced PR worktree, not the operator's branch
  • Confirm absolute-path Reads still work (the prompt's `Read /Users/.../pr-/` instructions are unaffected)
  • All 237 cube tests pass (verified locally)

🤖 Generated with Claude Code

Deterministic Verify Gate Between Writer and Judge Phases

This PR introduces a deterministic verification system that runs between the writer and judge phases to eliminate wasteful token churn.

Problem Solved:
Writers were spending large numbers of tokens running their own test/lint/typecheck loops, and judges were burning tokens flagging "this won't build" findings. The verify gate instead runs these checks once, authoritatively.

How It Works:

  1. Writer commits as usual
  2. Cube runs the repo's configured verify.cmd in the writer's worktree, capturing logs to .cube/verify-logs/<task>-attempt-<N>.log
  3. On failure, a minimal feedback prompt is sent to the writer with the absolute log path—the writer Reads the log, fixes the issue, and commits
  4. Loop repeats up to max_attempts (default 3)
  5. On final failure, judges still run with the failure history visible for context
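
The five steps above can be sketched as a simplified, synchronous loop. This is an illustration under stated assumptions, not the contents of `verify.py`: `verify_loop_sketch` and the `on_failure` callback are hypothetical (the real loop resumes the writer via `send_feedback_async`), the command runs through `subprocess.run(..., shell=True)`, and exit code 124 for a timeout is a convention chosen here.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class VerifyResult:
    ok: bool
    attempts: int
    last_log_path: Optional[Path]
    last_exit_code: Optional[int]


def verify_loop_sketch(
    cmd: str,
    worktree: Path,
    logs_dir: Path,
    task_id: str,
    max_attempts: int = 3,
    timeout_seconds: int = 600,
    on_failure: Optional[Callable[[Path, int], None]] = None,
) -> VerifyResult:
    """Run cmd in worktree up to max_attempts times, logging each attempt."""
    logs_dir.mkdir(parents=True, exist_ok=True)
    attempts_run = 0
    log_path: Optional[Path] = None
    exit_code: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        attempts_run = attempt
        # Absolute path, so the writer can Read the log from any cwd.
        log_path = (logs_dir / f"{task_id}-attempt-{attempt}.log").resolve()
        with log_path.open("wb") as log:
            try:
                proc = subprocess.run(
                    cmd, shell=True, cwd=worktree,
                    stdout=log, stderr=subprocess.STDOUT,  # combined output
                    timeout=timeout_seconds,
                )
                exit_code = proc.returncode
            except subprocess.TimeoutExpired:
                exit_code = 124  # assumed timeout code, not from the PR
        if exit_code == 0:
            return VerifyResult(True, attempt, log_path, 0)
        # Hand the log to the writer only if another attempt will follow.
        if attempt < max_attempts and on_failure is not None:
            on_failure(log_path, exit_code)
    # Final failure: the caller still runs judges with this result visible.
    return VerifyResult(False, attempts_run, log_path, exit_code)
```

Tracking `attempts_run` separately means the returned count reflects the attempts actually executed rather than the configured cap.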

Key Changes:

  • verify.py (new, 181 lines): Implements VerifyResult dataclass and run_verify_loop() which executes the verify command with timeout handling, log truncation, and async writer feedback resumption via send_feedback_async
  • handlers.py Phase 2: Extended phase2_run_writers to integrate the verify-and-repair loop, deriving writer worktrees and independently running verification for each
  • Writer Prompt: Updated to explicitly instruct writers not to run tests/lint/typecheck locally—Cube now does this authoritatively after they exit
  • Config: New VerifyConfig block in cube.yaml with fields cmd, timeout_seconds (600s default), and max_attempts (3 default); empty cmd disables the gate
  • Default v2 Config: pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test
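
As a rough picture of the config block, the fields and defaults listed above map naturally onto a small dataclass. This is a sketch, not the definition in `user_config.py`: the field names and defaults come from the description, while `is_verify_enabled` is a hypothetical helper used here only to show the "empty cmd disables the gate" rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifyConfig:
    cmd: str = ""               # empty string disables the gate
    timeout_seconds: int = 600  # per-attempt timeout
    max_attempts: int = 3       # verify runs before judges take over


def is_verify_enabled(cfg: VerifyConfig) -> bool:
    """Hypothetical helper: the gate runs only when a command is configured."""
    return bool(cfg.cmd.strip())


# The default v2 command quoted in the list above.
v2 = VerifyConfig(
    cmd="pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test"
)
```

A repo that omits the block gets `VerifyConfig()`, for which `is_verify_enabled` returns `False`.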

Impact:
Removes the writer's most expensive habit, streamlines the workflow by centralising verification, and surfaces failure context to judges when verification does fail.

Operator observation: writers were burning large numbers of tokens running their own pnpm test / typecheck / lint loops, and judges were burning tokens flagging 'this won't build' findings. Both are wasteful: a deterministic verify can be run once by cube, authoritatively.

New phase folded into phase2_run_writers (keeps phase numbering stable):

1. Writer commits as usual

2. cube runs verify.cmd in the writer worktree, captures combined stdout/stderr to .cube/verify-logs/<task>-attempt-<N>.log

3. On failure: tiny feedback prompt with absolute log path, resume the writer. Writer Reads the log, fixes, commits. send_feedback_async auto-commits on exit (PR #178).

4. Loop up to max_attempts (default 3). On final failure, judges still run.

Writer prompt updated: 'Don't run tests / lint / typecheck — cube does it after you exit.' Removes the writer's most expensive habit.

Config in cube.yaml verify section. Empty cmd disables the gate. v2 cube.yaml ships with: pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test
coderabbitai Bot commented May 19, 2026

Walkthrough

This pull request introduces an automated deterministic verify gate that executes repo-configured verification commands after writers commit, captures logs, retries on failure by resuming the writer with feedback, and reports results to the orchestrator.

Changes

Verify Gate Implementation

| Layer / File(s) | Summary |
|---|---|
| **Configuration schema for verify gate**<br>`python/cube/core/user_config.py` | VerifyConfig dataclass defines cmd, timeout_seconds, and max_attempts with sensible defaults; CubeConfig now includes a verify field, and the loader parses verify settings from merged YAML into the cached configuration. |
| **Verify loop module with command execution and retry**<br>`python/cube/automation/verify.py` | run_verify_loop executes verify.cmd via bash in a worktree, captures combined stdout/stderr to timestamped log files in .cube/verify-logs, and on failure constructs a markdown feedback prompt (with absolute log path and "do not self-verify" instructions) to resume the writer for up to max_attempts tries. Returns VerifyResult with pass/fail, attempt count, and final log details. |
| **Phase 2 orchestration with verify loop invocation**<br>`python/cube/commands/orchestrate/handlers.py` | Phase 2 handler checks if verification is configured, then loads user config and builds a per-writer worktree list (single vs dual mode), filters non-existent paths, and runs run_verify_loop independently for each writer, returning verify_results with writer labels, pass/fail status, and attempt counts. |
| **Writer prompt updated with verify gate instructions**<br>`python/cube/commands/orchestrate/prompts.py` | Writer prompt now explicitly instructs writers not to self-verify; instead, writers are informed that Cube runs the deterministic verify gate after commit and will resume the writer with log paths on failure. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A gate of truth now stands so firm,
Verify and learn, then watch it turn—
On failure, we resume with care,
Log paths and feedback in the air!
No self-checks now, just trust the flow,
Cube's deterministic verify's show. 🌟


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❌ Error | The PR title describes fixing cwd selection for judges to use PR review worktree, but the actual changeset primarily implements a deterministic verify gate loop with configuration, prompt updates, and handler modifications, not a cwd fix. | Align the title with the actual changes: consider 'feat(verify): add deterministic verify gate with writer feedback loop' or update the PR description to clarify the actual scope. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cube/automation/verify.py`:
- Around line 124-170: The final return currently hardcodes
attempts=max_attempts which misreports when the loop exits early; change the
final VerifyResult return to use the actual last attempt count (e.g.,
attempts=attempt) and ensure attempt is referenced from the for-loop (or fall
back to 0 if not set) so that VerifyResult(ok=False, attempts=attempt,
last_log_path=last_log_path, last_exit_code=last_exit_code) reflects the real
number of verify attempts; reference the for-loop variable "attempt" and the
VerifyResult construction to locate where to update.

In `@python/cube/commands/orchestrate/handlers.py`:
- Around line 89-94: The loop over writer_keys in handlers.py currently uses
"continue" when a writer worktree (computed via WORKTREE_BASE / project_name /
f"writer-{wconf.name}-{ctx.task_id}") is missing, which silently bypasses
verification; change this so missing worktrees cause a hard failure instead of
quietly skipping: when get_writer_config(wkey) yields wconf but
worktree.exists() is False, log an error including wkey and wconf.name (use the
existing logger), and either raise a clear exception or record the failure in
the verification result so the phase returns non-success (do not use continue).
Apply the same behavior to the analogous block handling lines 119-124 so missing
writer worktrees consistently fail verify rather than being ignored.

In `@python/cube/commands/orchestrate/prompts.py`:
- Around line 46-59: Update the writer prompt generation to only include the "do
NOT self-verify" paragraph when the repo has a verify command configured (i.e.,
verify.cmd is set/non-empty); detect the verify setting and conditionally append
the string "Focus on the code change. Don't run tests / typecheck / lint — cube
runs them deterministically after you exit. If verify fails, cube will resume
you with the log path; Read it and fix." to the prompt output in
python/cube/commands/orchestrate/prompts.py instead of unconditionally embedding
it, referencing the verify configuration key (verify.cmd) when constructing the
prompt.

In `@python/cube/core/user_config.py`:
- Around line 246-251: The parsing for verify config in load_config() assumes
verify_raw is a dict and that timeout_seconds/max_attempts are int-coercible,
which will raise for malformed YAML like "verify: true" or "timeout_seconds:
'fast'"; update the block that builds verify_raw and verify_cfg (symbols:
verify_raw, VerifyConfig, verify_cfg, load_config) to first ensure verify_raw is
a mapping (fall back to {} if not), extract cmd using str(...) but only if
present, and safely parse timeout_seconds and max_attempts by attempting int()
in a try/except (or using conditional isinstance checks) falling back to the
existing defaults (600 and 3) when parsing fails or values are missing; keep the
creation of VerifyConfig but feed it these validated/coerced values so malformed
types do not abort load_config().
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3001ad13-a248-408c-a4bf-8ad0dac6d668

📥 Commits

Reviewing files that changed from the base of the PR and between c88ab68 and a60c8d9.

📒 Files selected for processing (4)
  • python/cube/automation/verify.py
  • python/cube/commands/orchestrate/handlers.py
  • python/cube/commands/orchestrate/prompts.py
  • python/cube/core/user_config.py
📜 Review details
🧰 Additional context used
🪛 Ruff (0.15.13)
python/cube/automation/verify.py

[error] 61-61: subprocess call: check for execution of untrusted input (S603)

[error] 62-62: Starting a process with a partial executable path (S607)

Comment on lines +124 to +170
    for attempt in range(1, max_attempts + 1):
        log_path = _logs_dir() / f"{task_id}-attempt-{attempt}.log"
        print_info(f"🔬 Verify (attempt {attempt}/{max_attempts}): {verify_cmd}")
        exit_code, tail = _run_once(verify_cmd, worktree, timeout_seconds, log_path)
        last_log_path = log_path
        last_exit_code = exit_code

        if exit_code == 0:
            print_success(f"Verify passed (attempt {attempt}/{max_attempts})")
            return VerifyResult(ok=True, attempts=attempt, last_log_path=log_path, last_exit_code=0)

        print_warning(f"Verify failed (exit {exit_code}). Log: {log_path}")
        console.print(f"[dim]Tail:\n{tail[-1500:]}[/dim]")

        if attempt >= max_attempts:
            print_error(
                f"Verify still failing after {max_attempts} attempts. "
                "Handing current state to judges; they will grade against the failing build."
            )
            break

        # Write a feedback prompt to disk, then resume the writer with it.
        feedback_path = Path(PROJECT_ROOT) / ".prompts" / f"verify-feedback-{task_id}-{attempt}.md"
        feedback_path.parent.mkdir(parents=True, exist_ok=True)
        feedback_path.write_text(_feedback_prompt(log_path, exit_code, attempt, max_attempts, verify_cmd))

        from ..core.session import load_session

        session_id = load_session(writer_info.key.upper(), task_id)
        if not session_id:
            print_warning(f"No session to resume for {writer_info.label}; aborting verify loop")
            break

        await send_feedback_async(
            task_id=task_id,
            feedback_file=feedback_path,
            session_id=session_id,
            worktree=worktree,
            writer_name=writer_info.name,
            writer_model=writer_info.model,
            writer_label=writer_info.label,
            writer_key=writer_info.key,
            writer_color=writer_info.color,
        )
        # send_feedback_async commits any writer changes (PR #178).

    return VerifyResult(ok=False, attempts=max_attempts, last_log_path=last_log_path, last_exit_code=last_exit_code)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the actual number of verify attempts performed.

If the loop exits early (e.g., at Line 154, when there is no session to resume), Line 170 still reports attempts=max_attempts, which misreports the number of verify runs that actually happened.

💡 Suggested fix
-    for attempt in range(1, max_attempts + 1):
+    attempts_run = 0
+    for attempt in range(1, max_attempts + 1):
+        attempts_run = attempt
         log_path = _logs_dir() / f"{task_id}-attempt-{attempt}.log"
@@
-            break
+            break
@@
-            break
+            break
@@
-    return VerifyResult(ok=False, attempts=max_attempts, last_log_path=last_log_path, last_exit_code=last_exit_code)
+    return VerifyResult(
+        ok=False,
+        attempts=attempts_run,
+        last_log_path=last_log_path,
+        last_exit_code=last_exit_code,
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/automation/verify.py` around lines 124 - 170, The final return
currently hardcodes attempts=max_attempts which misreports when the loop exits
early; change the final VerifyResult return to use the actual last attempt count
(e.g., attempts=attempt) and ensure attempt is referenced from the for-loop (or
fall back to 0 if not set) so that VerifyResult(ok=False, attempts=attempt,
last_log_path=last_log_path, last_exit_code=last_exit_code) reflects the real
number of verify attempts; reference the for-loop variable "attempt" and the
VerifyResult construction to locate where to update.

Comment on lines +89 to +94
    for wkey in writer_keys:
        wconf = get_writer_config(wkey)
        worktree = WORKTREE_BASE / project_name / f"writer-{wconf.name}-{ctx.task_id}"
        if not worktree.exists():
            continue
        writers.append(

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently bypass verify when writer worktrees are missing.

The `continue` on Lines 92-94 quietly skips verification, and the phase still reports success. If worktree resolution regresses, verify can be effectively disabled with no hard signal.

💡 Suggested fix
-    writers: list[WriterInfo] = []
+    writers: list[WriterInfo] = []
+    missing_worktrees: list[str] = []
@@
-        if not worktree.exists():
-            continue
+        if not worktree.exists():
+            missing_worktrees.append(f"{wconf.label}: {worktree}")
+            continue
@@
+    for item in missing_worktrees:
+        print_warning(f"Verify skipped for missing worktree: {item}")
+
+    if not writers:
+        print_error("Verify is configured but no writer worktrees were found.")
+        raise typer.Exit(1)

Also applies to: 119-124

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/commands/orchestrate/handlers.py` around lines 89 - 94, The loop
over writer_keys in handlers.py currently uses "continue" when a writer worktree
(computed via WORKTREE_BASE / project_name /
f"writer-{wconf.name}-{ctx.task_id}") is missing, which silently bypasses
verification; change this so missing worktrees cause a hard failure instead of
quietly skipping: when get_writer_config(wkey) yields wconf but
worktree.exists() is False, log an error including wkey and wconf.name (use the
existing logger), and either raise a clear exception or record the failure in
the verification result so the phase returns non-success (do not use continue).
Apply the same behavior to the analogous block handling lines 119-124 so missing
writer worktrees consistently fail verify rather than being ignored.

Comment on lines +46 to +59
### Do NOT self-verify — cube runs the deterministic verify gate
Cube runs the repo's verify command (typecheck + lint + tests) automatically
after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
`npm test` / `pytest` / lint themselves.** Reasons:
- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
- If verify fails, cube re-resumes the writer with a pointer to the log on disk
(writer uses `Read` to inspect, fixes, commits — cube runs verify again).
- Judges only see code that passes verify (or has hit the retry cap), so the
panel never burns tokens grading "this won't build" findings.

**Include this in the writer prompt** as an explicit instruction:
"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
deterministically after you exit. If verify fails, cube will resume you with
the log path; Read it and fix."

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make verify-gate instructions conditional on config.

This block assumes verify is always enabled, but verify.cmd can be unset/empty. In that case, telling writers “do NOT self-verify” is incorrect and can let unverified changes through.

Suggested patch
 async def generate_writer_prompt(task_id: str, task_content: str, prompts_dir: Path) -> Path:
@@
-    prompt = f"""# Task: Generate Writer Prompt
+    from ...automation.verify import is_verify_configured
+    from ...core.user_config import load_config
+
+    verify_enabled = is_verify_configured(load_config())
+    verify_instructions = (
+        """### Do NOT self-verify — cube runs the deterministic verify gate
+Cube runs the repo's verify command (typecheck + lint + tests) automatically
+after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
+`npm test` / `pytest` / lint themselves.** Reasons:
+- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
+- If verify fails, cube re-resumes the writer with a pointer to the log on disk
+  (writer uses `Read` to inspect, fixes, commits — cube runs verify again).
+- Judges only see code that passes verify (or has hit the retry cap), so the
+  panel never burns tokens grading "this won't build" findings.
+
+**Include this in the writer prompt** as an explicit instruction:
+"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
+deterministically after you exit. If verify fails, cube will resume you with
+the log path; Read it and fix."
+"""
+        if verify_enabled
+        else """### Verification responsibility
+Verify gate is not configured for this repo run. Writers must run the project's
+verification checks (tests/lint/typecheck) before commit/push."""
+    )
+
+    prompt = f"""# Task: Generate Writer Prompt
@@
-### Do NOT self-verify — cube runs the deterministic verify gate
-Cube runs the repo's verify command (typecheck + lint + tests) automatically
-after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
-`npm test` / `pytest` / lint themselves.** Reasons:
-- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
-- If verify fails, cube re-resumes the writer with a pointer to the log on disk
-  (writer uses `Read` to inspect, fixes, commits — cube runs verify again).
-- Judges only see code that passes verify (or has hit the retry cap), so the
-  panel never burns tokens grading "this won't build" findings.
-
-**Include this in the writer prompt** as an explicit instruction:
-"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
-deterministically after you exit. If verify fails, cube will resume you with
-the log path; Read it and fix."
+{verify_instructions}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/commands/orchestrate/prompts.py` around lines 46 - 59, Update the
writer prompt generation to only include the "do NOT self-verify" paragraph when
the repo has a verify command configured (i.e., verify.cmd is set/non-empty);
detect the verify setting and conditionally append the string "Focus on the code
change. Don't run tests / typecheck / lint — cube runs them deterministically
after you exit. If verify fails, cube will resume you with the log path; Read it
and fix." to the prompt output in python/cube/commands/orchestrate/prompts.py
instead of unconditionally embedding it, referencing the verify configuration
key (verify.cmd) when constructing the prompt.

Comment on lines +246 to +251
    verify_raw = data.get("verify") or {}
    verify_cfg = VerifyConfig(
        cmd=str(verify_raw.get("cmd", "")).strip(),
        timeout_seconds=int(verify_raw.get("timeout_seconds", 600)),
        max_attempts=int(verify_raw.get("max_attempts", 3)),
    )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden verify config parsing against malformed YAML values.

Line 246 and Lines 249-250 assume valid types (dict + int-coercible values). A config like verify: true or timeout_seconds: "fast" will throw and abort load_config().

💡 Suggested fix
-    verify_raw = data.get("verify") or {}
-    verify_cfg = VerifyConfig(
-        cmd=str(verify_raw.get("cmd", "")).strip(),
-        timeout_seconds=int(verify_raw.get("timeout_seconds", 600)),
-        max_attempts=int(verify_raw.get("max_attempts", 3)),
-    )
+    verify_raw = data.get("verify")
+    if not isinstance(verify_raw, dict):
+        verify_raw = {}
+
+    def _as_int(value: object, default: int, *, minimum: int) -> int:
+        try:
+            parsed = int(value)
+        except (TypeError, ValueError):
+            return default
+        return max(parsed, minimum)
+
+    verify_cfg = VerifyConfig(
+        cmd=str(verify_raw.get("cmd", "")).strip(),
+        timeout_seconds=_as_int(verify_raw.get("timeout_seconds", 600), 600, minimum=1),
+        max_attempts=_as_int(verify_raw.get("max_attempts", 3), 3, minimum=1),
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/core/user_config.py` around lines 246 - 251, The parsing for
verify config in load_config() assumes verify_raw is a dict and that
timeout_seconds/max_attempts are int-coercible, which will raise for malformed
YAML like "verify: true" or "timeout_seconds: 'fast'"; update the block that
builds verify_raw and verify_cfg (symbols: verify_raw, VerifyConfig, verify_cfg,
load_config) to first ensure verify_raw is a mapping (fall back to {} if not),
extract cmd using str(...) but only if present, and safely parse timeout_seconds
and max_attempts by attempting int() in a try/except (or using conditional
isinstance checks) falling back to the existing defaults (600 and 3) when
parsing fails or values are missing; keep the creation of VerifyConfig but feed
it these validated/coerced values so malformed types do not abort load_config().

@jacsamell jacsamell merged commit 1b12a10 into main May 19, 2026
0 of 4 checks passed
@jacsamell jacsamell deleted the fix/judge-cwd-uses-review-worktree branch May 19, 2026 22:30
jacsamell added a commit that referenced this pull request May 19, 2026
) (#184)

PR #182 and #183 squash-merged to empty diffs because they were stacked off the verify-gate branch instead of main. Re-applying both fixes against the actual main HEAD:

1. judge_panel.run_judge: use judge_info.review_worktree as the cwd when the PR review flow synced one. Stops judges from inheriting the operator's Claude Code session worktree (which may be on a stale unrelated branch). Previously: judges reviewed code from cool-satoshi worktree on commits dozens behind the actual PR.

2. config._find_git_root: use 'git rev-parse --git-common-dir' to find the MAIN repo working tree, not the per-Claude-Code-session worktree. All worktrees of the same repo now share .agent-sessions/, .prompts/decisions/, .cube/. Eliminates the 'No session found' regression that came from state being scattered across worktrees.

Verified: PROJECT_ROOT resolves to the main repo root from both the main checkout and any Claude Code session worktree. 237 tests pass.
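
The root lookup in item 2 can be sketched as below. This is an illustration of the technique, not the body of `config._find_git_root`: both function names are hypothetical, and the sketch assumes a standard non-bare repository whose shared git directory is `<main root>/.git`.

```python
import subprocess
from pathlib import Path


def main_root_from_common_dir(common_dir: str, cwd: Path) -> Path:
    """Map `git rev-parse --git-common-dir` output to the main working tree."""
    path = Path(common_dir)
    # Git may print a path relative to cwd (e.g. ".git" in the main checkout).
    if not path.is_absolute():
        path = cwd / path
    # Assumption: the shared git dir is <main root>/.git (non-bare repo).
    return path.resolve().parent


def find_main_repo_root(cwd: Path) -> Path:
    """Ask git for the common dir, then derive the main repo root from it."""
    out = subprocess.run(
        ["git", "rev-parse", "--git-common-dir"],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout.strip()
    return main_root_from_common_dir(out, cwd)
```

Because every linked worktree reports the same common dir, calling `find_main_repo_root` from any of them yields one shared root, which is what lets session and decision state live in a single place.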

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
jacsamell added a commit that referenced this pull request May 19, 2026
… canonical path (#187)

Codex judges run with --sandbox workspace-write and --cd <pr-N worktree> (PR #182). The sandbox blocks writes anywhere outside the cwd workspace — including the main repo's .prompts/decisions/ where decision JSONs MUST land. Result: gpt-5.5 judges (Backend, Frontend & UX) silently failed to write their decisions; only opus/claude judges (which use a different sandbox) actually persisted.

Pass --add-dir <project_root> to codex when worktree differs from PROJECT_ROOT. Sandbox stays in place; just adds the main repo to the writeable allowlist so judges can write the decision file at its canonical absolute path.
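
A sketch of how the flag list might be assembled follows. The three flags are the ones named in this commit message; `codex_sandbox_flags` is a hypothetical helper, and how cube actually builds the rest of the codex command line is not shown here.

```python
from pathlib import Path
from typing import List


def codex_sandbox_flags(worktree: Path, project_root: Path) -> List[str]:
    """Build sandbox-related flags for a judge running in `worktree`."""
    flags = ["--sandbox", "workspace-write", "--cd", str(worktree)]
    # Only widen the allowlist when the judge runs outside the main repo.
    if worktree.resolve() != project_root.resolve():
        flags += ["--add-dir", str(project_root)]
    return flags
```

When the judge already runs in `PROJECT_ROOT`, no extra directory is added, so the sandbox stays as narrow as before.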

find_decision_file's worktree-scan fallback still acts as a safety net for any judge that writes to the worktree's .prompts/decisions/ instead.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>