fix(panel): judge cwd = PR review worktree, not operator's CWD by jacsamell · Pull Request #182 · aetheronhq/agent-cube

jacsamell · 2026-05-19T22:25:35Z

Symptom

Operator running cube from a Claude Code session worktree (e.g. `.claude/worktrees/cool-satoshi-bd8d21`) was seeing judges report findings against code that didn't exist on the actual PR. Investigation showed judges were Reading from the operator's session worktree — dozens of commits behind, on an unrelated branch — not from the cube-synced PR review worktree.

Specifically: judges flagged `apps/api/.../line:728` as buggy. Line 728 on the PR head was a totally different function from line 728 on the operator's session branch.

Root cause

`run_dir = WORKTREE_BASE.parent if cli_name == "gemini" else PROJECT_ROOT`

For all non-gemini CLIs, the judge inherits cwd = PROJECT_ROOT (the operator's repo checkout). Relative-path Reads land there, regardless of what worktree the prompt instructed.

Fix

Use `judge_info.review_worktree` as the cwd when set (PR #172 already populates it from the synced PR worktree at `~/.cube/worktrees//pr-/`). Falls back to the legacy paths when no review worktree was wired (writer reviews, gemini, etc).
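
The selection rule described above can be sketched as follows. This is a minimal illustration, not the code from `judge_panel.run_judge`: the `JudgeInfo` stand-in and the function name `pick_judge_cwd` are hypothetical, and only `review_worktree`, `PROJECT_ROOT`, `WORKTREE_BASE` and the gemini special case come from the PR text.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class JudgeInfo:
    """Hypothetical stand-in; only `review_worktree` is named in the PR."""
    key: str
    review_worktree: Optional[Path] = None


def pick_judge_cwd(
    judge_info: JudgeInfo,
    cli_name: str,
    project_root: Path,
    worktree_base: Path,
) -> Path:
    """Prefer the synced PR review worktree; otherwise use the legacy paths."""
    if judge_info.review_worktree is not None:
        return judge_info.review_worktree
    # Legacy behaviour quoted under "Root cause".
    return worktree_base.parent if cli_name == "gemini" else project_root


if __name__ == "__main__":
    root = Path("/repo")
    base = Path("/home/op/.cube/worktrees")
    synced = JudgeInfo("judge_1", review_worktree=base / "proj" / "pr-review")
    print(pick_judge_cwd(synced, "claude", root, base))
    print(pick_judge_cwd(JudgeInfo("judge_2"), "claude", root, base))
```

With this shape, relative-path Reads resolve inside the review worktree whenever one was wired, and nothing changes for flows that never set it.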

Test plan

- Run `cube prv ` from a Claude Code worktree on an unrelated branch: judges Read from the synced PR worktree, not the operator's branch
- Confirm absolute-path Reads still work (the prompt's `Read /Users/.../pr-/` instructions are unaffected)
- All 237 cube tests pass (verified locally)

🤖 Generated with Claude Code

Deterministic Verify Gate Between Writer and Judge Phases

This PR introduces a deterministic verification system that runs between the writer and judge phases to eliminate wasteful token churn.

Problem Solved:
Writers were spending massive tokens running their own test/lint/typecheck loops, and judges were burning tokens flagging "this won't build" findings. The verify gate runs once authoritatively instead.

How It Works:

1. Writer commits as usual
2. Cube runs the repo's configured verify.cmd in the writer's worktree, capturing logs to .cube/verify-logs/<task>-attempt-<N>.log
3. On failure, a minimal feedback prompt is sent to the writer with the absolute log path; the writer Reads the log, fixes the issue, and commits
4. Loop repeats up to max_attempts (default 3)
5. On final failure, judges still run with the failure history visible for context
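
A single verify attempt (step 2 above) might look roughly like the sketch below. It assumes the command is run through `bash -c`, as the review walkthrough describes; the function name `run_once`, the timeout marker line and the exit code 124 on timeout are assumptions made for illustration, not details taken from `verify.py`.

```python
import subprocess
from pathlib import Path


def run_once(cmd: str, worktree: Path, timeout_seconds: int, log_path: Path) -> int:
    """Run `cmd` in `worktree`, writing combined stdout/stderr to `log_path`.

    Returns the command's exit code, or 124 on timeout (an assumed
    convention borrowed from coreutils `timeout`).
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log:
        try:
            proc = subprocess.run(
                ["bash", "-c", cmd],
                cwd=worktree,
                stdout=log,
                stderr=subprocess.STDOUT,  # merge stderr into the same log
                timeout=timeout_seconds,
            )
            return proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the bash process on timeout; processes it
            # spawned may outlive it, which a real implementation should handle.
            log.write(f"\n[verify timed out after {timeout_seconds}s]\n")
            return 124
```

Writing straight to the log file rather than buffering output in memory keeps the full log on disk for the writer to Read, whatever its size.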

Key Changes:

- verify.py (new, 181 lines): Implements VerifyResult dataclass and run_verify_loop() which executes the verify command with timeout handling, log truncation, and async writer feedback resumption via send_feedback_async
- handlers.py Phase 2: Extended phase2_run_writers to integrate the verify-and-repair loop, deriving writer worktrees and independently running verification for each
- Writer Prompt: Updated to explicitly instruct writers not to run tests/lint/typecheck locally; Cube now does this authoritatively after they exit
- Config: New VerifyConfig block in cube.yaml with fields cmd, timeout_seconds (600s default), and max_attempts (3 default); empty cmd disables the gate
- Default v2 Config: pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test
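
The config block described above could be modelled along these lines. The field names and defaults (`cmd`, `timeout_seconds` = 600, `max_attempts` = 3) come from the PR; the empty-string default for `cmd` and the helper name `is_verify_enabled` are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass
class VerifyConfig:
    cmd: str = ""               # assumed default; an empty cmd disables the gate
    timeout_seconds: int = 600  # default stated in the PR
    max_attempts: int = 3       # default stated in the PR


def is_verify_enabled(cfg: VerifyConfig) -> bool:
    """Hypothetical helper: the gate runs only when a command is configured."""
    return bool(cfg.cmd.strip())


if __name__ == "__main__":
    v2 = VerifyConfig(
        cmd="pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test"
    )
    print(is_verify_enabled(v2), is_verify_enabled(VerifyConfig()))
```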

Impact:
Removes the writer's most expensive habit, streamlines the workflow by centralising verification, and surfaces failure context to judges when verification does fail.

Operator observation: writers were burning large numbers of tokens running their own pnpm test / typecheck / lint loops, and judges were burning tokens flagging 'this won't build' findings. Both are wasteful: deterministic verify can be run once by cube, authoritatively.

New phase folded into phase2_run_writers (keeps phase numbering stable):

1. Writer commits as usual.
2. cube runs verify.cmd in the writer worktree and captures combined stdout/stderr to .cube/verify-logs/<task>-attempt-<N>.log.
3. On failure: tiny feedback prompt with absolute log path, resume the writer. Writer Reads the log, fixes, commits. send_feedback_async auto-commits on exit (PR #178).
4. Loop up to max_attempts (default 3). On final failure, judges still run.

Writer prompt updated: 'Don't run tests / lint / typecheck — cube does it after you exit.' Removes the writer's most expensive habit.

Config in cube.yaml verify section. Empty cmd disables the gate. v2 cube.yaml ships with: pnpm install --frozen-lockfile && pnpm typecheck && pnpm lint && pnpm test

coderabbitai · 2026-05-19T22:25:57Z

Walkthrough

This pull request introduces an automated deterministic verify gate that executes repo-configured verification commands after writers commit, captures logs, retries on failure by resuming the writer with feedback, and reports results to the orchestrator.

Changes

Verify Gate Implementation

| Layer / File(s) | Summary |
|---|---|
| Configuration schema for verify gate (`python/cube/core/user_config.py`) | `VerifyConfig` dataclass defines `cmd`, `timeout_seconds`, and `max_attempts` with sensible defaults; `CubeConfig` now includes a `verify` field, and the loader parses verify settings from merged YAML into the cached configuration. |
| Verify loop module with command execution and retry (`python/cube/automation/verify.py`) | `run_verify_loop` executes `verify.cmd` via bash in a worktree, captures combined stdout/stderr to timestamped log files in `.cube/verify-logs`, and on failure constructs a markdown feedback prompt (with absolute log path and "do not self-verify" instructions) to resume the writer for up to `max_attempts` tries. Returns `VerifyResult` with pass/fail, attempt count, and final log details. |
| Phase 2 orchestration with verify loop invocation (`python/cube/commands/orchestrate/handlers.py`) | Phase 2 handler checks if verification is configured, then loads user config and builds a per-writer worktree list (single vs dual mode), filters non-existent paths, and runs `run_verify_loop` independently for each writer, returning `verify_results` with writer labels, pass/fail status, and attempt counts. |
| Writer prompt updated with verify gate instructions (`python/cube/commands/orchestrate/prompts.py`) | Writer prompt now explicitly instructs writers not to self-verify; instead, writers are informed that Cube runs the deterministic verify gate after commit and will resume the writer with log paths on failure. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A gate of truth now stands so firm,
Verify and learn, then watch it turn—
On failure, we resume with care,
Log paths and feedback in the air!
No self-checks now, just trust the flow,
Cube's deterministic verify's show. 🌟

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❌ Error | The PR title describes fixing cwd selection for judges to use PR review worktree, but the actual changeset primarily implements a deterministic verify gate loop with configuration, prompt updates, and handler modifications, not a cwd fix. | Align the title with the actual changes: consider 'feat(verify): add deterministic verify gate with writer feedback loop' or update the PR description to clarify the actual scope. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cube/automation/verify.py`:
- Around line 124-170: The final return currently hardcodes
attempts=max_attempts which misreports when the loop exits early; change the
final VerifyResult return to use the actual last attempt count (e.g.,
attempts=attempt) and ensure attempt is referenced from the for-loop (or fall
back to 0 if not set) so that VerifyResult(ok=False, attempts=attempt,
last_log_path=last_log_path, last_exit_code=last_exit_code) reflects the real
number of verify attempts; reference the for-loop variable "attempt" and the
VerifyResult construction to locate where to update.

In `@python/cube/commands/orchestrate/handlers.py`:
- Around line 89-94: The loop over writer_keys in handlers.py currently uses
"continue" when a writer worktree (computed via WORKTREE_BASE / project_name /
f\"writer-{wconf.name}-{ctx.task_id}\") is missing, which silently bypasses
verification; change this so missing worktrees cause a hard failure instead of
quietly skipping: when get_writer_config(wkey) yields wconf but
worktree.exists() is False, log an error including wkey and wconf.name (use the
existing logger), and either raise a clear exception or record the failure in
the verification result so the phase returns non-success (do not use continue).
Apply the same behavior to the analogous block handling lines 119-124 so missing
writer worktrees consistently fail verify rather than being ignored.

In `@python/cube/commands/orchestrate/prompts.py`:
- Around line 46-59: Update the writer prompt generation to only include the "do
NOT self-verify" paragraph when the repo has a verify command configured (i.e.,
verify.cmd is set/non-empty); detect the verify setting and conditionally append
the string "Focus on the code change. Don't run tests / typecheck / lint — cube
runs them deterministically after you exit. If verify fails, cube will resume
you with the log path; Read it and fix." to the prompt output in
python/cube/commands/orchestrate/prompts.py instead of unconditionally embedding
it, referencing the verify configuration key (verify.cmd) when constructing the
prompt.

In `@python/cube/core/user_config.py`:
- Around line 246-251: The parsing for verify config in load_config() assumes
verify_raw is a dict and that timeout_seconds/max_attempts are int-coercible,
which will raise for malformed YAML like "verify: true" or "timeout_seconds:
'fast'"; update the block that builds verify_raw and verify_cfg (symbols:
verify_raw, VerifyConfig, verify_cfg, load_config) to first ensure verify_raw is
a mapping (fall back to {} if not), extract cmd using str(...) but only if
present, and safely parse timeout_seconds and max_attempts by attempting int()
in a try/except (or using conditional isinstance checks) falling back to the
existing defaults (600 and 3) when parsing fails or values are missing; keep the
creation of VerifyConfig but feed it these validated/coerced values so malformed
types do not abort load_config().


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3001ad13-a248-408c-a4bf-8ad0dac6d668

📥 Commits

Reviewing files that changed from the base of the PR and between c88ab68 and a60c8d9.

📒 Files selected for processing (4)

python/cube/automation/verify.py
python/cube/commands/orchestrate/handlers.py
python/cube/commands/orchestrate/prompts.py
python/cube/core/user_config.py

📜 Review details

🧰 Additional context used

🪛 Ruff (0.15.13)

python/cube/automation/verify.py

[error] 61-61: subprocess call: check for execution of untrusted input

(S603)

[error] 62-62: Starting a process with a partial executable path

(S607)

coderabbitai · 2026-05-19T22:29:28Z

+    for attempt in range(1, max_attempts + 1):
+        log_path = _logs_dir() / f"{task_id}-attempt-{attempt}.log"
+        print_info(f"🔬 Verify (attempt {attempt}/{max_attempts}): {verify_cmd}")
+        exit_code, tail = _run_once(verify_cmd, worktree, timeout_seconds, log_path)
+        last_log_path = log_path
+        last_exit_code = exit_code
+
+        if exit_code == 0:
+            print_success(f"Verify passed (attempt {attempt}/{max_attempts})")
+            return VerifyResult(ok=True, attempts=attempt, last_log_path=log_path, last_exit_code=0)
+
+        print_warning(f"Verify failed (exit {exit_code}). Log: {log_path}")
+        console.print(f"[dim]Tail:\n{tail[-1500:]}[/dim]")
+
+        if attempt >= max_attempts:
+            print_error(
+                f"Verify still failing after {max_attempts} attempts. "
+                "Handing current state to judges; they will grade against the failing build."
+            )
+            break
+
+        # Write a feedback prompt to disk, then resume the writer with it.
+        feedback_path = Path(PROJECT_ROOT) / ".prompts" / f"verify-feedback-{task_id}-{attempt}.md"
+        feedback_path.parent.mkdir(parents=True, exist_ok=True)
+        feedback_path.write_text(_feedback_prompt(log_path, exit_code, attempt, max_attempts, verify_cmd))
+
+        from ..core.session import load_session
+
+        session_id = load_session(writer_info.key.upper(), task_id)
+        if not session_id:
+            print_warning(f"No session to resume for {writer_info.label}; aborting verify loop")
+            break
+
+        await send_feedback_async(
+            task_id=task_id,
+            feedback_file=feedback_path,
+            session_id=session_id,
+            worktree=worktree,
+            writer_name=writer_info.name,
+            writer_model=writer_info.model,
+            writer_label=writer_info.label,
+            writer_key=writer_info.key,
+            writer_color=writer_info.color,
+        )
+        # send_feedback_async commits any writer changes (PR #178).
+
+    return VerifyResult(ok=False, attempts=max_attempts, last_log_path=last_log_path, last_exit_code=last_exit_code)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the actual number of verify attempts performed.

If the loop exits early (e.g., Line 154 no session to resume), Line 170 still reports attempts=max_attempts, which misreports execution state.

💡 Suggested fix

-    for attempt in range(1, max_attempts + 1):
+    attempts_run = 0
+    for attempt in range(1, max_attempts + 1):
+        attempts_run = attempt
         log_path = _logs_dir() / f"{task_id}-attempt-{attempt}.log"
@@
-            break
+            break
@@
-            break
+            break
@@
-    return VerifyResult(ok=False, attempts=max_attempts, last_log_path=last_log_path, last_exit_code=last_exit_code)
+    return VerifyResult(
+        ok=False,
+        attempts=attempts_run,
+        last_log_path=last_log_path,
+        last_exit_code=last_exit_code,
+    )
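
To see why the counter matters, here is a simplified, synchronous model of the loop. It is a sketch under stated assumptions: `verify_loop`, `run_once` and `can_resume` are hypothetical stand-ins for the real async code, and this `VerifyResult` omits `last_log_path`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerifyResult:
    ok: bool
    attempts: int
    last_exit_code: Optional[int]


def verify_loop(
    run_once: Callable[[int], int],
    can_resume: Callable[[int], bool],
    max_attempts: int,
) -> VerifyResult:
    """Report the number of attempts actually run, even on early exit."""
    attempts_run = 0
    last_exit_code: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        attempts_run = attempt
        last_exit_code = run_once(attempt)
        if last_exit_code == 0:
            return VerifyResult(True, attempt, 0)
        if attempt >= max_attempts:
            break
        if not can_resume(attempt):  # e.g. no writer session to resume
            break
    return VerifyResult(False, attempts_run, last_exit_code)


if __name__ == "__main__":
    # Fails once, then the writer session is missing: one attempt was run.
    print(verify_loop(lambda n: 1, lambda n: False, max_attempts=3))
```

In the early-exit case the hardcoded version would report 3 attempts, while this model reports 1.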


coderabbitai · 2026-05-19T22:29:28Z

+    for wkey in writer_keys:
+        wconf = get_writer_config(wkey)
+        worktree = WORKTREE_BASE / project_name / f"writer-{wconf.name}-{ctx.task_id}"
+        if not worktree.exists():
+            continue
+        writers.append(


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently bypass verify when writer worktrees are missing.

The `continue` at lines 92-94 quietly skips verification, and the phase still reports success. If worktree resolution regresses, verify can be effectively disabled with no hard signal.

💡 Suggested fix

-    writers: list[WriterInfo] = []
+    writers: list[WriterInfo] = []
+    missing_worktrees: list[str] = []
@@
-        if not worktree.exists():
-            continue
+        if not worktree.exists():
+            missing_worktrees.append(f"{wconf.label}: {worktree}")
+            continue
@@
+    for item in missing_worktrees:
+        print_warning(f"Verify skipped for missing worktree: {item}")
+
+    if not writers:
+        print_error("Verify is configured but no writer worktrees were found.")
+        raise typer.Exit(1)

Also applies to: 119-124


coderabbitai · 2026-05-19T22:29:28Z

+### Do NOT self-verify — cube runs the deterministic verify gate
+Cube runs the repo's verify command (typecheck + lint + tests) automatically
+after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
+`npm test` / `pytest` / lint themselves.** Reasons:
+- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
+- If verify fails, cube re-resumes the writer with a pointer to the log on disk
+  (writer uses `Read` to inspect, fixes, commits — cube runs verify again).
+- Judges only see code that passes verify (or has hit the retry cap), so the
+  panel never burns tokens grading "this won't build" findings.
+
+**Include this in the writer prompt** as an explicit instruction:
+"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
+deterministically after you exit. If verify fails, cube will resume you with
+the log path; Read it and fix."


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make verify-gate instructions conditional on config.

This block assumes verify is always enabled, but verify.cmd can be unset/empty. In that case, telling writers “do NOT self-verify” is incorrect and can let unverified changes through.

Suggested patch

 async def generate_writer_prompt(task_id: str, task_content: str, prompts_dir: Path) -> Path:
@@
-    prompt = f"""# Task: Generate Writer Prompt
+    from ...automation.verify import is_verify_configured
+    from ...core.user_config import load_config
+
+    verify_enabled = is_verify_configured(load_config())
+    verify_instructions = (
+        """### Do NOT self-verify — cube runs the deterministic verify gate
+Cube runs the repo's verify command (typecheck + lint + tests) automatically
+after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
+`npm test` / `pytest` / lint themselves.** Reasons:
+- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
+- If verify fails, cube re-resumes the writer with a pointer to the log on disk
+  (writer uses `Read` to inspect, fixes, commits — cube runs verify again).
+- Judges only see code that passes verify (or has hit the retry cap), so the
+  panel never burns tokens grading "this won't build" findings.
+
+**Include this in the writer prompt** as an explicit instruction:
+"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
+deterministically after you exit. If verify fails, cube will resume you with
+the log path; Read it and fix."
+"""
+        if verify_enabled
+        else """### Verification responsibility
+Verify gate is not configured for this repo run. Writers must run the project's
+verification checks (tests/lint/typecheck) before commit/push."""
+    )
+
+    prompt = f"""# Task: Generate Writer Prompt
@@
-### Do NOT self-verify — cube runs the deterministic verify gate
-Cube runs the repo's verify command (typecheck + lint + tests) automatically
-after the writer commits. **Writers must NOT run `pnpm verify` / `task verify` /
-`npm test` / `pytest` / lint themselves.** Reasons:
-- Cube's run is authoritative; writer churn on the same commands is wasted tokens.
-- If verify fails, cube re-resumes the writer with a pointer to the log on disk
-  (writer uses `Read` to inspect, fixes, commits — cube runs verify again).
-- Judges only see code that passes verify (or has hit the retry cap), so the
-  panel never burns tokens grading "this won't build" findings.
-
-**Include this in the writer prompt** as an explicit instruction:
-"Focus on the code change. Don't run tests / typecheck / lint — cube runs them
-deterministically after you exit. If verify fails, cube will resume you with
-the log path; Read it and fix."
+{verify_instructions}


coderabbitai · 2026-05-19T22:29:28Z

+    verify_raw = data.get("verify") or {}
+    verify_cfg = VerifyConfig(
+        cmd=str(verify_raw.get("cmd", "")).strip(),
+        timeout_seconds=int(verify_raw.get("timeout_seconds", 600)),
+        max_attempts=int(verify_raw.get("max_attempts", 3)),
+    )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden verify config parsing against malformed YAML values.

Line 246 and Lines 249-250 assume valid types (dict + int-coercible values). A config like verify: true or timeout_seconds: "fast" will throw and abort load_config().

💡 Suggested fix

-    verify_raw = data.get("verify") or {}
-    verify_cfg = VerifyConfig(
-        cmd=str(verify_raw.get("cmd", "")).strip(),
-        timeout_seconds=int(verify_raw.get("timeout_seconds", 600)),
-        max_attempts=int(verify_raw.get("max_attempts", 3)),
-    )
+    verify_raw = data.get("verify")
+    if not isinstance(verify_raw, dict):
+        verify_raw = {}
+
+    def _as_int(value: object, default: int, *, minimum: int) -> int:
+        try:
+            parsed = int(value)
+        except (TypeError, ValueError):
+            return default
+        return max(parsed, minimum)
+
+    verify_cfg = VerifyConfig(
+        cmd=str(verify_raw.get("cmd", "")).strip(),
+        timeout_seconds=_as_int(verify_raw.get("timeout_seconds", 600), 600, minimum=1),
+        max_attempts=_as_int(verify_raw.get("max_attempts", 3), 3, minimum=1),
+    )
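
The behaviour of the hardened parser on the malformed inputs mentioned above can be checked with a standalone sketch. The function `parse_verify` and its plain-dict return value are hypothetical simplifications of `load_config()`. One deliberate deviation is labelled in the code: a null `cmd` is mapped to an empty string, because `str(None)` would otherwise yield the non-empty string `"None"`.

```python
from typing import Any, Dict


def as_int(value: object, default: int, *, minimum: int) -> int:
    """Coerce to int, falling back to `default` and clamping to `minimum`."""
    try:
        parsed = int(value)  # type: ignore[call-overload]
    except (TypeError, ValueError):
        return default
    return max(parsed, minimum)


def parse_verify(data: Dict[str, Any]) -> Dict[str, Any]:
    raw = data.get("verify")
    if not isinstance(raw, dict):
        raw = {}  # e.g. `verify: true` in YAML
    return {
        # Deviation from the suggestion: treat a null cmd as empty.
        "cmd": str(raw.get("cmd") or "").strip(),
        "timeout_seconds": as_int(raw.get("timeout_seconds", 600), 600, minimum=1),
        "max_attempts": as_int(raw.get("max_attempts", 3), 3, minimum=1),
    }


if __name__ == "__main__":
    print(parse_verify({"verify": True}))
    print(parse_verify({"verify": {"cmd": "pnpm test", "timeout_seconds": "fast"}}))
```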



) (#184)

PR #182 and #183 squash-merged to empty diffs because they were stacked off the verify-gate branch instead of main. Re-applying both fixes against the actual main HEAD:

1. judge_panel.run_judge: use judge_info.review_worktree as the cwd when the PR review flow synced one. Stops judges from inheriting the operator's Claude Code session worktree (which may be on a stale unrelated branch). Previously: judges reviewed code from cool-satoshi worktree on commits dozens behind the actual PR.
2. config._find_git_root: use 'git rev-parse --git-common-dir' to find the MAIN repo working tree, not the per-Claude-Code-session worktree. All worktrees of the same repo now share .agent-sessions/, .prompts/decisions/, .cube/. Eliminates the 'No session found' regression that came from state being scattered across worktrees.

Verified: PROJECT_ROOT resolves to the main repo root from both the main checkout and any Claude Code session worktree. 237 tests pass.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
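
The second fix relies on `git rev-parse --git-common-dir`, which prints the shared `.git` directory for every worktree of a repository (possibly as a path relative to the current directory). A rough sketch of deriving the main working tree from it follows. The function names are hypothetical, and it assumes a non-bare repository whose main working tree is the parent of the common `.git` directory.

```python
import os
import subprocess
from pathlib import Path


def main_root_from_common_dir(common_dir: str, cwd: Path) -> Path:
    """Turn the (possibly relative) common git dir into the main repo root."""
    path = Path(common_dir)
    if not path.is_absolute():
        path = cwd / path
    # Normalise `..` segments without touching the filesystem.
    return Path(os.path.normpath(path)).parent


def find_main_repo_root(cwd: Path) -> Path:
    """Ask git for the common dir, then take its parent (non-bare repos only)."""
    out = subprocess.run(
        ["git", "rev-parse", "--git-common-dir"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return main_root_from_common_dir(out, cwd)
```

Called from either the main checkout or a linked worktree, both paths lead to the same root, which is what lets the state directories be shared.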

… canonical path (#187)

Codex judges run with --sandbox workspace-write and --cd <pr-N worktree> (PR #182). The sandbox blocks writes anywhere outside the cwd workspace, including the main repo's .prompts/decisions/ where decision JSONs MUST land. Result: gpt-5.5 judges (Backend, Frontend & UX) silently failed to write their decisions; only opus/claude judges (which use a different sandbox) actually persisted.

Pass --add-dir <project_root> to codex when worktree differs from PROJECT_ROOT. Sandbox stays in place; just adds the main repo to the writeable allowlist so judges can write the decision file at its canonical absolute path. find_decision_file's worktree-scan fallback still acts as a safety net for any judge that writes to the worktree's .prompts/decisions/ instead.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
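
The argument construction described in that commit message might be sketched as below. Only the flags quoted there (`--sandbox workspace-write`, `--cd`, `--add-dir`) are used; the function name `codex_dir_args` is hypothetical, and the rest of the codex command line is deliberately left out because the source does not show it.

```python
from pathlib import Path
from typing import List


def codex_dir_args(worktree: Path, project_root: Path) -> List[str]:
    """Build the sandbox/cwd flags, adding the main repo when it differs."""
    args = ["--sandbox", "workspace-write", "--cd", str(worktree)]
    if worktree != project_root:
        # Keep the sandbox, but allow writes to the main repo so decision
        # files can land at their canonical absolute path.
        args += ["--add-dir", str(project_root)]
    return args


if __name__ == "__main__":
    print(codex_dir_args(Path("/worktrees/pr-review"), Path("/repo")))
    print(codex_dir_args(Path("/repo"), Path("/repo")))
```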

coderabbitai Bot reviewed May 19, 2026

jacsamell merged commit 1b12a10 into main May 19, 2026
0 of 4 checks passed

jacsamell deleted the fix/judge-cwd-uses-review-worktree branch May 19, 2026 22:30

jacsamell mentioned this pull request May 19, 2026

fix: judge cwd + shared cube state across worktrees (rebase of #182+#183) #184

Merged

coderabbitai Bot mentioned this pull request May 19, 2026

feat(verify): run install in parallel with prompter, off the critical path #186

Merged

jacsamell mentioned this pull request May 19, 2026

fix(codex): grant write access to main repo so decision files land in canonical path #187

Merged

jacsamell merged 1 commit into main from fix/judge-cwd-uses-review-worktree


