Code Review Engineer reviews are shallow: the skill surface doesn't expose the diff (#33)

## Problem

The four-phase Code Review Engineer epic (#10) shipped real plumbing — template-driven container, enforced action policy, persistent memory, autonomous PR-watcher. The first real autonomous review (PR #29, smoke-tested 2026-05-24) confirmed the **pipeline** works end-to-end, but the **content** of the review was shallow because Kimi is writing reviews with effectively no knowledge of the code.

When the watcher dispatches a review task, the LLM's context is:

| Source | Content |
|---|---|
| Instruction | *"PR #N at owner/repo is open, use the github-pr-review skill"* |
| `role_context.memory` | empty `{}` for a fresh agent |
| SOUL.md | The role template's system_prompt (6 bullets) |
| Available skills | `github-list-prs`, `github-pr-review`, `update-memory` |

That's it. **No diff, no file contents, no PR title/description, no issue history, no past PRs, no repo conventions.** The `github-pr-review/SKILL.md` says *"Read the PR diff carefully before reviewing"* but provides no endpoint to actually fetch it.

The `code-review-engineer.yaml` template lists `github.pr.files` and `github.repo.read` in `allowed_actions`, but there's no gateway endpoint backing them and no skill that tells the LLM to call them. Capability surface ≠ exposed surface.

Result: a plausibly worded "review" generated from the system prompt and skill docs alone. Not useful to a reasonably skilled engineer.
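The gap between the declared and the exposed surface can be checked mechanically. A minimal sketch, assuming the template's `allowed_actions` and the set of actions that actually have a gateway endpoint are both available as plain lists of strings; `unbacked_actions` is a hypothetical helper, and the sample data is illustrative rather than read from the real template:

```python
from typing import Iterable


def unbacked_actions(allowed: Iterable[str], backed: Iterable[str]) -> list[str]:
    """Return actions the template allows but no gateway endpoint implements."""
    return sorted(set(allowed) - set(backed))


# Illustrative data only: the two unbacked names come from this issue,
# the rest of the lists are assumptions for the example.
allowed = ["github.pr.files", "github.repo.read", "github.review.submit"]
backed = ["github.review.submit"]

print(unbacked_actions(allowed, backed))
# → ['github.pr.files', 'github.repo.read']
```

Run as a startup or CI check, something like this would flag a template that promises actions the gateway cannot serve.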

## Proposal — three depth tiers

### Tier 1 (cheap, recommended first)

**Pipe in the PR diff so the review is grounded in the actual code.**

- Add `GatewayService.get_pull_request_files(user_id, owner, repo, pull_number)` → `GET /repos/{owner}/{repo}/pulls/{pull_number}/files` (GitHub returns the per-file patch).
- Add a corresponding gateway endpoint behind `require_action("github.pr.files")`.
- Add a new `github-pr-fetch` skill whose SKILL.md teaches the LLM to call it **before** `github-pr-review`. Update `github-pr-review/SKILL.md` to chain the two.
- Optionally also expose `get_pull_request` so the LLM sees the PR title + description + labels.
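The fetch step in the list above can be sketched as follows. This is a rough outline, not the gateway's actual code: `pull_files_url`, `fetch_pull_files`, and `render_diff` are hypothetical names, the token lookup from `user_id` is not shown, and the `max_chars` cap is an assumed way to keep the prompt bounded. The response fields used (`filename`, `status`, `additions`, `deletions`, `patch`) are the ones GitHub's "list pull request files" endpoint returns; `patch` can be missing for binary or very large files.

```python
import json
import urllib.request
from typing import Any

GITHUB_API = "https://api.github.com"


def pull_files_url(owner: str, repo: str, pull_number: int, page: int = 1) -> str:
    """Build the URL for GitHub's "list pull request files" endpoint."""
    return (
        f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pull_number}/files"
        f"?per_page=100&page={page}"
    )


def fetch_pull_files(token: str, owner: str, repo: str, pull_number: int) -> list[dict[str, Any]]:
    """Fetch the first page of changed files (makes a network call)."""
    req = urllib.request.Request(
        pull_files_url(owner, repo, pull_number),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def render_diff(files: list[dict[str, Any]], max_chars: int = 20_000) -> str:
    """Flatten per-file entries into one text block for the LLM context."""
    parts = []
    for f in files:
        header = f"### {f['filename']} ({f['status']}, +{f['additions']}/-{f['deletions']})"
        # `patch` is absent for binary or oversized files, so fall back to a note.
        body = f.get("patch") or "(no patch available)"
        parts.append(f"{header}\n{body}")
    text = "\n\n".join(parts)
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[diff truncated]"
    return text
```

Keeping `render_diff` separate from the network call means the skill output can be unit-tested with canned file entries, and the truncation policy can change without touching the gateway endpoint.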

This alone takes reviews from "imaginary" to "grounded in the actual diff." Probably the single highest-leverage change for review quality. Half-day of work.

### Tier 2 (medium, follow-up)

**Pipe in past PRs / issues / discussions.**

GitHub's search API can return prior reviewer discussions, related issues, and prior PRs touching the same files. Useful when memory carries *"we already discussed this convention in #42"*. Adds tokens fast; needs care about scope (last N, same author, same files).
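One way to keep that scope tight is to build the search request with explicit limits. A minimal sketch, assuming GitHub's issue search endpoint (`GET /search/issues`) with the `repo:`, `is:pr`, and `author:` qualifiers; `prior_prs_search_url` is a hypothetical helper, and it only covers the "last N" and "same author" scopes. Scoping by "same files" is not handled here and would need a separate approach.

```python
from typing import Optional
from urllib.parse import urlencode


def prior_prs_search_url(owner: str, repo: str, author: Optional[str] = None, limit: int = 5) -> str:
    """Build a search URL for the most recently updated PRs in one repo."""
    qualifiers = [f"repo:{owner}/{repo}", "is:pr"]
    if author:
        qualifiers.append(f"author:{author}")
    params = {
        "q": " ".join(qualifiers),
        "sort": "updated",
        "order": "desc",
        "per_page": limit,  # "last N" cap keeps the added tokens bounded
    }
    return "https://api.github.com/search/issues?" + urlencode(params)
```

A small default for `limit` is deliberate: each extra PR pulled into context costs tokens, so the caller has to opt in to more history.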

### Tier 3 (heavy, beyond hackathon)

**Full codebase context** — either clone the repo into the container (disk + tokens), or set up embeddings/retrieval. The right path here is probably eventually OpenClaw's own filesystem/repo tools, not something we build into the gateway. Out of scope for now.

## Why this wasn't part of the original epic

The four phases prove the *plumbing* (container shape, trust moat, memory, autonomy). The *substance* depends on what the skill surface exposes. We intentionally kept skills minimal to ship the epic, but the result is that the agent is plumbed correctly to do a job it can't actually do well yet.

## Done when

- The Code Review Engineer can read the PR diff before submitting a review.
- A test review on a real PR produces feedback that references specific files / lines / changes — not a generic "looks good, consider X."
- `agent_action_log` shows the new `github.pr.files` action being called before `github.review.submit`.
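The last criterion lends itself to an automated check. A minimal sketch, assuming the `agent_action_log` entries for a single review task can be reduced to an ordered list of action names; the real log schema is not described here, and `fetched_before_submit` is a hypothetical helper:

```python
def fetched_before_submit(
    actions: list[str],
    fetch: str = "github.pr.files",
    submit: str = "github.review.submit",
) -> bool:
    """True if a review was submitted and every submit came after a fetch."""
    seen_fetch = False
    saw_submit = False
    for action in actions:
        if action == fetch:
            seen_fetch = True
        elif action == submit:
            if not seen_fetch:
                return False  # review submitted without reading the diff
            saw_submit = True
    return saw_submit


print(fetched_before_submit(["github.pr.files", "github.review.submit"]))
# → True
print(fetched_before_submit(["github.review.submit", "github.pr.files"]))
# → False
```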

## Out of scope

- Memory compaction (#23)
- Other roles to A/B/D parity (#17)
- Frontend UI for any of this (#13)
