Code Review Engineer reviews are shallow — the skill surface doesn't expose the diff #33

@michaelzwang13

Problem

The four-phase Code Review Engineer epic (#10) shipped real plumbing — template-driven container, enforced action policy, persistent memory, autonomous PR-watcher. The first real autonomous review (PR #29, smoke-tested 2026-05-24) confirmed the pipeline works end-to-end, but the content of the review was shallow because Kimi is writing reviews with effectively no knowledge of the code.

When the watcher dispatches a review task, the LLM's context is:

| Source | Content |
| --- | --- |
| Instruction | "PR #N at owner/repo is open, use the github-pr-review skill" |
| `role_context.memory` | empty `{}` for a fresh agent |
| SOUL.md | The role template's system_prompt (6 bullets) |
| Available skills | `github-list-prs`, `github-pr-review`, `update-memory` |

That's it. No diff, no file contents, no PR title/description, no issue history, no past PRs, no repo conventions. The `github-pr-review/SKILL.md` says "Read the PR diff carefully before reviewing" but provides no endpoint to actually fetch it.

The `code-review-engineer.yaml` template lists `github.pr.files` and `github.repo.read` in `allowed_actions`, but there's no gateway endpoint backing them and no skill that tells the LLM to call them. Capability surface ≠ exposed surface.
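
One way to make the "capability surface ≠ exposed surface" gap visible is a small consistency check. This is a minimal sketch, not existing project code: the function name and the two input sets are hypothetical, and it assumes the gateway's endpoints and the skills' documented actions can each be listed as plain action strings.

```python
def unbacked_actions(allowed_actions, gateway_actions, skill_actions):
    """Map each allowed action to the pieces it is missing.

    allowed_actions: actions listed in the role template (e.g. allowed_actions).
    gateway_actions: actions that have a gateway endpoint (hypothetical input).
    skill_actions:   actions that some SKILL.md tells the LLM to call (hypothetical input).
    """
    gaps = {}
    for action in allowed_actions:
        missing = []
        if action not in gateway_actions:
            missing.append("gateway endpoint")
        if action not in skill_actions:
            missing.append("skill")
        if missing:
            gaps[action] = missing
    return gaps


# Illustrative data only: the first two actions are the ones this issue names
# as listed but unbacked; the gateway/skill sets are assumptions for the example.
allowed = ["github.pr.files", "github.repo.read", "github.review.submit"]
gateway = {"github.review.submit"}
skills = {"github.review.submit"}

for action, missing in unbacked_actions(allowed, gateway, skills).items():
    print(f"{action}: missing {', '.join(missing)}")
```

A check like this could run in CI so that a template never advertises an action the agent has no way to invoke.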

Result: a plausibly worded "review" generated from the system prompt and skill docs alone. Not useful to a reasonably skilled engineer.

Proposal — three depth tiers

Tier 1 (cheap, recommended first)

Pipe in the PR diff so the review is grounded in the actual code.

  • Add `GatewayService.get_pull_request_files(user_id, owner, repo, pull_number)` → `GET /repos/{}/{}/pulls/{}/files` (GitHub returns the per-file patch).
  • Add a corresponding gateway endpoint behind `require_action("github.pr.files")`.
  • Add a new `github-pr-fetch` skill whose SKILL.md teaches the LLM to call it before `github-pr-review`, and update `github-pr-review/SKILL.md` to chain the two.
  • Optionally also expose `get_pull_request` so the LLM sees the PR title + description + labels.
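
The Tier 1 fetch could look roughly like the sketch below. It assumes direct use of the GitHub REST endpoint named above with a bearer token; how the gateway resolves a token from `user_id`, and the `require_action` wiring, are not shown. The function names `build_files_request`, `fetch_pull_request_files`, and `render_diff_context`, and the `max_chars` budget, are hypothetical.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_files_request(owner, repo, pull_number, token, per_page=100):
    """Build the request for GET /repos/{owner}/{repo}/pulls/{pull_number}/files."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pull_number}/files?per_page={per_page}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_pull_request_files(owner, repo, pull_number, token):
    """Return the first page of changed files (each entry may carry a 'patch')."""
    request = build_files_request(owner, repo, pull_number, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def render_diff_context(files, max_chars=20000):
    """Turn the files payload into text for the LLM, within a rough size budget."""
    sections = []
    used = 0
    for index, f in enumerate(files):
        header = (
            f"--- {f['filename']} ({f.get('status', 'unknown')}, "
            f"+{f.get('additions', 0)}/-{f.get('deletions', 0)})"
        )
        # GitHub omits 'patch' for binary files and very large diffs.
        body = f.get("patch") or "(no patch available)"
        section = header + "\n" + body
        if used + len(section) > max_chars:
            remaining = len(files) - index
            sections.append(f"({remaining} more file(s) omitted to stay within budget)")
            break
        sections.append(section)
        used += len(section)
    return "\n\n".join(sections)
```

Keeping the rendering step separate from the HTTP call means the size budget can be tuned, or swapped for per-file summaries, without touching the gateway method. PRs with more than one page of files would need pagination, which this sketch leaves out.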

This alone takes reviews from "imaginary" to "grounded in the actual diff." Probably the single highest-leverage change for review quality. Half-day of work.

Tier 2 (medium, follow-up)

Pipe in past PRs / issues / discussions.

GitHub's search API can return prior reviewer discussions, related issues, prior PRs touching the same files. Useful when memory carries "we already discussed this convention in #42". Adds tokens fast; needs care about scope (last N, same author, same files).
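
To illustrate the scoping concern, here is a hedged sketch that builds a bounded query against GitHub's issue/PR search endpoint (`GET /search/issues`). The helper name and its parameters are hypothetical. It covers "last N" and "same author" only; issue search has no qualifier for "PRs touching the same files", so that scope would need a different approach, such as passing file names as free-text terms.

```python
from urllib.parse import urlencode


def build_prior_pr_search_url(owner, repo, author=None, terms=(), limit=5):
    """Build a search URL for recent PRs in one repo, optionally by one author.

    limit caps the number of results so the added context stays small.
    """
    qualifiers = [f"repo:{owner}/{repo}", "is:pr"]
    if author:
        qualifiers.append(f"author:{author}")
    query = " ".join(list(terms) + qualifiers)
    params = {"q": query, "sort": "updated", "order": "desc", "per_page": limit}
    return "https://api.github.com/search/issues?" + urlencode(params)


# Example: the three most recently updated PRs by one (made-up) author.
print(build_prior_pr_search_url("owner", "repo", author="some-user", limit=3))
```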

Tier 3 (heavy, beyond hackathon)

Full codebase context: either clone the repo into the container (costs disk and tokens) or set up embeddings/retrieval. The right long-term path is probably OpenClaw's own filesystem/repo tools rather than something we build into the gateway. Out of scope for now.

Why this wasn't part of the original epic

The four phases prove the plumbing (container shape, trust moat, memory, autonomy). The substance depends on what the skill surface exposes. We intentionally kept skills minimal to ship the epic, but the result is that the agent is plumbed correctly to do a job it can't actually do well yet.

Done when

  • The Code Review Engineer can read the PR diff before submitting a review.
  • A test review on a real PR produces feedback that references specific files / lines / changes — not a generic "looks good, consider X."
  • `agent_action_log` shows the new `github.pr.files` action being called before `github.review.submit`.
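
The last criterion can be checked mechanically. The sketch below assumes `agent_action_log` rows can be read as chronologically ordered dicts with an `action` key; that row shape and the function name are assumptions, not the actual schema.

```python
def diff_fetched_before_review(log_entries):
    """Return True if at least one review was submitted and every
    github.review.submit was preceded by its own github.pr.files call."""
    fetched = False
    submitted = False
    for entry in log_entries:
        action = entry["action"]
        if action == "github.pr.files":
            fetched = True
        elif action == "github.review.submit":
            if not fetched:
                return False
            submitted = True
            fetched = False  # the next review needs its own fetch
    return submitted


# Hypothetical log excerpt for a single review task.
log = [
    {"action": "github.pr.files"},
    {"action": "github.review.submit"},
]
print(diff_fetched_before_review(log))
```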

Out of scope
