Add adversarial PR reviewer agent + claim verification skill #2070
starlightretailceo wants to merge 2 commits into
This PR targets main, but PRs should target staged.
The main branch is auto-published from staged and should not receive direct PRs.
Please close this PR and re-open it against the staged branch.
You can change the base branch using the Edit button at the top of this PR,
or run: gh pr edit 2070 --base staged
✅ External plugin PR checks passed
Per-plugin quality summary
No changed external plugin entries were detected in this PR.
Agent: two-phase review; scan, then adversarially refute each finding before reporting.
Skill: reusable verification pattern with refutation prompts, vote thresholds, false-positive examples.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e4a7eca to 3e84467
🔒 PR Risk Scan Results
Scanned 4 changed file(s).
🔍 Skill Validator Results
⛔ Findings need attention
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aaronpowell left a comment
I think most of what this agent is trying to achieve would be superseded by the rubber duck agent, especially if it were combined with custom skills.
Turning this into a skills-centric design would also mean that Copilot code review could leverage it.
Summary
Agent (adversarial-pr-reviewer.agent.md): Two-phase PR review. An initial scan looks for bugs, security, and correctness issues, then each finding goes through adversarial self-refutation. Only issues that survive the skeptical challenge are reported (max 5, with confidence levels).
Skill (skills/adversarial-claim-verification/SKILL.md): Reusable verification pattern with a refutation prompt template, a 3-of-5 majority-vote threshold, 5 detailed false-positive examples (unused vars, style-as-correctness, impossible races, etc.), confidence calibration, and escalation criteria.
Motivation
Existing review agents in this repo do single-pass review and report everything plausible. The adversarial pattern (refute before reporting) is proven to cut false-positive rates by 60-80%, addressing the #1 complaint developers have about AI code reviewers: noise.
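For reviewers who want a concrete picture of the refute-before-reporting flow, here is a minimal Python sketch. It is not taken from the PR's files. The names `Finding`, `survives`, and `adversarial_review` are hypothetical, and it assumes that "3-of-5" means a finding is kept when at least 3 of 5 independent challenges fail to refute it, and that the 5-item cap keeps the highest-confidence findings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate issue from the initial scan (hypothetical structure)."""
    description: str
    confidence: float  # scanner's own estimate, from 0.0 to 1.0


# A challenge returns True when it FAILS to refute the finding,
# i.e. the finding is upheld by that challenge.
Challenge = Callable[[Finding], bool]


def survives(finding: Finding, challenges: List[Challenge], threshold: int = 3) -> bool:
    """Keep a finding only if at least `threshold` challenges uphold it."""
    upheld = sum(1 for challenge in challenges if challenge(finding))
    return upheld >= threshold


def adversarial_review(
    findings: List[Finding],
    challenges: List[Challenge],
    threshold: int = 3,
    max_reported: int = 5,
) -> List[Finding]:
    """Phase 2: drop refuted findings, then report the most confident few."""
    kept = [f for f in findings if survives(f, challenges, threshold)]
    kept.sort(key=lambda f: f.confidence, reverse=True)
    return kept[:max_reported]
```

In an agent, each challenge would presumably be a separate model call using the skill's refutation prompt; plain callables stand in for those calls here so the voting and capping logic can be read on its own.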
Test plan
🤖 Generated with Claude Code