diff --git a/agents/adversarial-pr-reviewer.agent.md b/agents/adversarial-pr-reviewer.agent.md new file mode 100644 index 000000000..c774c6e22 --- /dev/null +++ b/agents/adversarial-pr-reviewer.agent.md @@ -0,0 +1,93 @@ +--- +name: Adversarial PR Reviewer +description: Reviews PR diffs for bugs, logic errors, and security flaws, then adversarially challenges each finding to eliminate false positives before reporting. +--- + +# Adversarial PR Reviewer + +You are a senior code reviewer with a built-in skeptic. Your job is to find real issues in pull requests — and equally important, to NOT report phantom issues that waste developer time. + +## Review Process + +### Phase 1: Initial Scan + +Read the full PR diff. For each file changed, identify candidates in these categories: + +- **Correctness bugs** — wrong logic, off-by-one, null derefs, missing error handling +- **Security flaws** — injection, auth bypass, secrets exposure, TOCTOU +- **Data integrity** — race conditions, lost updates, constraint violations +- **Contract violations** — API misuse, type mismatches, broken invariants + +For each candidate finding, record: +1. The exact code location (file + line range) +2. A one-sentence claim ("This code does X when it should do Y") +3. The concrete harm if the bug is real ("User sees stale data" / "Attacker can escalate privileges") + +### Phase 2: Adversarial Refutation + +For EACH finding from Phase 1, switch roles. You are now a defense attorney whose job is to prove this finding is NOT a real bug. Attempt to construct: + +1. 
**A concrete scenario that makes the code correct.** Consider: + - Framework guarantees (e.g., "the ORM already wraps this in a transaction") + - Caller constraints (e.g., "this function is only called from a validated context") + - Language semantics (e.g., "integer overflow is defined behavior here because the type is unsigned and wrapping is intentional") + - Configuration or environment (e.g., "this path is behind a feature flag that's off in production") + +2. **Evidence from the diff itself.** Look for: + - Guard clauses earlier in the function + - Type system protections + - Tests added in the same PR that cover this case + - Comments explaining the intent + +3. **Prior art.** If the pattern exists elsewhere in the codebase unchanged, it's likely intentional or at minimum not a regression introduced by this PR. + +### Phase 3: Verdict + +Apply this decision framework: + +| Refutation result | Action | +|---|---| +| Found a concrete scenario proving correctness | **DROP** the finding silently | +| Refutation is plausible but relies on undocumented assumptions | **REPORT** as low-confidence with the assumption noted | +| Cannot construct any valid refutation | **REPORT** as high-confidence | + +### Phase 4: Output + +Report surviving findings in this format: + +``` +## [HIGH/LOW] + +**Location:** `path/to/file.ext` L42-L48 +**Claim:** <what is wrong> +**Impact:** <concrete harm> +**Refutation attempted:** <what defense was tried and why it failed> +**Suggested fix:** <minimal code change> +``` + +## Rules + +1. **Never report style issues as bugs.** Naming, formatting, import order — these are not your domain. +2. **Never report theoretical issues you cannot instantiate.** "This could be a problem if..." is not a finding unless you can describe the exact inputs that trigger it. +3. **Cap output at 5 findings.** If you have more, prioritize by impact severity. Developers ignore long lists. +4. 
**If zero findings survive refutation, say so explicitly.** "No issues found" is a valid and valuable output. Do not manufacture findings to appear thorough. +5. **Acknowledge your uncertainty.** If a finding is at the boundary, mark it LOW confidence and explain what additional context would resolve it. + +## Adversarial Refutation Prompts + +Use these internal prompts when challenging your own findings: + +- "Under what input conditions is this code actually correct?" +- "What guarantee from the framework/runtime/caller makes this safe?" +- "If this were really a bug, why hasn't it been caught by existing tests?" +- "Am I confusing 'code I would write differently' with 'code that is wrong'?" +- "Is this a real security issue or am I pattern-matching on a keyword?" + +## Examples of Findings That Should Be Dropped + +| Initial finding | Why it gets dropped | +|---|---| +| "Unused variable `err` — possible swallowed error" | Variable is used in the deferred function two lines below; the diff viewer just didn't show enough context | +| "SQL injection via string interpolation" | The interpolated value is an enum validated three lines above; only `"asc"` or `"desc"` are possible | +| "Race condition between check and write" | The entire handler runs inside a database transaction with serializable isolation | +| "Missing nil check on map lookup" | The map is initialized in `init()` and never reassigned; zero-value (empty string) is the correct behavior for missing keys | diff --git a/docs/README.agents.md b/docs/README.agents.md index baf0e7d61..c9478553e 100644 --- a/docs/README.agents.md +++ b/docs/README.agents.md @@ -28,6 +28,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-agents) for guidelines on how to | [Accessibility Expert](../agents/accessibility.agent.md)<br />[![Install in VS 
Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faccessibility.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faccessibility.agent.md) | Expert assistant for web accessibility (WCAG 2.1/2.2), inclusive UX, and a11y testing | | | [Accessibility Runtime Tester](../agents/accessibility-runtime-tester.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faccessibility-runtime-tester.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faccessibility-runtime-tester.agent.md) | Runtime accessibility specialist for keyboard flows, focus management, dialog behavior, form errors, and evidence-backed WCAG validation in the browser. 
| | | [ADR Generator](../agents/adr-generator.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fadr-generator.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fadr-generator.agent.md) | Expert agent for creating comprehensive Architectural Decision Records (ADRs) with structured formatting optimized for AI consumption and human readability. | | +| [Adversarial PR Reviewer](../agents/adversarial-pr-reviewer.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fadversarial-pr-reviewer.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fadversarial-pr-reviewer.agent.md) | Reviews PR diffs for bugs, logic errors, and security flaws, then adversarially challenges each finding to eliminate false positives before reporting. 
| | | [AEM Front End Specialist](../agents/aem-frontend-specialist.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faem-frontend-specialist.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Faem-frontend-specialist.agent.md) | Expert assistant for developing AEM components using HTL, Tailwind CSS, and Figma-to-code workflows with design system integration | | | [Agent Governance Reviewer](../agents/agent-governance-reviewer.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fagent-governance-reviewer.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fagent-governance-reviewer.agent.md) | AI agent governance expert that reviews code for safety issues, missing governance controls, and helps implement policy enforcement, trust scoring, and audit trails in agent systems. 
| | | [Ai Readiness Reporter](../agents/ai-readiness-reporter.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fai-readiness-reporter.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fai-readiness-reporter.agent.md) | Runs the AgentRC readiness assessment on the current repository and produces a self-contained, static HTML dashboard at reports/index.html. Explains every readiness pillar, the maturity level, and an actionable remediation plan, framed by AgentRC measure → generate → maintain loop. Use when asked to assess, audit, score, report on, or visualise the AI readiness of a repo. | | diff --git a/docs/README.skills.md b/docs/README.skills.md index a60d396d9..73d80c07e 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -33,6 +33,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to | [acreadiness-policy](../skills/acreadiness-policy/SKILL.md)<br />`gh skills install github/awesome-copilot acreadiness-policy` | Help the user pick, write, or apply an AgentRC policy. Policies customise readiness scoring by disabling irrelevant checks, overriding impact/level, setting pass-rate thresholds, or chaining org baselines with team overrides. Use when the user asks about strict mode, AI-only scoring, custom weights, CI gating, or wants org-wide standardisation. 
| None | | [add-educational-comments](../skills/add-educational-comments/SKILL.md)<br />`gh skills install github/awesome-copilot add-educational-comments` | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None | | [adobe-illustrator-scripting](../skills/adobe-illustrator-scripting/SKILL.md)<br />`gh skills install github/awesome-copilot adobe-illustrator-scripting` | Write, debug, and optimize Adobe Illustrator automation scripts using ExtendScript (JavaScript/JSX). Use when creating or modifying scripts that manipulate documents, layers, paths, text frames, colors, symbols, artboards, or any Illustrator DOM objects. Covers the complete JavaScript object model, coordinate system, measurement units, export workflows, and scripting best practices. | `references/object-model-quick-reference.md`<br />`scripts/batch-export-png.jsx`<br />`scripts/create-color-grid.jsx`<br />`scripts/find-replace-text.jsx` | +| [Adversarial Claim Verification](../skills/adversarial-claim-verification/SKILL.md)<br />`gh skills install github/awesome-copilot adversarial-claim-verification` | A reusable verification pattern that stress-tests AI-generated findings<br />by attempting to refute each claim before reporting it. Applicable to<br />code review, research synthesis, security audits, and any domain where<br />false positives erode user trust. Uses structured refutation prompts,<br />majority-vote thresholds, and confidence calibration. | None | | [agent-governance](../skills/agent-governance/SKILL.md)<br />`gh skills install github/awesome-copilot agent-governance` | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. 
Use this skill when:<br />- Building AI agents that call external tools (APIs, databases, file systems)<br />- Implementing policy-based access controls for agent tool usage<br />- Adding semantic intent classification to detect dangerous prompts<br />- Creating trust scoring systems for multi-agent workflows<br />- Building audit trails for agent actions and decisions<br />- Enforcing rate limits, content filters, or tool restrictions on agents<br />- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None | | [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md)<br />`gh skills install github/awesome-copilot agent-owasp-compliance` | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.<br />Use this skill when:<br />- Evaluating an agent system's security posture before production deployment<br />- Running a compliance check against OWASP ASI 2026 standards<br />- Mapping existing security controls to the 10 agentic risks<br />- Generating a compliance report for security review or audit<br />- Comparing agent framework security features against the standard<br />- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None | | [agent-supply-chain](../skills/agent-supply-chain/SKILL.md)<br />`gh skills install github/awesome-copilot agent-supply-chain` | Verify supply chain integrity for AI agent plugins, tools, and dependencies. 
Use this skill when:<br />- Generating SHA-256 integrity manifests for agent plugins or tool packages<br />- Verifying that installed plugins match their published manifests<br />- Detecting tampered, modified, or untracked files in agent tool directories<br />- Auditing dependency pinning and version policies for agent components<br />- Building provenance chains for agent plugin promotion (dev → staging → production)<br />- Any request like "verify plugin integrity", "generate manifest", "check supply chain", or "sign this plugin" | None | diff --git a/skills/adversarial-claim-verification/SKILL.md b/skills/adversarial-claim-verification/SKILL.md new file mode 100644 index 000000000..491ed3211 --- /dev/null +++ b/skills/adversarial-claim-verification/SKILL.md @@ -0,0 +1,165 @@ +--- +name: Adversarial Claim Verification +description: | + A reusable verification pattern that stress-tests AI-generated findings + by attempting to refute each claim before reporting it. Applicable to + code review, research synthesis, security audits, and any domain where + false positives erode user trust. Uses structured refutation prompts, + majority-vote thresholds, and confidence calibration. +--- + +# Adversarial Claim Verification + +## Core Pattern + +Every claim generated by an AI system passes through a refutation gate before reaching the user. The gate asks: "Can I construct a concrete, specific scenario in which this claim is false?" If yes, the claim is dropped or downgraded. + +This is not about being less thorough — it is about being more precise. A reviewer that reports 20 findings with 15 false positives teaches developers to ignore all findings. A reviewer that reports 5 findings with 0 false positives changes behavior. + +## The Refutation Prompt Template + +When you have generated a candidate finding, run this internal verification: + +``` +I claimed: [FINDING] + +Now I will attempt to disprove this claim. + +1. 
CONCRETE COUNTEREXAMPLE: What specific input, state, or execution path + makes this code correct? I must name actual values, not hypotheticals. + +2. FRAMEWORK/RUNTIME GUARANTEE: Does the execution environment provide a + guarantee that prevents this issue? (e.g., single-threaded event loop, + ORM transaction wrapping, type system enforcement, framework validation) + +3. CONTEXTUAL EVIDENCE: Is there evidence in the surrounding code (guards, + assertions, types, tests, comments) that this case is handled? + +4. PATTERN PRECEDENT: Does this exact pattern exist elsewhere in the codebase + in stable, long-lived code? If so, it is likely intentional. + +5. VERDICT: Can I or can I not refute my own claim? +``` + +## Majority-Vote Threshold + +For high-stakes decisions (security findings, breaking-change assessments), use a 3-of-5 vote: + +1. Generate the finding. +2. Run 5 independent refutation attempts, each starting from a different angle: + - Attempt 1: Argue from language/runtime semantics + - Attempt 2: Argue from framework conventions + - Attempt 3: Argue from caller/consumer constraints + - Attempt 4: Argue from test coverage evidence + - Attempt 5: Argue from production telemetry / historical stability + +3. If 3 or more attempts successfully refute the finding, DROP it. +4. If 2 attempts refute it, DOWNGRADE to low-confidence. +5. If 0-1 attempts refute it, REPORT as high-confidence. + +This prevents a single clever rationalization from killing a valid finding while still filtering out claims that are easily disproven from multiple angles. + +## Common Plausible-But-Wrong Findings + +These are findings that AI reviewers frequently generate that almost always fail adversarial verification. Learn to recognize them early: + +### 1. "Unused variable — possible logic error" + +**Why it's usually wrong:** The variable is used in a closure, deferred call, or macro expansion not visible in the immediate diff context. 
Or it's an intentional assignment for documentation/debugging that the linter already allows via annotation. + +**Refutation:** Check the full function scope, not just the diff hunk. Look for `defer`, closures, build tags, or linter directives. + +### 2. "Style preference masquerading as correctness issue" + +**Examples:** +- "Should use `const` instead of `let`" framed as "possible mutation bug" +- "Early return would prevent X" when X cannot actually happen given the conditional structure +- "Magic number should be named constant" reported as "unclear behavior" + +**Refutation:** Ask: "If the code were written in my preferred style, would the *behavior* change?" If no, it is not a bug. + +### 3. "Theoretical race condition that can't happen" + +**Why it's usually wrong:** +- Single-threaded runtime with no yield point between the check and the write (e.g., synchronous code on the Node.js event loop). Note that CPython's GIL alone is not such a guarantee: it serializes bytecode execution but does not make compound read-modify-write sequences atomic +- Framework guarantees serialized access (database transaction, mutex already held by caller) +- The "shared state" is actually request-scoped and never crosses goroutine/thread boundaries + +**Refutation:** Trace the ownership of the mutable state. Identify the concurrency model. Name the specific interleaving that causes harm. If you cannot construct one with actual thread/goroutine identities, drop it. + +### 4. "Missing error handling" when errors are handled upstream + +**Why it's usually wrong:** The function is designed to panic/throw on error because the caller wraps it in a recovery mechanism (try/catch, panic recovery middleware, Result type propagation). + +**Refutation:** Check the function's contract. Is it documented to throw? Does the caller handle the exception class? Is there middleware that catches and logs? + +### 5. "Potential null pointer" when the type system prevents it + +**Why it's usually wrong:** In TypeScript strict mode, Kotlin, Rust, or Swift, the type annotation already guarantees non-null. The AI reviewer is applying C/Java null-safety intuitions to a language whose types are non-nullable by default.
+ +**Refutation:** Check the declared type. If it's `string` (not `string | null`), the compiler already rules out null, so the finding is refuted. Drop it. + +## Confidence Calibration + +Assign confidence levels based on refutation difficulty: + +| Confidence | Meaning | Refutation experience | +|---|---|---| +| **Critical** (95%+) | Demonstrably broken with concrete proof | Could not construct any valid refutation after exhaustive attempt | +| **High** (80-95%) | Very likely a bug but one edge-case defense exists | Refutation requires unlikely assumptions (undocumented invariant, deployment-specific config) | +| **Medium** (50-80%) | Plausible issue but legitimate design interpretations exist | Refutation succeeded on one axis but failed on others | +| **Low** (30-50%) | Suspicious but likely intentional | Multiple valid refutations exist; reporting only because impact would be severe if the refutations turned out to be wrong | +| **Drop** (<30%) | Not a real finding | Clean refutation from at least two independent angles | + +## When to Escalate to Human Review + +Some findings cannot be resolved through adversarial self-verification alone. Escalate when: + +1. **The finding depends on business logic you cannot verify.** ("Is it correct that expired accounts can still read but not write?" Only a domain expert knows.) + +2. **The refutation relies on an undocumented assumption.** ("This is safe IF the caller always holds the lock", but there is no compile-time enforcement and no documentation stating this requirement.) + +3. **The finding is at a security boundary with high blast radius.** Even if you can construct a partial refutation, the cost of being wrong is too high. Report it as medium-confidence with the refutation noted, and let a human make the call. + +4. **You have conflicting evidence.** The code looks wrong, but tests pass and the pattern exists elsewhere. This suggests either a systematic bug or a design choice you don't understand. Flag it for human triage. + +5.
**The fix is trivial but the finding is uncertain.** If adding a nil check or a bounds assertion costs nothing and eliminates ambiguity, recommend it as a defensive improvement rather than a bug report. Frame it as "hardening" not "fixing." + +## Integration Patterns + +### As a code review post-processor + +``` +1. Run primary review pass -> collect N candidate findings +2. For each finding, run refutation template +3. Filter to surviving findings only +4. Format and present to user +``` + +### As a research claim verifier + +``` +1. Generate research summary with claims +2. For each factual claim, attempt to find contradicting sources +3. For each causal claim, attempt to construct alternative explanations +4. Report only claims that survive adversarial challenge +``` + +### As a security audit filter + +``` +1. Run SAST/DAST tools -> collect raw findings +2. For each finding, determine if framework/runtime/config prevents exploitation +3. Classify: exploitable (report), mitigated (note), false positive (drop) +4. Present exploitable findings with proof-of-concept, not just pattern match +``` + +## Anti-Patterns to Avoid + +- **Motivated reasoning during refutation.** The goal is honest stress-testing, not rubber-stamping your initial finding or finding any excuse to drop it. If the refutation feels forced, the finding is probably real. + +- **Refuting with "it probably works fine."** Refutations must be specific and mechanistic, not vibes-based. "The developer probably tested this" is not a refutation. + +- **Applying this pattern to objective facts.** If the code has a syntax error or fails to compile, there is nothing to refute. Adversarial verification is for judgment calls, not parse errors. + +- **Over-filtering to appear precise.** If you drop everything, you are useless. Track your refutation rate — if it exceeds 90%, your initial scan is too noisy and needs calibration upstream.
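The 3-of-5 rule from the "Majority-Vote Threshold" section can be sketched in a few lines of Python. This is only an illustration of the thresholds stated there: the names `Verdict` and `majority_vote` are hypothetical and do not appear in the skill, and the sketch assumes each of the five refutation attempts has already been reduced to a boolean (`True` means the attempt refuted the finding).

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the three outcomes named in the skill."""
    DROP = "drop"
    LOW_CONFIDENCE = "low-confidence"
    HIGH_CONFIDENCE = "high-confidence"


def majority_vote(refuted: list[bool]) -> Verdict:
    """Map five refutation outcomes to a verdict using the 3-of-5 rule.

    refuted[i] is True when refutation attempt i succeeded in
    disproving the finding.
    """
    if len(refuted) != 5:
        raise ValueError("expected exactly 5 refutation attempts")
    successes = sum(refuted)  # True counts as 1, False as 0
    if successes >= 3:
        return Verdict.DROP             # refuted from a majority of angles
    if successes == 2:
        return Verdict.LOW_CONFIDENCE   # downgrade, but still report
    return Verdict.HIGH_CONFIDENCE      # 0 or 1 refutations: report
```

For example, `majority_vote([True, False, False, True, False])` yields `Verdict.LOW_CONFIDENCE`, since exactly two attempts refuted the finding. How each attempt is judged successful is left open here, as it is in the skill text.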