feat(review-pr): add per-finding confidence scoring model by Sayt-0 · Pull Request #17 · docker/docker-agent-action

Sayt-0 · 2026-06-24T21:02:46Z

Summary

Adds a precise, multi-criteria confidence scoring model for the PR reviewer. Each verified finding gets a 0 to 100 confidence score, a band, and a posting disposition, computed deterministically from several signals rather than a subjective guess.

The model lives in two synchronized surfaces:

| Surface | Role |
| --- | --- |
| `src/score-confidence/` (pure module + CLI + unit tests) | Single source of truth for the model |
| `review-pr/agents/pr-review.yaml` "Confidence Scoring" section | Strict lookup table the orchestrator applies inline (the gitignored `dist/` is not available at agent runtime) |

Criteria

| Criterion | Source | Role |
| --- | --- | --- |
| Verdict (CONFIRMED / LIKELY / DISMISSED) | verifier | Primary agreement signal |
| `evidence_strength` (direct / circumstantial / speculative) | verifier (new field) | Pattern / snippet match strength |
| `context_completeness` (full / partial / none) | verifier (new field) | Whether the verifier saw the code it needed |
| Severity concordance | derived | Drafter vs verifier severity rank distance |
| Scope (`in_diff` and `in_changed_code`) | derived | Hard gate |
| Category / severity | verifier | Posting policy only (security floor, high-severity always-post) |

Scale and threshold

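Score 0 to 100 from a precomputed verdict × evidence × context table (base: CONFIRMED 70 / LIKELY 40; evidence: +18 / +8 / -4 for direct / circumstantial / speculative; context: +12 / +4 / -10 for full / partial / none), plus a concordance adjustment (+5 / 0 / -8), with the result clamped to 0..100.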
Bands: strong >= 80, moderate 55..79, weak 30..54, negligible < 30.
Default posting threshold is 55 (the moderate-band floor, so the band and the post cutoff cannot drift apart).
Only CONFIRMED can reach strong (LIKELY tops out at 75), a property the unit tests pin.
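
The arithmetic above can be sketched in a few lines. This is a minimal illustration, not the module in `src/score-confidence/`: the names `scoreFinding`, `bandFor`, and `concordanceAdjustment` are hypothetical, and the mapping of the concordance values to a severity rank distance (0 gives +5, 1 gives 0, 2 or more gives -8) is an assumption, since the description lists only the three values.

```typescript
// Illustrative sketch only; names and the concordance mapping are assumptions.
type Verdict = "CONFIRMED" | "LIKELY";
type Evidence = "direct" | "circumstantial" | "speculative";
type Context = "full" | "partial" | "none";
type Band = "strong" | "moderate" | "weak" | "negligible";

// Constants as stated in the PR description.
const BASE: Record<Verdict, number> = { CONFIRMED: 70, LIKELY: 40 };
const EVIDENCE: Record<Evidence, number> = { direct: 18, circumstantial: 8, speculative: -4 };
const CONTEXT: Record<Context, number> = { full: 12, partial: 4, none: -10 };

// Assumed mapping from drafter/verifier severity rank distance to +5 / 0 / -8.
function concordanceAdjustment(severityDistance: number): number {
  if (severityDistance <= 0) return 5;
  if (severityDistance === 1) return 0;
  return -8;
}

// Table cell (base + evidence + context) plus concordance, clamped to 0..100.
function scoreFinding(
  verdict: Verdict,
  evidence: Evidence,
  context: Context,
  severityDistance: number,
): number {
  const raw =
    BASE[verdict] + EVIDENCE[evidence] + CONTEXT[context] + concordanceAdjustment(severityDistance);
  return Math.min(100, Math.max(0, raw));
}

// Band boundaries as stated: strong >= 80, moderate 55..79, weak 30..54, negligible < 30.
function bandFor(score: number): Band {
  if (score >= 80) return "strong";
  if (score >= 55) return "moderate";
  if (score >= 30) return "weak";
  return "negligible";
}
```

Under these assumptions, `scoreFinding("LIKELY", "direct", "full", 0)` is 40 + 18 + 12 + 5 = 75 (moderate), which matches the stated LIKELY ceiling, while `scoreFinding("CONFIRMED", "direct", "full", 0)` is 105 before clamping and 100 after.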

Posting policy (first match wins; the 5-comment cap is applied last)

| Rule | Behavior |
| --- | --- |
| Out-of-scope / DISMISSED non-security | not surfaced |
| Security floor (CONFIRMED/LIKELY) | always inline, never auto-suppressed, cap-exempt |
| High-severity (CONFIRMED/LIKELY) | always inline, cap-exempt |
| Band strong or moderate | inline (subject to the cap) |
| Band weak, non-forced | lower-confidence summary list (not silently dropped) |
| Negligible band, verifier severity medium | summary (visibility floor) |
| DISMISSED security | dismissed-security audit line (human-reviewable) |
| Non-forced inline overflow past 5 | demoted to the summary list |
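
The precedence above reads naturally as a chain of early returns followed by a separate cap pass. The sketch below is hypothetical: the `ScoredFinding` shape, the disposition names, and the `low` severity level are illustrative. It also assumes three things the description does not spell out: findings that match no rule are not surfaced, forced (cap-exempt) inline comments do not count toward the cap, and overflow is chosen by lowest score.

```typescript
// Illustrative sketch only; field names, disposition names, and tie-breaking are assumptions.
type Verdict = "CONFIRMED" | "LIKELY" | "DISMISSED";
type Severity = "low" | "medium" | "high"; // "low" is assumed; the PR names medium and high
type Band = "strong" | "moderate" | "weak" | "negligible";
type Disposition = "not_surfaced" | "inline_forced" | "inline" | "summary" | "security_audit";

interface ScoredFinding {
  id: string;
  inScope: boolean; // in_diff and in_changed_code
  verdict: Verdict;
  isSecurity: boolean; // category is security
  severity: Severity; // verifier severity
  score: number;
  band: Band;
}

// First match wins, in the order of the table above.
function dispositionFor(f: ScoredFinding): Disposition {
  const believed = f.verdict !== "DISMISSED";
  if (!f.inScope || (!believed && !f.isSecurity)) return "not_surfaced";
  if (believed && f.isSecurity) return "inline_forced";
  if (believed && f.severity === "high") return "inline_forced";
  if (believed && (f.band === "strong" || f.band === "moderate")) return "inline";
  if (believed && f.band === "weak") return "summary";
  if (believed && f.band === "negligible" && f.severity === "medium") return "summary";
  if (!believed && f.isSecurity) return "security_audit";
  return "not_surfaced"; // assumption: anything unmatched is dropped
}

// The cap is applied last and only demotes non-forced inline findings.
function planPosting(findings: ScoredFinding[], cap = 5): Map<string, Disposition> {
  const plan = new Map<string, Disposition>();
  for (const f of findings) plan.set(f.id, dispositionFor(f));
  const overflow = findings
    .filter((f) => plan.get(f.id) === "inline")
    .sort((a, b) => b.score - a.score) // assumption: keep the highest scores inline
    .slice(cap);
  for (const f of overflow) plan.set(f.id, "summary");
  return plan;
}
```

Keeping the cap in its own pass means a forced finding can never be demoted by it, which is the property the "cap-exempt" wording asks for.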

Maps to the requested design

| Requested | Implemented |
| --- | --- |
| Verifier agreement score | verdict + drafter/verifier severity concordance |
| Pattern match strength | `evidence_strength` field |
| Context completeness | `context_completeness` field |
| Scale (high/medium/low or 0-100) | 0 to 100 plus four named bands |
| Default threshold | 55 (moderate floor) |
| Precise, multiple criteria | deterministic lookup table; six criteria; 84 unit tests pin every value |

How it was hardened

A design review (three independent lenses: calibration, security policy, LLM reproducibility) locked the constants, replacing error-prone post-hoc caps with the lookup table and removing band dead-zones. An adversarial verification pass then confirmed and fixed three defects:

| Defect | Severity | Resolution |
| --- | --- | --- |
| Verifier escalating severity could silently drop a medium-severity finding (concordance is non-monotonic in severity) | major | Medium-severity visibility floor: a still-believed medium finding is kept in the summary, never dropped |
| `category` was the only enum not validated, so a misspelled value silently disabled the security floor | minor | Validate `category` via `assertEnum` |
| Severity-disagreement can move a borderline finding from inline to summary | minor | Documented as intended: the finding stays visible, and confidence legitimately reflects assessor agreement (a different axis from severity) |
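
To illustrate the second defect: if `category` is compared against `"security"` without validation, a misspelled value simply fails the comparison and the security floor never fires. The sketch below shows one plausible shape for an `assertEnum` helper. The signature is an assumption (the description only names the function), and the category list other than `security` is hypothetical.

```typescript
// Illustrative sketch only; the real assertEnum signature is not shown in the PR description.
function assertEnum<T extends string>(field: string, value: unknown, allowed: readonly T[]): T {
  if (typeof value === "string" && (allowed as readonly string[]).includes(value)) {
    return value as T;
  }
  throw new Error(`${field}: expected one of ${allowed.join(", ")}, got ${JSON.stringify(value)}`);
}

// Hypothetical category list; only "security" is confirmed by the description.
const CATEGORIES = ["security", "correctness", "style"] as const;
type Category = (typeof CATEGORIES)[number];

// Fails loudly on a typo instead of silently skipping the security floor.
function parseCategory(raw: unknown): Category {
  return assertEnum("category", raw, CATEGORIES);
}
```

With this shape, `parseCategory("secuirty")` throws rather than yielding a value that quietly compares unequal to `"security"`.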

TS-to-prompt numeric consistency (all 18 table cells, bands, threshold, posting precedence, schema) was verified clean.

Compatibility with #15

#15 ("always post a review comment even with zero findings") and this change touch distinct regions of `pr-review.yaml` and are complementary: an empty inline set yields an APPROVE assessment label while the summary and audit sections still go in the review body, and the review is still posted.

Validation

| Check | Result |
| --- | --- |
| `pnpm build` | pass |
| `pnpm test` (554 tests, 84 new) | pass |
| `tsc --noEmit` | pass |
| `biome ci` | pass |
| `actionlint` | pass |

Note on runtime placement

The model is applied by the orchestrator prompt (mirroring the tested TS module) rather than invoked as a dist bundle at agent runtime, because dist/ is gitignored and not present in the agent's working tree. If invoking the compiled scorer at runtime is preferred, that is a larger pipeline change (staging the bundle plus a permission) and can be done separately.

Score each verified finding 0-100 from the verifier verdict, evidence strength, context completeness, drafter/verifier severity concordance, and scope. Bands (strong/moderate/weak/negligible) with a default posting threshold of 55 gate inline comments; security and high-severity CONFIRMED/LIKELY findings are always posted, weak-band findings go to a visible lower-confidence summary instead of being dropped, and a medium-severity floor keeps a still-believed finding visible. The model is implemented and unit-tested in src/score-confidence (single source of truth) and mirrored in the orchestrator prompt as a strict lookup table. The verifier now emits evidence_strength and context_completeness.

docker-agent

Assessment: 🟢 APPROVE

The confidence scoring model looks correct. The lookup table arithmetic, band boundaries, clamping, posting policy precedence, enum validation, and CLI wiring all check out. The TypeScript module and the YAML mirror are consistent. No bugs were found in the code introduced by this PR.

github-advanced-security AI found potential problems Jun 24, 2026

Comment thread src/score-confidence/index.ts Fixed

fix(score-confidence): write CLI output to stdout by default

85b1517

CodeQL js/insecure-temporary-file: the CLI defaulted its output to a hardcoded /tmp path. Default to stdout instead (composable, no fixed temp file) and write to a file only when the caller passes an explicit output path.
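
A minimal sketch of the behavior this commit describes, assuming a Node.js runtime: print to stdout unless the caller supplies a path. The `emitResult` function and the `Sinks` interface are hypothetical names introduced for illustration (the injectable sinks only make the sketch easy to test), and how the actual CLI receives its output path is not shown in the description.

```typescript
import { writeFileSync } from "node:fs";

// Illustrative sketch only; not the CLI code from src/score-confidence/index.ts.
interface Sinks {
  stdout: (text: string) => void;
  file: (path: string, text: string) => void;
}

// Default sinks backed by Node.js.
const nodeSinks: Sinks = {
  stdout: (text) => {
    process.stdout.write(text);
  },
  file: (path, text) => {
    writeFileSync(path, text, "utf8");
  },
};

// No fixed temp path: stdout by default, a file only when a path is given explicitly.
function emitResult(result: unknown, outPath?: string, sinks: Sinks = nodeSinks): void {
  const text = JSON.stringify(result, null, 2) + "\n";
  if (outPath === undefined) {
    sinks.stdout(text);
  } else {
    sinks.file(outPath, text);
  }
}
```

Defaulting to stdout keeps the tool composable in a pipeline and removes the predictable temp file path that the CodeQL rule flags.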

Sayt-0 enabled auto-merge (squash) June 24, 2026 21:10

Sayt-0 requested a review from derekmisler June 24, 2026 21:10

docker-agent reviewed Jun 24, 2026

Sayt-0 wants to merge 2 commits into main from feat/confidence-scoring
