6 changes: 6 additions & 0 deletions AGENTS.md
@@ -43,6 +43,11 @@ Anything else here (workflows under `.github/workflows/`, scripts, tests) exists
│ │ ├── index.ts # CLI entry → bundled to dist/filter-diff.js
│ │ ├── filter-diff.ts # Core filterDiff() pure function + applyFilter() I/O wrapper.
│ │ └── __tests__/
│ ├── score-confidence/ # Per-finding confidence scoring for the PR review pipeline.
│ │ ├── index.ts # CLI entry → bundled to dist/score-confidence.js
│ │ ├── score-confidence.ts # Core scoreFinding()/scoreFindings() pure functions + posting policy.
│ │ │ # Source of truth for the model mirrored in pr-review.yaml.
│ │ └── __tests__/
│ ├── score-risk/ # Per-file risk scoring for the PR review pipeline.
│ │ ├── index.ts # CLI entry → bundled to dist/score-risk.js
│ │ ├── score-risk.ts # Core scoreFiles() pure function.
@@ -167,6 +172,7 @@ The action runs untrusted input (PR titles, bodies, comments, diffs) through an
- `pull_request` action `review_requested` when `github.event.requested_reviewer.login == 'docker-agent'`
- `@docker-agent` mentions on PR/issue comments — these run the `.github/actions/mention-reply` handler (sets `should-reply` and builds the context prompt) and then the `review-pr/mention-reply` sub-action (referenced from a pinned SHA, not present as a local path on every commit). The `pr-review-mention-reply.yaml` agent handles the actual reply.
- Diffs over 1500 lines are **chunked at file boundaries** in `review-pr/action.yml` (see "Split diff into chunks"). Per-file **risk scoring** (security paths, line counts, error-handling patterns) prioritizes verifier attention.
- Per-finding **confidence scoring** assigns each verified finding a precise 0–100 score (band: strong/moderate/weak/negligible) from the verifier's `verdict`, `evidence_strength`, and `context_completeness`, plus drafter↔verifier severity concordance and scope. `src/score-confidence/score-confidence.ts` is the **single source of truth** for the model (weights, bands, threshold, posting policy); the "Confidence Scoring" section of `review-pr/agents/pr-review.yaml` mirrors it as a strict lookup table so the orchestrator can apply it inline (the gitignored `dist/` is not available at agent runtime). Change one, change both — the unit tests pin every value. Security and high-severity CONFIRMED/LIKELY findings are always posted regardless of score; weak-band findings are surfaced in a summary rather than silently dropped.
- Stale review threads on lines no longer in the diff are auto-resolved via GraphQL `resolveReviewThread`. Threads with no `<!-- docker-agent-review -->` marker are never touched.
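
The chunking rule above (split only when a diff exceeds 1500 lines, and only at file boundaries) can be sketched as follows. This is a minimal TypeScript illustration, not the step in `review-pr/action.yml`: the function name `splitDiffAtFileBoundaries` and the greedy packing strategy are assumptions made for the example.

```typescript
// Hypothetical sketch: packs whole files into chunks of at most `maxLines`
// lines. A single file longer than `maxLines` is kept intact in its own chunk.
function splitDiffAtFileBoundaries(diff: string, maxLines = 1500): string[] {
  if (diff.trim() === "") return [];

  // Group lines per file; every "diff --git " header starts a new file.
  const files: string[][] = [];
  let file: string[] | null = null;
  for (const line of diff.split("\n")) {
    if (file === null || line.startsWith("diff --git ")) {
      file = [];
      files.push(file);
    }
    file.push(line);
  }

  // Greedily pack files; flush the current chunk before it would overflow.
  const chunks: string[][] = [];
  let current: string[] = [];
  for (const f of files) {
    if (current.length > 0 && current.length + f.length > maxLines) {
      chunks.push(current);
      current = [];
    }
    current = current.concat(f);
  }
  if (current.length > 0) chunks.push(current);

  return chunks.map((c) => c.join("\n"));
}
```

Because a file is never split, each chunk can be reviewed on its own, and joining the chunks with newlines reproduces the original diff.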

### Workflows (`.github/workflows/`)
14 changes: 13 additions & 1 deletion review-pr/README.md
@@ -284,6 +284,8 @@ but the error check happens after this line accesses `user.ID`.

Consider moving the nil check before accessing user properties.

confidence: strong (92/100)

<!-- docker-agent-review -->
```

@@ -298,9 +300,19 @@ When no issues are found:
### Review Pipeline

```
-AGENTS.md + PR Diff → Drafter (hypotheses) → Verifier (confirm) → Post Comments
+AGENTS.md + PR Diff → Drafter (hypotheses) → Verifier (confirm + evidence signals)
+→ Confidence score (0–100) → Post Comments
```

Each verified finding gets a precise **confidence score** (0–100) and a band
(strong / moderate / weak / negligible), computed deterministically from the
verifier's verdict, evidence strength, and context completeness, plus the
drafter↔verifier severity agreement. High-confidence findings are posted as
inline comments (labelled with their confidence); lower-confidence findings are
listed separately rather than dropped. Security and high-severity findings are
always surfaced regardless of score. The model is implemented and unit-tested in
[`src/score-confidence/`](../src/score-confidence/score-confidence.ts).
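
To make the shape of such a model concrete, here is a minimal TypeScript sketch. It is not the code in `score-confidence.ts`: every weight, band cut-off, the `DISMISSED` verdict, the severity levels, and the names `sketchScore`, `bandFor`, `postingAction`, and `formatLabel` are hypothetical placeholders chosen for illustration, and the scope signal is omitted.

```typescript
// Hypothetical sketch only: all numbers below are illustrative placeholders,
// not the values pinned by the unit tests in src/score-confidence/.
type Verdict = "CONFIRMED" | "LIKELY" | "DISMISSED"; // "DISMISSED" is assumed
type Evidence = "direct" | "circumstantial" | "speculative";
type Context = "full" | "partial" | "none";
type Severity = "high" | "medium" | "low"; // assumed levels
type Band = "strong" | "moderate" | "weak" | "negligible";
type Action = "inline" | "summary" | "drop"; // "drop" for negligible is assumed

interface Finding {
  verdict: Verdict;
  evidenceStrength: Evidence;
  contextCompleteness: Context;
  drafterSeverity: Severity;
  verifierSeverity: Severity;
  isSecurity: boolean;
}

const VERDICT_POINTS: Record<Verdict, number> = { CONFIRMED: 50, LIKELY: 35, DISMISSED: 0 };
const EVIDENCE_POINTS: Record<Evidence, number> = { direct: 25, circumstantial: 12, speculative: 0 };
const CONTEXT_POINTS: Record<Context, number> = { full: 15, partial: 8, none: 0 };
const CONCORDANCE_POINTS = 10; // drafter and verifier agree on severity

// Deterministic: the same inputs always give the same 0-100 score.
function sketchScore(f: Finding): number {
  const raw =
    VERDICT_POINTS[f.verdict] +
    EVIDENCE_POINTS[f.evidenceStrength] +
    CONTEXT_POINTS[f.contextCompleteness] +
    (f.drafterSeverity === f.verifierSeverity ? CONCORDANCE_POINTS : 0);
  return Math.max(0, Math.min(100, raw));
}

function bandFor(score: number): Band {
  if (score >= 80) return "strong";
  if (score >= 60) return "moderate";
  if (score >= 40) return "weak";
  return "negligible";
}

// Security and high-severity CONFIRMED/LIKELY findings bypass the score.
function postingAction(f: Finding): Action {
  const verified = f.verdict === "CONFIRMED" || f.verdict === "LIKELY";
  if (verified && (f.isSecurity || f.verifierSeverity === "high")) return "inline";
  const band = bandFor(sketchScore(f));
  if (band === "strong" || band === "moderate") return "inline";
  return band === "weak" ? "summary" : "drop";
}

// Mirrors the label format shown in the example comment.
function formatLabel(score: number): string {
  return `confidence: ${bandFor(score)} (${score}/100)`;
}
```

Keeping the score a pure function of the verifier's signals is what lets a lookup table in the agent prompt mirror it exactly. The floor is checked before the band so that a low score can never suppress a security finding.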

### Learning System

When you reply to a review comment:
29 changes: 29 additions & 0 deletions review-pr/agents/evals/confidence-scoring-1.json
@@ -0,0 +1,29 @@
{
"id": "f0c1e2d3-4a5b-6c7d-8e9f-0a1b2c3d4e5f",
"title": "Confidence scoring - per-finding score, band, and security floor (run 1)",
"evals": {
"setup": "apk add --no-cache github-cli",
"relevance": [
"The agent ran 'echo $GITHUB_ACTIONS' before performing the review to detect the output mode",
"The agent output the review to the console as formatted markdown instead of posting via gh api",
"The drafter response is valid JSON containing a 'findings' array, a 'summary' field, and 'review_complete' set to true",
"At least one finding flags the missing redirect_uri validation as a security concern",
"The verifier returned a JSON response with a 'verdicts' array containing one verdict per finding, and each verdict includes an 'evidence_strength' value (direct, circumstantial, or speculative) and a 'context_completeness' value (full, partial, or none)",
"Each finding posted in the console output is labelled with a confidence band (strong, moderate, weak, or negligible) and a numeric score out of 100",
"The security finding about redirect_uri validation is surfaced in the review regardless of its confidence score (security findings are never auto-suppressed)",
"The review assessment label is '🔴 CRITICAL' or '🟡 NEEDS ATTENTION' because there is at least one confirmed or likely security/high-severity finding"
]
},
"messages": [
{
"message": {
"agentName": "",
"message": {
"role": "user",
"content": "Review the following PR.\n\n## PR Information\n- **Title**: Add optional redirect URI to OAuth authorization flow\n- **Author**: jeanlaurent\n- **Branch**: custom-redirect-url → main\n- **Files Changed**: 6\n\n## PR Description\nAdds an optional redirect_uri field to GetAuthorizationURLRequest so callers can override the default OAuth callback URL. This allows apps to use custom URI schemes (e.g., myapp://auth/callback) for the OIDC login flow.\n\n### Changes\n- proto: Added optional redirect_uri field to GetAuthorizationURLRequest\n- auth/oidc: AuthorizationURL() accepts a redirectURI parameter, falls back to configured default when empty\n- auth/service: Reads redirect_uri from the request and passes it through\n- generated code: Regenerated Go and TypeScript protobuf files\n\n## Diff\n\nNote: Generated protobuf files (auth.pb.go, auth_pb.ts) are omitted — only hand-written code is shown.\n\n```diff\ndiff --git a/api/auth/v1/auth.proto b/api/auth/v1/auth.proto\nindex df6bf369..54dfc78a 100644\n--- a/api/auth/v1/auth.proto\n+++ b/api/auth/v1/auth.proto\n@@ -25,6 +25,11 @@ message GetAuthorizationURLRequest {\n // Optional state parameter for CSRF protection.\n // If not provided, the server will generate one.\n optional string state = 1;\n+\n+ // Optional redirect URI for the OAuth callback.\n+ // If not provided, the server will use the configured default redirect URI.\n+ // This allows mobile apps to use custom URI schemes (e.g., myapp://auth/callback).\n+ optional string redirect_uri = 2;\n }\n \n // GetAuthorizationURLResponse is the response message containing the authorization URL.\n@@ -53,6 +58,10 @@ message GetLogoutURLResponse {\n message ExchangeTokenRequest {\n // The authorization code received from the OIDC provider.\n string code = 1;\n+\n+ // Optional redirect URI that was used in the authorization request.\n+ // Must match the redirect_uri used in GetAuthorizationURL for the OAuth flow to succeed.\n+ optional string redirect_uri = 2;\n 
}\n \ndiff --git a/backend/internal/platformd/auth/oidc.go b/backend/internal/platformd/auth/oidc.go\nindex 0e14ad7e..c4c96499 100644\n--- a/backend/internal/platformd/auth/oidc.go\n+++ b/backend/internal/platformd/auth/oidc.go\n@@ -65,9 +65,14 @@ func NewOIDCClient(ctx context.Context, cfg *Config) (*OIDCClient, error) {\n }\n \n // AuthorizationURL builds the authorization URL for the OIDC login flow.\n-func (c *OIDCClient) AuthorizationURL(state string) string {\n+// If redirectURI is provided, it will be used instead of the configured default.\n+func (c *OIDCClient) AuthorizationURL(state string, redirectURI string) string {\n \tcfg := c.oauth2Config\n-\tcfg.RedirectURL = c.redirectURI\n+\tif redirectURI != \"\" {\n+\t\tcfg.RedirectURL = redirectURI\n+\t} else {\n+\t\tcfg.RedirectURL = c.redirectURI\n+\t}\n \treturn cfg.AuthCodeURL(state)\n }\n \n@@ -92,10 +97,16 @@ type TokenResponse struct {\n }\n \n // ExchangeCode exchanges an authorization code for tokens.\n-func (c *OIDCClient) ExchangeCode(ctx context.Context, code string) (*TokenResponse, error) {\n+// If redirectURI is provided, it will be used instead of the configured default.\n+// The redirect URI must match the one used in the authorization request.\n+func (c *OIDCClient) ExchangeCode(ctx context.Context, code string, redirectURI string) (*TokenResponse, error) {\n \t// Set the redirect URI for this specific exchange\n \tcfg := c.oauth2Config\n-\tcfg.RedirectURL = c.redirectURI\n+\tif redirectURI != \"\" {\n+\t\tcfg.RedirectURL = redirectURI\n+\t} else {\n+\t\tcfg.RedirectURL = c.redirectURI\n+\t}\n \n \ttoken, err := cfg.Exchange(ctx, code)\n \tif err != nil {\n\ndiff --git a/backend/internal/platformd/auth/service.go b/backend/internal/platformd/auth/service.go\nindex c2a95279..e3e355c9 100644\n--- a/backend/internal/platformd/auth/service.go\n+++ b/backend/internal/platformd/auth/service.go\n@@ -82,8 +82,11 @@ func (s *Service) GetAuthorizationURL(\n \t\t}\n \t}\n \n-\t// Build the authorization 
URL using the configured redirect URI\n-\tauthURL := s.oidcClient.AuthorizationURL(state)\n+\t// Get redirect URI from request, or use configured default\n+\tredirectURI := msg.GetRedirectUri()\n+\n+\t// Build the authorization URL\n+\tauthURL := s.oidcClient.AuthorizationURL(state, redirectURI)\n \n \treturn connect.NewResponse(&authv1.GetAuthorizationURLResponse{\n \t\tAuthorizationUrl: authURL,\n@@ -138,8 +141,11 @@ func (s *Service) ExchangeToken(\n \t\treturn nil, connect.NewError(connect.CodeInvalidArgument, ErrCodeRequired)\n \t}\n \n-\t// Exchange the code for Docker tokens using the configured redirect URI\n-\ttokenResp, err := s.oidcClient.ExchangeCode(ctx, code)\n+\t// Get redirect URI from request, or use configured default\n+\tredirectURI := msg.GetRedirectUri()\n+\n+\t// Exchange the code for Docker tokens\n+\ttokenResp, err := s.oidcClient.ExchangeCode(ctx, code, redirectURI)\n \tif err != nil {\n \t\tif errors.Is(err, ErrTokenExchange) {\n \t\t\treturn nil, connect.NewError(connect.CodeInvalidArgument, err)\n```",
"created_at": "2026-02-18T14:00:00-05:00"
}
}
}
]
}
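
The verifier criterion in this eval (one verdict per finding, each carrying `evidence_strength` and `context_completeness`) could be checked mechanically with a small guard like the one below. This is a sketch under assumptions: only the `verdicts`, `evidence_strength`, and `context_completeness` field names and their allowed values come from the eval text, while the helper names `isRecord` and `hasEvidenceSignals` are hypothetical and not part of the repository.

```typescript
// Allowed values as listed in the eval's relevance criteria.
const EVIDENCE_VALUES: readonly string[] = ["direct", "circumstantial", "speculative"];
const CONTEXT_VALUES: readonly string[] = ["full", "partial", "none"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical helper: true when `response` has a `verdicts` array with
// exactly one entry per finding and every entry carries both signals.
function hasEvidenceSignals(response: unknown, findingCount: number): boolean {
  if (!isRecord(response)) return false;
  const verdicts = response["verdicts"];
  if (!Array.isArray(verdicts) || verdicts.length !== findingCount) return false;

  return verdicts.every((v: unknown) => {
    if (!isRecord(v)) return false;
    const evidence = v["evidence_strength"];
    const context = v["context_completeness"];
    return (
      typeof evidence === "string" &&
      EVIDENCE_VALUES.includes(evidence) &&
      typeof context === "string" &&
      CONTEXT_VALUES.includes(context)
    );
  });
}
```

For example, `hasEvidenceSignals(JSON.parse(raw), findings.length)` would reject a response in which any verdict omits a signal or uses a value outside the listed sets.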