[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-05-13 #31919
Closed
This discussion was automatically closed because it expired on 2026-05-14T10:58:26.347Z.
Summary
/tmp/gh-aw/prompt-cache/pr-full-data/
Key Findings
Workflow Compilation (30.4%) and Testing & Test Coverage (29.7%) together account for ~60% of all Copilot tasks. They reflect the project's heavy reliance on the agent for lock-file recompilation and Go-package test fixes.
AI Engine Integration is the weakest cluster at 62.5% merge rate and the most expensive (avg 93 files changed, +1020/-553 lines, 6.0 comments/PR). These are big AWF version bumps, retries, and firewall steering changes: high blast radius, more revisions.
Caching is the strongest cluster at 88.1% merge rate with the smallest footprint (10.5 files, +138/-44 lines). Narrow, contained changes succeed.
Cluster Overview
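The task shares quoted in Key Findings follow directly from the per-cluster PR counts given in the cluster headings of this report. A minimal sketch of that arithmetic is below; the counts and merge rates are copied from those headings, while `CLUSTERS` and `task_shares` are illustrative names, not part of the original analysis script.

```python
# (PR count, merge rate %) per cluster, copied from the cluster headings in this report.
CLUSTERS = {
    "Workflow Compilation": (304, 73.7),
    "Testing & Test Coverage": (297, 86.2),
    "Safe Outputs": (128, 85.9),
    "MCP Server / Protocol": (92, 73.9),
    "AI Engine Integration": (72, 62.5),
    "Experiments / Frontmatter": (65, 84.6),
    "Caching": (42, 88.1),
}


def task_shares(clusters):
    """Return each cluster's share of all PRs, in percent, rounded to one decimal."""
    total = sum(count for count, _ in clusters.values())
    return {name: round(100 * count / total, 1) for name, (count, _) in clusters.items()}


if __name__ == "__main__":
    shares = task_shares(CLUSTERS)
    # Print clusters from largest to smallest: name, PR count, share, merge rate.
    for name, (count, rate) in sorted(CLUSTERS.items(), key=lambda kv: -kv[1][0]):
        print(f"{name:<28}{count:>5}{shares[name]:>7.1f}%{rate:>7.1f}%")
```

The seven counts sum to 1000 PRs, which is why each share is simply the count divided by ten.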
Cluster details & representative PRs
Cluster 6 — Workflow Compilation (304 PRs, 73.7%)
Compilation/regeneration of agentic-workflow lock files, daily-workflow scaffolding, shared imports (e.g. observability-otlp.md), model alias updates. Highest volume, mid-low merge rate because lock-file regenerations are routinely superseded by newer commits before merge.
{{#import}} with {{#runtime-import}} (merged)
## agent: (merged)
Cluster 3 — Testing & Test Coverage (297 PRs, 86.2%)
Go-package fixes (pkg/workflow, engine, parsers), lint failures, error-string changes, CJS handler updates. Highest-volume "contained" cluster: clear scope, mostly merged.
concurrency.queue support (merged)
Cluster 0 — Safe Outputs (128 PRs, 85.9%)
Work on the safe-output surface: create_issue, create_pull_request, push-to-pull-request-branch, label-triggered jobs, permission gating. Narrow scope, high merge rate.
aw_context fallbacks for injected GitHub prompt context (merged)
Cluster 5 — MCP Server / Protocol (92 PRs, 73.9%)
MCP CLI mounting, gateway hardening, playwright integration, timeout/retry semantics. Large footprint (48 files avg) — refactors that touch many handler tables.
tools.github mode (merged)
Cluster 2 — AI Engine Integration (72 PRs, 62.5%) ⚠️
Lowest merge rate. AWF version bumps (v0.25.x), Pi engine, OTel attribute capture, firewall token steering, retries on health-probe failures. High file count (93 avg) and comment volume (6.0/PR) — these PRs need iteration.
firewall.effective-token-steering compiler support #31796)
Cluster 1 — Experiments / Frontmatter (65 PRs, 84.6%)
A/B experiment frontmatter, variant scaffolding, shared experiments/ modules. Small, focused diffs.
Cluster 4 — Caching (42 PRs, 88.1%) ✅
Highest merge rate. Cache-memory miss detection, repo-memory sanitization, push gating. Smallest footprint (10 files avg), shortest review cycle (1.1 comments/PR).
cache_memory_miss detection & concurrency (merged)
push_repo_memory on agent success (merged)
Representative PR data table (top by files-changed per cluster)
Methodology & caveats
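The parameter names in this section (max_features, min_df, max_df) match those of scikit-learn's text vectorizers, though the report does not name the library here. As a dependency-free sketch under that assumption, the vocabulary pruning they imply could look like the following. `tokenize` and `build_vocabulary` are hypothetical helpers, not taken from cluster.py, and the English stop-word step is omitted for brevity.

```python
import re
from collections import Counter

# Domain stop-list quoted in the methodology; a real run would also drop English stop-words.
DOMAIN_STOP = {"pr", "copilot", "gh", "aw", "fix", "feat", "chore",
               "refactor", "test", "docs", "wip"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop domain stop-words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in DOMAIN_STOP]


def build_vocabulary(docs, min_df=3, max_df=0.6, max_features=800):
    """Keep terms found in at least min_df documents and at most a max_df
    fraction of documents, then cap at the max_features most frequent terms."""
    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term across the corpus
    for doc in docs:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    limit = max_df * len(docs)
    kept = [t for t, df in doc_freq.items() if min_df <= df <= limit]
    # Most frequent first; ties broken alphabetically for a stable result.
    kept.sort(key=lambda t: (-term_freq[t], t))
    return kept[:max_features]
```

With max_df=0.6, a term that appears in more than 60% of PR prompts is treated as too common to separate clusters, while min_df=3 discards terms seen in fewer than three prompts.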
/tmp/gh-aw/prompt-cache/pr-full-data/).
max_features=800, min_df=3, max_df=0.6, English stop-words plus a domain stop-list (pr, copilot, gh, aw, fix, feat, chore, refactor, test, docs, wip).
n_init=20. Swept k=3..10; silhouette scores were tight (0.015–0.032). Picked k=7 as the best score that still produced human-interpretable labels.
/tmp/gh-aw/agent/cluster.py). Verified by manually inspecting the top-3 PRs of each cluster.
gh-aw logs --engine copilot is left for a follow-up pass. Comment count is used as a noisy proxy for iteration count.
Recommendations
firewall.effective-token-steering compiler support #31796). Consider whether the prompt for AWF version bumps should pin to one open PR per version, or whether the workflow should self-detect that an open Copilot PR for the same bump already exists.
aw_info.json turn counts per PR would let us replace the comment-count proxy with a real iteration count and identify which clusters burn the most agent turns.
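One way the self-detection idea for AWF version bumps might work is to compare the target version against the titles of PRs that are already open. The sketch below is only an illustration: the title pattern, the helper names `bump_version` and `has_open_bump`, and the sample titles are all assumptions, and fetching the list of open PR titles is left to the caller.

```python
import re

# Hypothetical title convention: the word "AWF" followed later by a
# semantic version such as "v0.25.3" or "0.25.3".
VERSION_RE = re.compile(r"\bawf\b.*?\bv?(\d+\.\d+\.\d+)\b", re.IGNORECASE)


def bump_version(title):
    """Return the AWF version named in a PR title, or None if there is none."""
    match = VERSION_RE.search(title)
    return match.group(1) if match else None


def has_open_bump(open_pr_titles, target_version):
    """True if any open PR title already names a bump to target_version."""
    target = target_version.lstrip("v")
    return any(bump_version(title) == target for title in open_pr_titles)


if __name__ == "__main__":
    # Invented example titles, for illustration only.
    open_titles = ["Bump AWF to v0.25.3", "Fix cache miss detection"]
    if has_open_bump(open_titles, "v0.25.3"):
        print("An open PR already covers this bump; skipping.")
```

A workflow could run a check like this before dispatching the agent and exit early when it returns True, which would keep each version to one open PR.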