Agent Performance Report — Week of 2026-05-14 #32124

2026-05-14T13:24:09Z

github-actions[bot]
Bot May 14, 2026

Executive Summary

Workflows tracked: 225 (↑2 from yesterday — 2 new workflows added)
Open [aw] failure issues: 36 (↑6 since this morning's health report; 33 new failure issues created today — highest single-day count observed)
Overall quality score: 74/100 (→ plateau, Day 13 unchanged)
Overall effectiveness score: 71/100 (→ plateau, Day 13 unchanged)
Ecosystem health: 62/100 (↓1 from yesterday)
Top performers: Agentic Maintenance, Issue Monster, Auto-Close Parent Issues, Daily File Diet, License Compliance Check
Critical issues: CGO push regression (P1, [CGO] Workflow failure on main - Run #2565 #29669), PR-review cluster waste (~272 runs/day at 0% success), Daily agents batch failure, Ecosystem-wide failure event (33 new issues today)

⚠️ CRITICAL ALERT (May 14): 33 [aw] failure issues were auto-created in a single day, which points to a potential ecosystem-wide failure event. Likely causes: scheduled batch failures, a shared engine or safe-output infrastructure disruption, or accumulated failures finally triggering issue creation. Immediate investigation recommended.

🚨 Ecosystem-Wide Failure Event — May 14, 2026

33 new [aw] failure issues created today (compared to ~1/day on May 12-13):

Issue	Workflow
#32119	Auto-Triage Issues
#32116	Claude Code User Documentation Review
#32111	GitHub MCP Structural Analysis
#32110	Daily Token Consumption Report (Sentry OTel)
#32109	Daily Go Function Namer
#32107	Daily Copilot Token Usage Audit
#32106	Package Specification Enforcer
#32105	Typist - Go Type Analysis
#32103	Design Decision Gate
#32095	Copilot Agent Prompt Clustering Analysis
#32094	Matt Pocock Skills Reviewer
#32093	Test Quality Sentinel
#32092	Smoke CI
#32090	Dev
#32088	Daily AW Cross-Repo Compile Check
#32087	Daily News
#32086	Instructions Janitor
#32084	Go Fan
#32083	Daily Rendering Scripts Verifier
#32081	Daily Syntax Error Quality Check
#32079	[aw] Failure Investigator (6h)
#32078	Copilot Session Insights
#32076	Daily Safe Outputs Conformance Checker
#32075	Multi-Device Docs Tester
#32074	Daily AgentRx Trace Optimizer
#32073	CLI Version Checker
#32072	Schema Consistency Checker
#32068	Issue Arborist
#32067	Static Analysis Report
#32066	Daily Grafana OTel Instrumentation Advisor
#32061	Step Name Alignment
#32049	Go Logger Enhancement
#32045	Daily Firewall Logs Collector and Reporter

Scope: 33 diverse workflows (daily agents, code quality agents, smoke tests, moderators) across both scheduled and event-triggered runs.

Notable: The [aw] Failure Investigator itself (#32079) is among the failures — the watchdog is down.

Positive signal: PR #32070 ("Fix safe output bundle fetch for checked-out PR branches") was merged today at 13:08. This may resolve some of the failures that are tied to safe-output issues on PR branches.

Performance Rankings

Top Performing Agents 🏆

Rank	Agent	Quality	Effectiveness	Today's Runs	Notes
1	Agentic Maintenance	90/100	92/100	✅ success	100% success rate, consistent
2	Issue Monster	85/100	87/100	✅ success	6m39s — thorough triage
3	Auto-Close Parent Issues	82/100	85/100	✅ success	100% success, clean outputs
4	Daily File Diet	80/100	80/100	✅ success	Fast, reliable (16s)
5	License Compliance Check	80/100	82/100	✅ success	98% success rate
6	Bot Detection	80/100	80/100	—	90% success, stable
7	PR Triage Agent	80/100	78/100	—	88% success, stable
8	AI Moderator	72/100	68/100	✅ (4×)	Recovering — success on comments
9	Content Moderation	72/100	68/100	✅ (4×)	Recovering — parallel with AI Mod

Agents Needing Improvement 📉

Agent	Quality	Effectiveness	Pattern	Action
CGO	45/100	35/100	Failure every push to main	P1 — issue #29669
Q	30/100	20/100	0% success, structural trigger	P0 — issue #31724
Label Closed PRs	40/100	30/100	action_required persistent	Watch
CJS	50/100	45/100	25% success on pushes	P1 related to CGO
Daily Fact About gh-aw	55/100	40/100	Parse failure (post-fix)	#31432 #31524
Daily Security Red Team	50/100	40/100	Batch failure	#31817
Daily Cache Strategy	50/100	40/100	Batch failure	#31773

Inactive / Zombie Agents

Deployment Incident Monitor: 8× skipped per 100 runs, zero outputs — deprecation candidate
Resource Summarizer Agent: chronic skips, zero outputs — deprecation candidate
Doc Build - Deploy: action_required persistent (deployment stalled for weeks)

Quality Analysis

Quality Distribution (22 agents profiled)

Excellent (80–100): 7 agents (32%)
Good (60–79): 6 agents (27%)
Fair (40–59): 7 agents (32%)
Poor (<40): 2 agents (9%)
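
The percentages above follow from the bucket counts over the 22 profiled agents. A minimal sketch of that arithmetic, assuming rounding to the nearest whole percent (the names `quality_buckets` and `bucket_shares` are illustrative and not taken from the analyzer):

```python
# Bucket counts as reported in the quality distribution above.
quality_buckets = {
    "Excellent (80-100)": 7,
    "Good (60-79)": 6,
    "Fair (40-59)": 7,
    "Poor (<40)": 2,
}


def bucket_shares(buckets):
    """Return each bucket's share of the total as a whole-number percentage."""
    total = sum(buckets.values())
    return {name: round(100 * count / total) for name, count in buckets.items()}


shares = bucket_shares(quality_buckets)
for name, pct in shares.items():
    print(f"{name}: {quality_buckets[name]} agents ({pct}%)")
```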

Common Quality Issues

Trigger structural failures (Q, Label Closed PRs, Agentic Commands on PRs): action_required returned on every PR trigger. These agents cannot complete their work on PR events, contributing to the 0-output anti-pattern and dragging down ecosystem quality scores.
Daily agent batch failure (Daily Fact, Daily Security Red Team, Daily Cache Strategy): Three daily scheduled agents failing simultaneously. Shared cron slot or runtime environment issue suspected. Quality degraded by absence of outputs.
CGO push regression: Every push to main fails CGO, creating friction for all contributors and blocking CI feedback loops. Now at Day 13+ unresolved.
Safe output infra instability: 33 failures in one day, including the Failure Investigator itself — suggests possible safe-output or engine availability disruption.

Effectiveness Analysis

Task Completion Rates

High completion (>80%): 7 agents
Medium completion (50–80%): 6 agents
Low completion (<50%): 9 agents

13-Day Trend

Quality:      74→74→74→74→74→74→74→74→74→74→74→74→74  (plateau, Day 13)
Effectiveness: 71→71→71→71→71→71→71→71→71→71→71→71→71  (plateau, Day 13)
Health:        63→63→63→63→63→63→63→63→63→63→63→63→62  (slight decline today)

The 13-day plateau is the most notable signal: no improvement despite ongoing fixes. The primary structural blocker remains the PR-review cluster (~272 wasted run-attempts/day at 0% success) which suppresses ecosystem averages by inflating failure counts.
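
One way to quantify the plateau is to count how many consecutive days, ending today, a score has held its current value. The sketch below applies that to the three series in the trend block; `trailing_plateau_days` is a hypothetical helper, not part of the report tooling.

```python
# 13-day series copied from the trend block above.
quality = [74] * 13
effectiveness = [71] * 13
health = [63] * 12 + [62]


def trailing_plateau_days(series):
    """Count how many consecutive days, ending today, share the latest value."""
    if not series:
        return 0
    latest = series[-1]
    days = 0
    for value in reversed(series):
        if value != latest:
            break
        days += 1
    return days


print(trailing_plateau_days(quality))        # 13
print(trailing_plateau_days(effectiveness))  # 13
print(trailing_plateau_days(health))         # 1
```

A result of 1 for health reflects that today's value (62) differs from the previous twelve days (63).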

Estimated recovery potential if PR-review cluster is fixed:

Quality: +4 to +6 points (to ~78–80)
Effectiveness: +3 to +5 points (to ~74–76)
Health: +5 to +8 points (to ~67–70)
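
The projected ranges are the current scores plus the estimated gains. A small sketch of that arithmetic, using the current scores from the Executive Summary and the gain estimates listed above (the names are illustrative only, and the gains themselves remain the report's estimates):

```python
# Current scores (Executive Summary) and estimated gains (list above).
current = {"Quality": 74, "Effectiveness": 71, "Health": 62}
estimated_gain = {"Quality": (4, 6), "Effectiveness": (3, 5), "Health": (5, 8)}


def projected_range(score, gain):
    """Add the low and high gain estimates to the current score."""
    low, high = gain
    return score + low, score + high


projections = {
    name: projected_range(current[name], estimated_gain[name]) for name in current
}
for name, (low, high) in projections.items():
    print(f"{name}: ~{low} to ~{high}")
```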

Behavioral Patterns

Productive Patterns ✅

Agentic Maintenance → CI chain: Consistently runs maintenance tasks, triggers clean CI passes — well-coordinated lifecycle management
Issue Monster + Auto-Triage: Complementary agents — Issue Monster surfaces issues, Auto-Triage classifies them, creating effective issue triage pipeline
Content Moderation + AI Moderator: Twin moderation coverage on issue comments — when working, provides robust dual-layer protection

Problematic Patterns ⚠️

PR-review cluster zero-success loop: Q, Label Closed PRs, Agentic Commands, and Smoke CI all fire on every PR event with 0% success on PR triggers (action_required). Wastes ~272 runs/day, making it the #1 ecosystem drain
CGO regression churn: Every push to main triggers a failing CGO run, adding noise to CI feedback and eroding developer trust in the CI signal
Daily agents batch failure: Daily Fact + Security Red Team + Cache Strategy all failing in the same batch — shared cron or runtime dependency failure
Moderation twin flapping: Content Moderation + AI Moderator both at exactly 57% success across all observed runs — identical failure split strongly indicates shared upstream dependency issue
Failure Investigator in failure loop: The [aw] Failure Investigator ([aw] [aw] Failure Investigator (6h) failed #32079) is itself among today's failures — the watchdog cannot watch itself

Coverage Analysis

Well-Covered Areas ✅

Issue management and triage (Issue Monster, Auto-Triage, Auto-Close Parent Issues)
Code quality and maintenance (Agentic Maintenance, Daily File Diet, CGO/CJS)
Content moderation (AI Moderator, Content Moderation, Bot Detection)
License and compliance (License Compliance Check)

Coverage Gaps 🔍

Safe output health: Safe Output Health Monitor had its first failure today — single point of observability failure
Failure investigation: Failure Investigator (6h) itself failed ([aw] [aw] Failure Investigator (6h) failed #32079) — gap in failure response
PR review: PR-review cluster (Q, Scout, etc.) has 0% effectiveness — intended coverage exists on paper but delivers nothing

Recommendations

🔴 High Priority

Investigate today's mass failure event (33 new issues, May 14)
- Audit whether the safe-output bundle fix (Fix safe output bundle fetch for checked-out PR branches #32070) resolved or introduced failures
- Check for shared engine availability window
- Compare failure times — did they cluster in a specific hour?
- Reference: [aw] Daily Firewall Logs Collector and Reporter failed #32045 through [aw] Auto-Triage Issues failed #32119
Fix PR-review cluster trigger gate ([deep-report] Fix PR-review cluster trigger gates — 272 wasted run-attempts/day across 8 agents #31724)
- Estimated ROI: +272 recovered runs/day, +4–6 quality points
- Estimated effort: 4–8 hours
- This is the single highest-ROI action in the entire ecosystem
Resolve CGO push regression ([CGO] Workflow failure on main - Run #2565 #29669)
- Every push to main fails CGO — 13+ days unresolved
- Blocks contributor confidence in CI signal
Investigate Daily agents batch failure root cause
- Daily Fact ([aw-failures] Daily Fact About gh-aw: 15+ consecutive push-time parse failures — P1 escalation #31432, [deep-report] Add circuit-breaker + schema fix to Daily Fact workflow (P1 — 15+ consecutive parse failures) #31524), Daily Security Red Team ([aw] Daily Security Red Team Agent failed #31817), Daily Cache Strategy ([aw] Daily Cache Strategy Analyzer failed #31773)
- Test whether merging a fix for one resolves all three (shared root cause hypothesis)
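
To check whether the failures clustered in a specific hour, the creation timestamps of the failure issues can be bucketed by hour. The sketch below assumes the timestamps have already been collected as ISO 8601 strings. The function name and the sample values are placeholders for illustration and are not the real creation times of today's issues.

```python
from collections import Counter
from datetime import datetime, timezone


def failures_per_hour(created_at):
    """Count issues per UTC hour from ISO 8601 timestamps such as '2026-05-14T09:15:00Z'."""
    counts = Counter()
    for stamp in created_at:
        # Swap the trailing 'Z' for an explicit offset so fromisoformat
        # also accepts it on older Python versions.
        moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        hour_key = moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[hour_key] += 1
    return counts


# Placeholder timestamps (hypothetical, NOT real issue data).
sample = [
    "2026-05-14T09:05:00Z",
    "2026-05-14T09:40:00Z",
    "2026-05-14T09:55:00Z",
    "2026-05-14T11:10:00Z",
]

hourly = failures_per_hour(sample)
for hour, count in hourly.most_common():
    print(hour, count)
```

If most of the 33 issues land in one or two hourly buckets, that would support the shared-disruption hypothesis; an even spread would point toward independent causes.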

🟡 Medium Priority

Moderation twin flapping: Content Moderation + AI Moderator both at 57% — investigate shared upstream dependency
Safe Output Health Monitor recovery: First failure after 9 successes ([aw] Safe Output Health Monitor failed #32063) — needs monitoring
Daily Fact parse failure ([aw-failures] Daily Fact About gh-aw: 15+ consecutive push-time parse failures — P1 escalation #31432, [deep-report] Add circuit-breaker + schema fix to Daily Fact workflow (P1 — 15+ consecutive parse failures) #31524): Still failing post-PR#31411 merge — needs direct investigation

🟢 Low Priority

Deprecate zombie agents: Deployment Incident Monitor, Resource Summarizer Agent — zero outputs, chronic skips
Node.js 20 deprecation: Plan upgrade before Sep 16, 2026 deadline
MCP gateway session timeout (MCP gateway drops out in long running jobs: Streamable HTTP error: Error POSTing to endpoint: session not found #23153): Long-running workflows remain at risk — needs architectural fix

Actions Taken This Run

Analyzed 225 workflows with focus on today's mass failure event (33 new issues)
Generated this performance report
Updated shared memory with latest quality/effectiveness data
Escalated mass failure event to shared alerts

Next Steps

Investigate root cause of the 33 failures created on May 14
Implement PR-review cluster trigger gate fix ([deep-report] Fix PR-review cluster trigger gates — 272 wasted run-attempts/day across 8 agents #31724) — highest ROI
Resolve CGO push regression ([CGO] Workflow failure on main - Run #2565 #29669)
Monitor Safe Output Health Monitor for recurrence
Assess whether PR #32070 (Fix safe output bundle fetch for checked-out PR branches) resolved any of today's failures

Analysis period: 2026-05-07 to 2026-05-14
Previous report: §25801923272
Next report: 2026-05-15

Generated by Agent Performance Analyzer - Meta-Orchestrator · ● 21M · ◷

expires on May 15, 2026, 1:24 PM UTC
