Add AWS incident triage agent + CloudWatch investigation skill #2071

Merged
aaronpowell merged 2 commits into github:staged from starlightretailceo:contrib/aws-incident-triage on Jun 22, 2026
@starlightretailceo

Summary

  • Agent (aws-incident-triage.agent.md): On-call SRE persona with 6-phase investigation protocol — alarms → blast radius → metrics → logs → traces → root-cause hypothesis with evidence citations.
  • Skill (skills/aws-cloudwatch-investigation/SKILL.md): 5 reusable patterns — Logs Insights query templates (errors, p99, cold starts, OOM), alarm-to-deploy correlation, blast-radius decision tree, PromQL-style metric queries, incident timeline reconstruction.
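
The Logs Insights query templates mentioned above could be assembled along these lines. This is a minimal sketch, not the skill's actual content: the helper name `logs_insights_query` and the chosen field list are assumptions made for illustration, while the two filter clauses are the ones quoted from SKILL.md in the risk scan further down this thread.

```python
def logs_insights_query(filters, fields=("@timestamp", "@message"), limit=50):
    """Assemble a CloudWatch Logs Insights query string.

    `filters` is a list of filter expressions that are OR-ed together.
    The helper is hypothetical; only the query syntax belongs to CloudWatch.
    """
    if not filters:
        raise ValueError("at least one filter clause is required")
    stages = [
        "fields " + ", ".join(fields),
        "filter " + " or ".join(filters),
        "sort @timestamp desc",
        f"limit {limit}",
    ]
    # Logs Insights chains query stages with the pipe character.
    return "\n| ".join(stages)


# Illustrative template: Lambda invocations that timed out or ran
# longer than 28000 ms (the threshold that appears in the skill).
TIMEOUT_QUERY = logs_insights_query(
    ["@message like /Task timed out/", "@duration > 28000"],
    fields=("@timestamp", "@requestId", "@duration", "@message"),
)

if __name__ == "__main__":
    print(TIMEOUT_QUERY)
```

The resulting string can be pasted into the Logs Insights console or, assuming boto3 is available, passed as `queryString` to the CloudWatch Logs `start_query` call.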

Motivation

Existing AWS agents in this repo focus on architecture and planning. None wire up the multi-signal investigation loop that on-call engineers actually need during incidents — fusing alarms, metrics, logs, and traces into a structured triage flow.

Test plan

  • Validate frontmatter matches .schemas/ if schema validation is configured
  • Invoke agent in VS Code Copilot Chat to confirm persona activates
  • Test skill queries against a real CloudWatch log group
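
The first item in the test plan can be approximated locally before CI runs. The sketch below is an assumption-heavy stand-in for real schema validation: it only reads flat `key: value` pairs from a leading `---` block, and the required keys `name` and `description` are guesses, not taken from `.schemas/`.

```python
def read_frontmatter(text):
    """Return key/value pairs from a leading '---' delimited block.

    Handles flat `key: value` lines only; nested YAML is out of scope.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Drop surrounding whitespace and simple quoting.
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat as missing frontmatter


def missing_fields(text, required=("name", "description")):
    """List required keys (assumed, not from the repo schema) that are absent or empty."""
    fields = read_frontmatter(text)
    return [key for key in required if not fields.get(key)]


if __name__ == "__main__":
    sample = "---\nname: aws-cloudwatch-investigation\ndescription: 'Triage patterns'\n---\n# Body\n"
    print(missing_fields(sample))
```

A proper check would load the JSON schema from the repository and validate the parsed YAML against it; this version only catches an absent block or empty fields.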

🤖 Generated with Claude Code

@github-actions github-actions Bot left a comment

⚠️ This PR targets main, but PRs should target staged.

The main branch is auto-published from staged and should not receive direct PRs.
Please close this PR and re-open it against the staged branch.

You can change the base branch using the Edit button at the top of this PR,
or run: gh pr edit 2071 --base staged

@github-actions github-actions Bot added the agent (PR touches agents), new-submission (PR adds at least one new contribution), skills (PR touches skills), and targets-main (PR targets main instead of staged) labels Jun 21, 2026
@starlightretailceo starlightretailceo changed the base branch from main to staged June 21, 2026 01:36
@github-actions github-actions Bot added the branched-main (PR appears to include plugin files materialized from main) and external-plugin (PR updates plugins/external.json) labels Jun 21, 2026
github-actions Bot previously approved these changes Jun 21, 2026

@github-actions github-actions Bot left a comment

✅ Base branch is now set correctly.

Removing the prior block because this PR no longer targets main.

@github-actions github-actions Bot removed the skills (PR touches skills), targets-main (PR targets main instead of staged), agent (PR touches agents), and new-submission (PR adds at least one new contribution) labels Jun 21, 2026
@github-actions

✅ External plugin PR checks passed

  • Changed entries detected: 0
  • Workflow state label: ready-for-review

Per-plugin quality summary

| Plugin | skill-validator | install smoke test | overall | source tree |
| --- | --- | --- | --- | --- |
| none | not_run | not_run | not_run | n/a |

No changed external plugin entries were detected in this PR.

@github-actions github-actions Bot added the ready-for-review (Submission passed intake validation and is ready for maintainer review) label Jun 21, 2026
Agent: structured on-call SRE persona driving alarm→metrics→logs→traces→hypothesis loop.
Skill: reusable Logs Insights query templates, blast-radius narrowing, deploy correlation patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@starlightretailceo starlightretailceo force-pushed the contrib/aws-incident-triage branch from f57fe26 to 50afe7b June 21, 2026 02:06
@github-actions github-actions Bot added the agent (PR touches agents), new-submission (PR adds at least one new contribution), skills (PR touches skills), skill-check-error (Skill validator reported errors), and skill-check-warning (Skill validator reported warnings) labels, and removed the branched-main (PR appears to include plugin files materialized from main) and external-plugin (PR updates plugins/external.json) labels Jun 21, 2026
@github-actions

github-actions Bot commented Jun 21, 2026

🔍 Skill Validator Results

⛔ Findings need attention

| Scope | Checked |
| --- | --- |
| Skills | 1 |
| Agents | 1 |
| Total | 2 |

| Severity | Count |
| --- | --- |
| ❌ Errors | 4 |
| ⚠️ Warnings | 1 |
| ℹ️ Advisories | 0 |

Summary

| Level | Finding |
| --- | --- |
| ❌ | [AWS CloudWatch Investigation] Skill name 'AWS CloudWatch Investigation' contains invalid characters — must be lowercase alphanumeric and hyphens only. |
| ❌ | [AWS CloudWatch Investigation] Skill name 'AWS CloudWatch Investigation' does not match directory name 'aws-cloudwatch-investigation'. |
| ❌ | [agent:AWS Incident Triage] Agent name 'AWS Incident Triage' does not match filename 'aws-incident-triage.agent.md' (expected 'AWS Incident Triage.agent.md'). |
| ❌ | [agent:AWS Incident Triage] Agent name 'AWS Incident Triage' contains invalid characters — must be lowercase alphanumeric and hyphens only. |
Full validator output
Found 1 skill(s)
[AWS CloudWatch Investigation] 📊 AWS CloudWatch Investigation: 2,667 BPE tokens [chars/4: 2,633] (standard ~), 21 sections, 15 code blocks
❌ [AWS CloudWatch Investigation] Skill name 'AWS CloudWatch Investigation' contains invalid characters — must be lowercase alphanumeric and hyphens only.
❌ [AWS CloudWatch Investigation] Skill name 'AWS CloudWatch Investigation' does not match directory name 'aws-cloudwatch-investigation'.
[AWS CloudWatch Investigation]    ⚠  Skill is 2,667 BPE tokens (chars/4 estimate: 2,633) — approaching "comprehensive" range where gains diminish.
Skill spec conformance failures — fix the errors above.
Found 1 agent(s)
❌ [agent:AWS Incident Triage] Agent name 'AWS Incident Triage' does not match filename 'aws-incident-triage.agent.md' (expected 'AWS Incident Triage.agent.md').
❌ [agent:AWS Incident Triage] Agent name 'AWS Incident Triage' contains invalid characters — must be lowercase alphanumeric and hyphens only.
Validated 1 agent(s)
Agent spec conformance failures — fix the errors above.

Note: The validator returned a non-zero exit code. Please review the findings above before merge.
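
The two skill-name findings above can be reproduced with a short check. This is a sketch of the rules as the validator messages state them (lowercase alphanumerics and hyphens only, name equal to the containing directory), not the validator's actual implementation. The function names and the `to_slug` helper are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Rule as worded in the findings: lowercase alphanumeric and hyphens only.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def check_skill_name(name, skill_path):
    """Return the list of naming findings for a skill's frontmatter name."""
    directory = PurePosixPath(skill_path).parent.name
    findings = []
    if not NAME_PATTERN.fullmatch(name):
        findings.append("invalid characters")
    if name != directory:
        findings.append("does not match directory")
    return findings


def to_slug(name):
    """Derive a candidate name: lowercase, runs of other characters become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


if __name__ == "__main__":
    path = "skills/aws-cloudwatch-investigation/SKILL.md"
    print(check_skill_name("AWS CloudWatch Investigation", path))
    print(to_slug("AWS CloudWatch Investigation"))
```

Renaming the skill to the slug derived from its display name would satisfy both rules at once in this case, since the slug equals the existing directory name.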

@github-actions

github-actions Bot commented Jun 21, 2026

🔒 PR Risk Scan Results

Scanned 4 changed file(s).

| Severity | Count |
| --- | --- |
| 🔴 High | 0 |
| 🟠 Medium | 3 |
| ℹ️ Info | 0 |

| Severity | Rule | File | Line | Match |
| --- | --- | --- | --- | --- |
| 🟠 | package-exec-command | docs/README.skills.md | 31 | \| [acreadiness-assess](../skills/acreadiness-assess/SKILL.md)<br />`gh skills install github/awesome-copilot acreadiness-assess` \| Run the AgentRC readiness assessment on the curre |
| 🟠 | unpinned-version-indicator | skills/aws-cloudwatch-investigation/SKILL.md | 90 | \| filter @message like /Task timed out/ or @duration > 28000 |
| 🟠 | unpinned-version-indicator | skills/aws-cloudwatch-investigation/SKILL.md | 242 | Label: "Latency vs Baseline (ratio > 2 = anomaly)" |

This is an automated soft-gate report. Findings indicate review targets and do not block merge by themselves.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@aaronpowell aaronpowell merged commit 7d0694f into github:staged Jun 22, 2026
14 checks passed
Labels

  • agent: PR touches agents
  • new-submission: PR adds at least one new contribution
  • ready-for-review: Submission passed intake validation and is ready for maintainer review
  • skill-check-error: Skill validator reported errors
  • skill-check-warning: Skill validator reported warnings
  • skills: PR touches skills

2 participants