Skip to content

Populate audit job steps in structured audit output#42222

Merged
pelikhan merged 3 commits into
mainfrom
copilot/cli-tools-test-audit-job-steps
Jun 29, 2026
Merged

Populate audit job steps in structured audit output#42222
pelikhan merged 3 commits into
mainfrom
copilot/cli-tools-test-audit-job-steps

Conversation

Copilot AI commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

gh aw audit was always emitting jobs without step details, even though the GitHub Actions jobs API already returned per-step status and conclusions. This left failed runs without the step-level context needed to identify where a job actually broke.

  • Fetch step data from the jobs API

    • Extend the gh api /actions/runs/{id}/jobs projection to include each job’s steps with name, status, and conclusion.
    • Preserve existing job duration/failure counting behavior while carrying the additional step payload.
  • Thread steps through audit models

    • Add step metadata to the internal job model used by run processing and cached summaries.
    • Map that metadata into audit JSON so each jobs[] entry can expose steps[] instead of dropping it during report assembly.
  • Cover the regression at both layers

    • Add a focused fetch-path test proving step data is parsed from the GitHub CLI/API response.
    • Update audit report tests to verify step arrays survive JSON rendering.

Example output after this change:

{
  "name": "agent",
  "status": "completed",
  "conclusion": "failure",
  "steps": [
    { "name": "Set up job", "status": "completed", "conclusion": "success" },
    { "name": "Run agent", "status": "completed", "conclusion": "failure" }
  ]
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix audit tool to populate job step details Populate audit job steps in structured audit output Jun 29, 2026
Copilot AI requested a review from pelikhan June 29, 2026 11:32
@pelikhan pelikhan marked this pull request as ready for review June 29, 2026 11:38
Copilot AI review requested due to automatic review settings June 29, 2026 11:38
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #42222 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (exactly 100 additions, threshold is >100).

@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances gh aw audit structured output by carrying GitHub Actions job step status/conclusion details end-to-end (API fetch → internal models → rendered audit JSON), so failed runs can be diagnosed at step granularity.

Changes:

  • Extend the jobs API fetch projection to include each job’s steps[] with name, status, and conclusion.
  • Thread step metadata through the internal job model and into the audit report JSON model.
  • Add/extend tests to cover step parsing from gh api output and preservation through JSON rendering.
Show a summary per file
File Description
pkg/cli/logs_models.go Adds JobStep and JobInfo.Steps to carry step metadata in the internal model.
pkg/cli/logs_github_api.go Extends the gh api .../jobs jq projection to include per-job steps[] payload.
pkg/cli/logs_github_api_test.go Adds a focused test that validates steps are parsed from the fetch path and failure counting/duration still work.
pkg/cli/audit_report.go Adds JobStepData and maps internal step data into structured audit output.
pkg/cli/audit_report_test.go Updates audit and JSON rendering tests to assert job steps survive report assembly and JSON encoding/decoding.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 1
  • Review effort level: Low

Comment thread pkg/cli/logs_github_api.go Outdated
Comment on lines +85 to +87
output, err := workflow.RunGHCombined("Fetching job details...", "api",
fmt.Sprintf("repos/{owner}/{repo}/actions/runs/%d/jobs", runID),
"--jq", ".jobs[] | {name: .name, status: .status, conclusion: .conclusion, started_at: .started_at, completed_at: .completed_at}")
"--jq", ".jobs[] | {name: .name, status: .status, conclusion: .conclusion, started_at: .started_at, completed_at: .completed_at, steps: ((.steps // []) | map({name: .name, status: .status, conclusion: .conclusion}))}")
@github-actions github-actions Bot mentioned this pull request Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100 — Excellent

Analyzed 3 test(s) across 2 Go test files: 3 design tests, 0 implementation tests, 0 hard guideline violations (4 missing assertion messages noted).

📊 Metrics & Test Classification (3 tests analyzed)
Metric Value
New/modified tests analyzed 3
✅ Design tests (behavioral contracts) 3 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 2 (67%)
Duplicate test clusters 0
Test inflation detected YES (both test files exceed 2:1 ratio vs production changes)
🚨 Coding-guideline violations 0 hard violations; 4 missing assertion messages (soft flag)
Test File Classification Issues Detected
TestBuildAuditDataComplete (modified) pkg/cli/audit_report_test.go:686 ✅ Design
TestRenderJSONComplete (modified) pkg/cli/audit_report_test.go:923 ✅ Design Happy-path steps only
TestFetchJobDetailsWithCountsIncludesSteps pkg/cli/logs_github_api_test.go:228 ✅ Design 4 bare require.* calls (no message arg)

Go: 2 (*_test.go); JavaScript: 0. No other languages detected.

Inflation detail (binary −10 pt deduction applied):

  • audit_report_test.go: 44 lines added vs audit_report.go 15 lines added → 2.93:1
  • logs_github_api_test.go: 32 lines added vs logs_github_api.go 1 line added → 32:1

Both exceed the 2:1 threshold. Note: the high ratio for logs_github_api_test.go is expected — the production change is a single-line jq query expansion, while the test requires a full fake-gh harness to exercise it. The penalty is applied per rubric but does not indicate poor test quality here.

Score breakdown:

  • Behavioral coverage (40 pts): 3/3 design → 40
  • Error/edge coverage (30 pts): 2/3 with edge cases → 20
  • Low duplication (20 pts): 0 clusters → 20
  • Proportional growth (10 pts): inflation detected → 0
  • Total: 80/100
⚠️ Flagged Tests — Soft Concerns (1 item)

TestFetchJobDetailsWithCountsIncludesSteps (pkg/cli/logs_github_api_test.go:228) — ⚠️ Missing assertion messages: four require.NoError / require.Len calls lack descriptive message arguments, making failure output harder to diagnose. The guideline asks for a context string on every assertion call. Suggested fix: add messages, e.g. require.NoError(t, err, "fetchJobDetailsWithCounts should not error") and require.Len(t, jobs, 1, "should parse exactly one job from fake gh output"). This is a soft concern and does not trigger a block.

Verdict

Check passed. 0% implementation tests (threshold: 30%). All three tests verify observable behavioral contracts — step data is parsed, propagated through the audit pipeline, and serialized correctly into JSON output.

🧪 Test quality analysis by Test Quality Sentinel · 57.6 AIC · ⌖ 17.5 AIC · ⊞ 8.1K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 tests verify observable behavioral contracts covering step data parsing, audit pipeline propagation, and JSON serialization. One soft concern: 4 bare require.* calls in TestFetchJobDetailsWithCountsIncludesSteps lack descriptive message arguments (not a block).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /zoom-out — approving with minor suggestions.

The fix is correct and well-targeted: the jq projection was the single missing link, and threading JobStep through the model + report layers is done cleanly. Regression tests at both the fetch and report layers are a strong signal of healthy TDD practice.

📋 Key Themes & Highlights

Minor Suggestions (non-blocking)

  • Missing edge-case test: the (.steps // []) null-guard in the jq projection has no test proving it handles a job with no steps field. The happy-path test is solid, but the fallback path is untested.
  • Dual struct types: JobStep and JobStepData are structurally identical; a type alias (type JobStepData = JobStep) would eliminate the per-field mapping and reduce future drift risk.
  • Missing console:"-" tag: Steps on JobData is the only field without a console struct tag; an explicit opt-out makes the intent clear to future maintainers.

Positive Highlights

  • ✅ Root cause fixed at the right layer — the jq projection, not a downstream workaround
  • (.steps // []) guard handles null steps defensively
  • ✅ Fake-gh-binary test pattern is consistent with the rest of the file
  • ✅ Cache invalidation (CLI version mismatch) ensures existing run_summary.json files without steps are automatically reprocessed
  • ✅ Step data flows end-to-end: API → model → cache → audit report → JSON

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 63.6 AIC · ⌖ 11.3 AIC · ⊞ 6.6K
Comment /matt to run again

require.NoError(t, err)
assert.Contains(t, string(argsLog), "repos/{owner}/{repo}/actions/runs/28307653871/jobs", "should query the run jobs API")
assert.Contains(t, string(argsLog), "steps:", "gh jq projection should request step data")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Missing edge-case test: job with null or absent steps — the (.steps // []) jq guard is untested.

The guard ensures API responses without a steps field still parse cleanly, but no test exercises this path. A job that returns JSON without a steps key could silently produce a nil slice or a parse error without anyone noticing.

💡 Suggested addition

Add a second sub-test (or table-driven variant) feeding output without a steps field and assert jobs[0].Steps is nil/empty:

// Verify a job with no steps field round-trips without error
assert.Empty(t, jobs[0].Steps, "job with no steps field should produce empty Steps slice")

This documents the contract of (.steps // []) and prevents a regression if the jq expression is later simplified.

@copilot please address this.

Comment thread pkg/cli/audit_report.go
Name string `json:"name"`
Status string `json:"status,omitempty"`
Conclusion string `json:"conclusion,omitempty"`
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] JobStep and JobStepData are structurally identical — the type-cast conversion adds maintenance debt.

The two structs have the same three fields. return JobStepData(step) works via Go's composite-literal type conversion, but any future field added to one will require a matching change in the other (and a updated mapping). Since JobStepData carries no extra behaviour or tags beyond what JobStep already has, consider using a type alias instead:

💡 Simpler alternative
// In audit_report.go — eliminate the duplicate struct entirely
type JobStepData = JobStep  // type alias, no conversion needed

Then the mapping in buildAuditData reduces to:

Steps: slices.Clone(jobDetail.Steps),

If the two types do need to diverge in the future (e.g., console tags, extra audit-only fields), switching back to a concrete struct is a one-line change.

@copilot please address this.

Comment thread pkg/cli/audit_report.go
Status string `json:"status" console:"header:Status"`
Conclusion string `json:"conclusion,omitempty" console:"header:Conclusion,omitempty"`
Duration string `json:"duration,omitempty" console:"header:Duration,omitempty"`
Steps []JobStepData `json:"steps,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] Steps field has no console struct tag — steps will be silently invisible in console (tabular) rendering.

Every other field in JobData carries a console:"header:..." tag. Omitting it from Steps is probably intentional (nested slices are hard to render in a table), but leaving it undocumented is a subtle inconsistency.

💡 Suggestion

Add an explicit opt-out tag so the intent is clear to future maintainers:

Steps []JobStepData `json:"steps,omitempty" console:"-"`

The console:"-" pattern is already used in this file (see AwContext) for fields that should be excluded from table output.

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Populate audit job steps

Clean, focused change. The plumbing — jq projection → JobStep model → JobStepData in the audit output — is correct end-to-end, and the two new test paths (fetch layer + report layer) give good confidence.

Non-blocking notes (see inline comments):

  • Missing "no steps" test case — the fetch test only covers the happy path where steps are present; a case with "steps":null or "steps":[] in the fake API response would prove the // [] jq fallback and guard backward-compat for pre-existing data.
  • sliceutil.Map + nil input — when jobDetail.Steps is nil, sliceutil.Map returns []JobStepData{} (non-nil empty). omitempty handles the JSON output correctly; just prefer len(job.Steps) == 0 over == nil at any future call-site.

Neither issue is blocking. The implementation is correct and the test coverage is meaningfully improved.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 48.9 AIC · ⌖ 6.51 AIC · ⊞ 4.9K

argsLog, err := os.ReadFile(argsLogPath)
require.NoError(t, err)
assert.Contains(t, string(argsLog), "repos/{owner}/{repo}/actions/runs/28307653871/jobs", "should query the run jobs API")
assert.Contains(t, string(argsLog), "steps:", "gh jq projection should request step data")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion confirms the jq projection mentions steps:, which is useful. However, the test only exercises the case where the API returns populated steps. A second sub-test or table-driven case where the fake gh script emits "steps":null or "steps":[] would verify that the // [] fallback in the jq expression works and that fetchJobDetailsWithCounts returns an empty/nil Steps slice without error — providing a backward-compat regression guard for runs fetched before step data was available.

@copilot please address this.

Comment thread pkg/cli/audit_report.go
Status: jobDetail.Status,
Conclusion: jobDetail.Conclusion,
Steps: sliceutil.Map(jobDetail.Steps, func(step JobStep) JobStepData {
return JobStepData(step)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct struct conversion JobStepData(step) is valid today because JobStep and JobStepData are structurally identical (same field names, types, and tags). Go's type conversion rules mean any field divergence will be caught at compile time, which is safe.

One subtle side-effect: sliceutil.Map calls make([]U, len(slice)), so when jobDetail.Steps is nil (no steps from the API), this returns []JobStepData{} — an empty non-nil slice — rather than nil. The json:"steps,omitempty" tag correctly omits both nil and empty slices from JSON output, so serialized audit reports are unaffected. Just be aware that in-memory checks of job.Steps == nil won't reliably distinguish "no step data" from "zero steps"; prefer len(job.Steps) == 0.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking observations

Correct fix for a real gap — step data flows cleanly from the jobs API through the model and into audit JSON, with tests at both layers. Two non-blocking observations worth considering before this pattern scales:

Findings summary

1. Struct cast coupling ( line 363) — is a named-struct type conversion that compiles only when JobStep and JobStepData are structurally identical. They are today, but the two types serve distinct purposes (internal API model vs. audit output model) and will likely diverge over time. Explicit field mapping is cleaner and safer.

2. jq expression not exercised by tests (logs_github_api_test.go line 254) — the fake gh binary always cats pre-baked JSON, so the actual jq projection in logs_github_api.go is never executed during CI. The assert.Contains(t, ..., "steps:") assertion is too coarse to catch jq field-name typos. A more specific substring check (e.g., ".steps // []") would catch obvious regressions.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🔎 Code quality review by PR Code Quality Reviewer · 84.8 AIC · ⌖ 7.19 AIC · ⊞ 5.2K
Comment /review to run again

Comment thread pkg/cli/audit_report.go
Status: jobDetail.Status,
Conclusion: jobDetail.Conclusion,
Steps: sliceutil.Map(jobDetail.Steps, func(step JobStep) JobStepData {
return JobStepData(step)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brittle named-struct cast will surprise future maintainers: JobStepData(step) silently depends on JobStep and JobStepData having identical field names, types, and order — any divergence causes a compile error whose origin is non-obvious.

💡 Suggested fix

Use explicit field mapping instead:

Steps: sliceutil.Map(jobDetail.Steps, func(step JobStep) JobStepData {
    return JobStepData{Name: step.Name, Status: step.Status, Conclusion: step.Conclusion}
}),

This makes the intent clear — only these three fields flow from the internal API model to the audit output model — and compiles cleanly even if either struct gains fields independently (e.g., Number int on JobStep for step ordering, without leaking it to audit consumers).

argsLog, err := os.ReadFile(argsLogPath)
require.NoError(t, err)
assert.Contains(t, string(argsLog), "repos/{owner}/{repo}/actions/runs/28307653871/jobs", "should query the run jobs API")
assert.Contains(t, string(argsLog), "steps:", "gh jq projection should request step data")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert.Contains(t, ..., "steps:") check does not validate the jq expression: the fake gh binary cats pre-baked JSON regardless of the jq argument passed, so a typo in the jq projection (e.g., .step // [] instead of .steps // []) would still produce correct steps in the test output and the assertion would still pass.

💡 What this means

The test verifies Go-side parsing of step data — which is correct and valuable. But the actual jq expression in logs_github_api.go line 87 is never executed during tests. A regression in that expression (wrong field name, broken fallback, etc.) would only be caught by running against a real GitHub API.

Consider documenting this known gap in a comment, or strengthening the assertion to check for a more specific substring of the jq expression (e.g., .steps // []), at minimum catching obvious field-name typos:

assert.Contains(t, string(argsLog), ".steps // []", "jq must use .steps with empty-array fallback")

@pelikhan

Copy link
Copy Markdown
Collaborator

@copilot run pr-finisher skill

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

@copilot run pr-finisher skill

Done — addressed the null-conclusion review thread and ran the pr-finisher checks:

  • jq projection: changed .conclusion(.conclusion // "") for both job and step conclusions, so null values (in-progress/queued jobs) are coerced to empty string before JSON decoding rather than forwarded as JSON null.
  • New test TestFetchJobDetailsWithCountsNullConclusion: verifies in-progress jobs with null conclusions are parsed and not silently dropped.
  • make fmt, make lint, make test-unit all pass locally.

@pelikhan pelikhan merged commit e78a973 into main Jun 29, 2026
29 checks passed
@pelikhan pelikhan deleted the copilot/cli-tools-test-audit-job-steps branch June 29, 2026 13:51
@github-actions

Copy link
Copy Markdown
Contributor

🤖 PR Triage — §28376613466

Field Value
Category bug (feature gap)
Risk 🟢 Low
Priority 🔴 High
Score 69 / 100
Action 🚀 fast_track
Age 2.8h

Score breakdown: Impact 30 + Urgency 22 + Quality 17

Rationale: gh aw audit was silently omitting step-level details from job output, leaving users without actionable failure context. Small focused fix (126+/8−, 5 files), CI fully passing. No reviewer objections. Ready for expedited human review and merge.

Generated by 🔧 PR Triage Agent · 93.2 AIC · ⌖ 14.6 AIC · ⊞ 5.4K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants