[AIROCMLIR-601] [CI] Exclude test sources from Jenkins coverage reports #2379

Open
bogdan-petkovic wants to merge 7 commits into develop from bpetkovi/jenkins-codecov-exclude-test-dirs

@bogdan-petkovic bogdan-petkovic commented May 19, 2026

Motivation

The Codecov reports for rocMLIR (visible at https://app.codecov.io/gh/ROCm/rocMLIR) had three independent issues that came to light while investigating AIROCMLIR-601:

  1. Test sources were counted as production code. The lit test passes built into MLIRRockTestPasses (e.g. mlir/test/lib/Dialect/Rock/TestFunctionFusibility.cpp, TestRockMultibuffer.cpp, …) and the gtest unit tests under mlir/unittests/ were instrumented and reported as if they were production code. This inflated the headline percentage that the biweekly Teams report (.github/workflows/codecov-report.yml) publishes, added noise to per-file drilldowns, and blocked future use of Codecov as a meaningful PR gate.
  2. The Jenkins Code coverage stage timed out chronically and silently. Long phases (lit test buffering, llvm-profdata merge of ~125 GB of *.profraw, three llvm-cov invocations including a slow HTML show) defeated the activity-based timeout, and any timeout/exception was caught and turned into a green build with no Codecov upload — so the Codecov "no report" we kept seeing was actually a silent CI failure, not a Codecov problem.
  3. Even when the stage finished, the upload script was broken and failures were hidden. CODEPATH was empty inside the upload shell, the codecov CLI was invoked without -Z so it returned exit 0 even on missing files / bad tokens, and warnings were logged with echo instead of unstable(), so genuine upload failures showed up as a green build with a WARNING line buried in the console.

Technical Details

All changes are in mlir/utils/jenkins/Jenkinsfile. No other CI / build / source files are touched.

1. Exclude test sources from coverage

collectCoverageData(...) runs three llvm-cov invocations: a text report, an LCOV export (the artifact uploaded to Codecov), and an HTML show. All three previously passed --ignore-filename-regex=external/llvm-project, so only upstream LLVM was filtered. The regex is extended to also drop mlir/test/ and mlir/unittests/:

--ignore-filename-regex='external/llvm-project|mlir/test/|mlir/unittests/'
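As a rough illustration of what the extended filter drops, here is a minimal Python sketch. It assumes llvm-cov applies the pattern as an unanchored search over each source path, which Python's `re.search` only approximates; the sample paths and the `is_ignored` helper are hypothetical and not part of the Jenkinsfile.

```python
import re

# Same pattern as the --ignore-filename-regex value above.
IGNORE = re.compile(r"external/llvm-project|mlir/test/|mlir/unittests/")


def is_ignored(path: str) -> bool:
    """Return True if the pattern matches anywhere in the path."""
    return IGNORE.search(path) is not None


# Hypothetical workspace paths, for illustration only.
samples = [
    "/ws/external/llvm-project/mlir/lib/IR/Builders.cpp",
    "/ws/mlir/test/lib/Dialect/Rock/TestRockMultibuffer.cpp",
    "/ws/mlir/unittests/Example/ExampleTest.cpp",
    "/ws/mlir/lib/Dialect/Rock/Example.cpp",
]

for p in samples:
    label = "ignored" if is_ignored(p) else "kept"
    print(f"{label:8s}{p}")
```

Because the match is unanchored, any path that contains mlir/test/ or mlir/unittests/ is dropped regardless of where the checkout lives.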

A small codecov.yml at the repo root with the same ignore: list would also protect the metric for coverage uploaded from any other source (developer-local llvm-cov, future GitHub Actions, one-off re-uploads). Not in this PR — tracked as a follow-up.

2. Fix chronic timeouts in the coverage stage

Switched the parent timeout from activity-based to wall-clock 180 min so silent phases (lit buffering, llvm-profdata merge) cannot make it fire spuriously. The 180 min number is a safety ceiling — real runs are far shorter (see "Future work" below).

3. Fix empty CODEPATH in Codecov upload script

The upload script previously referenced ${CODEPATH} from a nested closure where the variable was empty, so the codecov CLI was effectively invoked as --flags "" -f ./coverage_.lcov. Switched to Groovy double-quoted string interpolation at the call site so ${CODEPATH} resolves to the matrix codepath (e.g. gfx942, gfx90a).
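The failure mode is easy to reproduce outside Jenkins: an empty variable silently yields an empty flag and a file name with nothing between the underscore and the extension. The Python sketch below shows one way to guard against that. It is an illustration only, not the Groovy code in this PR; `build_upload_args` is a hypothetical helper, and only the `--flags` and `-f` arguments are taken from the description above.

```python
from typing import List


def build_upload_args(codepath: str) -> List[str]:
    """Build the upload arguments for one matrix codepath.

    Raises ValueError instead of producing `--flags "" -f ./coverage_.lcov`
    when the codepath is empty.
    """
    codepath = codepath.strip()
    if not codepath:
        raise ValueError("CODEPATH is empty; refusing to build upload arguments")
    return ["--flags", codepath, "-f", f"./coverage_{codepath}.lcov"]


print(build_upload_args("gfx942"))
```

Failing early like this turns a silently mislabeled upload into an explicit error at the call site.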

4. Decouple HTML coverage report from the Codecov upload

Moved the slow llvm-cov show --format=html out of collectCoverageData(...) into a new helper produceCoverageHtml(...) that runs after the Codecov upload, wrapped in its own 45 min timeout(...) and catchError(buildResult: 'SUCCESS', stageResult: 'UNSTABLE', ...). A new pipeline parameter runCoverageHtml (default false) toggles it on demand from the Jenkins "Build with parameters" UI, the same way runCodeCov already does.

Net effect: the HTML report can no longer block or starve the Codecov upload, and the typical CI run skips it entirely.

5. Surface Codecov failures as UNSTABLE instead of silent success

  • Pass -Z to the codecov CLI so missing files / bad tokens / network errors produce a non-zero exit code instead of a silent exit 0.
  • Replace echo "WARNING: ..." with unstable("...") on upload failure.
  • Replace echo "NOTE: Code coverage stage had an error or timeout..." with unstable("...") in the outer catch (Exception) block.

Build colour semantics: unstable(...) marks the build yellow but passing — subsequent stages, archive, and downstream jobs still run.
Coverage problems are now visible in the Stage View instead of hidden in the console, but they never turn the build red.

Test Plan

The Jenkins Code coverage stage on this PR is the source of truth. Each iteration validated:

  • Coverage .lcov no longer contains entries under mlir/test/ or mlir/unittests/ (1).
  • Coverage stage runs to completion within the wall-clock budget (2).
  • Codecov upload exit code: 0 is reported, and the upload is visible at https://app.codecov.io/gh/ROCm/rocMLIR/commits with the matrix flag (gfx942 etc.) attached (3, 4).
  • runCoverageHtml=false (default) skips HTML generation; setting it to true produces coverage_*.html and archives it (4).
  • Forcing an upload failure (revoking the token in a scratch run) marks the build UNSTABLE / yellow with a clear message in Stage View, and does not turn the build red (5).
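The first check in the list above can be automated. The sketch below relies on the fact that each LCOV record starts with an `SF:<path>` line; the `leaked_sources` helper and the sample record paths are hypothetical and were not taken from an actual run.

```python
import re
from typing import Iterable, List

# Directories that should no longer appear in the exported LCOV.
EXCLUDED = re.compile(r"mlir/test/|mlir/unittests/")


def leaked_sources(lcov_lines: Iterable[str]) -> List[str]:
    """Return source paths from SF: records that fall under excluded dirs."""
    leaked = []
    for line in lcov_lines:
        line = line.strip()
        if line.startswith("SF:"):
            path = line[len("SF:"):]
            if EXCLUDED.search(path):
                leaked.append(path)
    return leaked


# Hypothetical LCOV fragment, for illustration only.
sample = """\
SF:/ws/mlir/lib/Dialect/Rock/Example.cpp
DA:10,4
end_of_record
SF:/ws/mlir/test/lib/Dialect/Rock/TestRockMultibuffer.cpp
DA:12,1
end_of_record
"""

print(leaked_sources(sample.splitlines()))
```

Against a real artifact, the same function can be fed an open file handle for the exported .lcov; an empty list means the filter worked.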

Test Result

Future work (not in this PR)

  • Add a repo-root codecov.yml mirroring the ignore: list, so the filter applies to any future upload source (GitHub Actions, local re-uploads).
  • Lower the 180 min wall-clock parent timeout to ~120 min after we have 2–3 clean green runs that confirm the real ceiling. Now that timeouts are explicit (yellow), tightening it is safe to do as a one-line follow-up.

Submission Checklist

Signed-off-by: bogdan-petkovic <bpetkovi@amd.com>
@bogdan-petkovic bogdan-petkovic self-assigned this May 19, 2026
@bogdan-petkovic bogdan-petkovic changed the title [CI] Exclude test sources from Jenkins coverage reports [AIROCMLIR-692] [CI] Exclude test sources from Jenkins coverage reports May 19, 2026
@bogdan-petkovic bogdan-petkovic changed the title [AIROCMLIR-692] [CI] Exclude test sources from Jenkins coverage reports [AIROCMLIR-601] [CI] Exclude test sources from Jenkins coverage reports May 19, 2026
@dorde-antic dorde-antic requested a review from Copilot May 20, 2026 12:07

Copilot AI left a comment

Pull request overview

This PR updates the Jenkins coverage collection step so Codecov metrics reflect coverage of production code only, excluding MLIR test sources that are currently being instrumented and reported.

Changes:

  • Expand llvm-cov’s --ignore-filename-regex to exclude mlir/test/ and mlir/unittests/.
  • Apply the same filter consistently across the llvm-cov report, llvm-cov export (LCOV), and llvm-cov show (HTML) invocations.

@bogdan-petkovic bogdan-petkovic marked this pull request as ready for review May 21, 2026 08:25
@bogdan-petkovic bogdan-petkovic requested a review from causten as a code owner May 21, 2026 08:25
Signed-off-by: bogdan-petkovic <bpetkovi@amd.com>
[CI] Fix chronic timeouts in Jenkins Code coverage stage
codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2379      +/-   ##
===========================================
+ Coverage    79.50%   81.69%   +2.18%     
===========================================
  Files          100      119      +19     
  Lines        31016    41977   +10961     
  Branches      4819     6940    +2121     
===========================================
+ Hits         24659    34289    +9630     
- Misses        4245     5117     +872     
- Partials      2112     2571     +459     
Flag      Coverage Δ
gfx950    81.49% <ø> (?)
mfma      81.37% <ø> (+1.86%) ⬆️
navi4x    81.53% <ø> (?)

Flags with carried forward coverage won't be shown.
see 117 files with indirect coverage changes