Skip to content

[CK] Extract Jenkinsfile helpers into vars/ck.groovy shared library#7743

Open
brockhargreaves-amd wants to merge 8 commits into
developfrom
users/brockhargreaves-amd/ck/jenkinsfile-redesign
Open

[CK] Extract Jenkinsfile helpers into vars/ck.groovy shared library#7743
brockhargreaves-amd wants to merge 8 commits into
developfrom
users/brockhargreaves-amd/ck/jenkinsfile-redesign

Conversation

@brockhargreaves-amd
Copy link
Copy Markdown
Contributor

Motivation

The CK Jenkinsfile is a 2,215-line monolith mixing helper function definitions with pipeline stage declarations. This makes it difficult to review, modify, or extend CI stages without wading through unrelated infrastructure code.

Technical Details

Extract all helper functions from the Jenkinsfile into vars/ck.groovy, loaded at runtime via ck = load "vars/ck.groovy" in the first stage. The Jenkinsfile is reduced from 2,215 lines to 810 lines containing only the pipeline structure.

  • 36 helper functions moved to ck.groovy with no logic changes
  • 10 new stage-wrapper functions (runBuildCKAndTests, runTileEngineGemmTests, runClangFormat, etc.) extract inline environment{}/steps{} business logic from stages, eliminating the MethodTooLargeException caused by CPS-transformed shell strings exceeding the JVM 64KB bytecode limit
  • All ck. method calls in steps{} blocks wrapped in script{} as required by Jenkins Declarative Pipeline
  • rocmnode() remains in the Jenkinsfile (needed for agent{} labels before ck is loaded)
  • CRON_SETTINGS / POLL_SPEC remain in the Jenkinsfile (triggers{} evaluates at parse time before any workspace is available)
  • No stage names changed

Test Plan

  • Jenkinsfile validated against the Jenkins Pipeline Linter (/pipeline-model-converter/validate)
  • All 35 shared helper functions diffed line-by-line against develop to verify no regressions
  • Merge from develop incorporated and verified (gfx1250 stage, ROCm 7.13 default, cmake_build updates)

Test Result

  • Linter: passes
  • Function diff vs develop: all 35 functions match exactly
  • Awaiting Jenkins run to confirm end-to-end stage execution

Submission Checklist

brockhargreaves-amd and others added 6 commits May 13, 2026 20:02
Move all 35 helper functions from the monolithic Jenkinsfile into
a separate vars/ck.groovy file, following MiOpen's existing pattern.
Pure code movement — no logic changes, no renames, no stage name
changes.

rocmnode() remains in the Jenkinsfile because it's used in agent{}
labels throughout the declarative pipeline. The library is loaded
in the "Determine CI Execution" stage via checkout scm + load.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Jenkins Declarative Pipeline does not allow method calls on objects
(ck.foo()) directly in steps{} blocks — they must be inside script{}
blocks. The original bare function calls worked because they were in
the same compilation unit, but loading via ck = load "vars/ck.groovy"
makes them object method calls.

Also fixes ck.ck.checkoutComposableKernel() typo (double ck prefix).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Extract all inline shell commands and build arguments from Jenkinsfile
environment{} blocks into named functions in ck.groovy. Jenkinsfile now
contains only pipeline structure: stage names, when conditions, agent
labels, and single-line ck.runXxx() calls.

New functions: runClangFormat, runClangFormatAndCppcheck,
runFullGroupedConvTileTests, runGroupedConvLargeCaseTests,
runComprehensiveConvDatasetTests, runTileEngineBasicTests,
runTileEngineGemmTests, runBuildCKAndTests, runBuildInstancesOnly.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Resolved conflicts by keeping our clean ck.groovy-based structure throughout.
Incorporated develop's new additions:
- Add BUILD_GFX1250 param and gfx1250 build stage (private docker, no reboot)
- Add runBuildCKForGfx1250() to ck.groovy
- Update cmake_build() to use default ROCm compiler for gfx1250 targets
- ROCMVERSION default already updated to 7.13 in our branch
- TILE_ENGINE_SAMPLING_TIER=daily already in our runTileEngineGemmTests()

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Phase 1 copied helper functions from the original monolith, but develop
had already updated three of them before our branch point:

- cmake_build: restore RUN_ROCM_CK_TESTS cmake flag, ninja trace archival
  after full builds, gfx1250 skip-tests logic, and BUILD_INSTANCES_ONLY
  package stashing that were added in commits 316fded and de93737
- getBaseDockerImageName: revert private-registry branching by ROCM version
  (develop reverted this back to always use public registry)
- getPytorchTestsCmds: restore mkdir/cp from Jenkins workspace path
  (develop updated this from /tmp/pytorch in commit 7e032ad)

Also restore show_node_info() hostname line from commit 47e8768
(John Robbins) that was dropped during Phase 1 extraction.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
gfx1250 is now a case in the existing switch rather than a separate
function. Its unique args (private docker image, no_reboot,
DISABLE_DL_KERNELS) are passed via extraBuildArgs/extraSetupArgs.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Develop added deleteDir() at the top of 25 stage steps{} blocks to
ensure a clean workspace before each stage runs. These were lost when
conflicts were resolved by keeping our side. Restores them in the same
stages that develop had them, excluding the 5 downstream test stages
(Pytorch, AITER, FA) which never had them.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Composable Kernel Jenkins pipeline by moving the Jenkinsfile’s helper logic into a separately loaded Groovy script (vars/ck.groovy), leaving the Jenkinsfile primarily responsible for declarative pipeline structure and stage definitions.

Changes:

  • Added projects/composablekernel/vars/ck.groovy containing the extracted helper functions plus new stage-wrapper functions.
  • Simplified projects/composablekernel/Jenkinsfile to load the helper script once and call ck.* helpers (wrapped in script {} where required).
  • Replaced several large inline stage environment{}/steps{} command blocks with wrapper calls (e.g., clang-format/cppcheck, grouped-conv tests, tile-engine tests, build-and-test stages).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
projects/composablekernel/vars/ck.groovy New loaded helper script containing extracted pipeline helpers and stage-wrapper functions.
projects/composablekernel/Jenkinsfile Updated declarative pipeline to load ck.groovy and delegate stage logic to ck.* helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

deleteDir()
buildHipClangJobAndReboot(setup_args:setup_args, setup_cmd: "", build_cmd: "", execute_cmd: execute_cmd)
script { ck.runClangFormatAndCppcheck() }
archiveArtifacts "build/ck_cppcheck.log"
Copy link
Copy Markdown
Contributor Author

@brockhargreaves-amd brockhargreaves-amd May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot ck_cppcheck.log path mismatch: NOT a regression. The --output-file=ck_cppcheck.log writes to CWD (the build/ dir) while archiveArtifacts "build/ck_cppcheck.log" references it correctly from workspace root. The same pattern exists in develop's monolith at line 1529. Pre-existing, not introduced by our refactor.

Comment on lines +1344 to +1361
if (arch == "gfx942") {
execute_cmd = """
cmake -G Ninja -D CMAKE_PREFIX_PATH=/opt/rocm \
-D BUILD_CK_TILE_ENGINE="ON" \
-D CMAKE_CXX_COMPILER="${compiler}" \
-D CMAKE_BUILD_TYPE=Release \
-D GPU_TARGETS="gfx942" \
-D GEMM_UNIVERSAL_DATATYPE="fp8;fp16;bf8;bf16" \
-D GEMM_UNIVERSAL_LAYOUT="rcr;rrr;crr;ccr" \
-D GEMM_STREAMK_DATATYPE="fp8;fp16" \
-D GEMM_STREAMK_LAYOUT="rcr" \
-D GEMM_MULTI_D_DATATYPE="fp16" \
-D GEMM_MULTI_D_LAYOUT="rcrr;rrrr;crrr;ccrr" \
-D GEMM_PRESHUFFLE_DATATYPE="fp16;fp8;bf16;bf8" \
-D GEMM_PRESHUFFLE_LAYOUT="rcr" \
-D GROUPED_GEMM_DATATYPE="fp8;fp16" \
-D GROUPED_GEMM_LAYOUT="rcr;rrr;crr;ccr" .. && \
ninja -j${nthreads()} benchmark_gemm_universal_all benchmark_gemm_preshuffle_all benchmark_gemm_multi_d_all benchmark_gemm_streamk_all benchmark_grouped_gemm_all && \
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot This was addressed in the latest commit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — -D TILE_ENGINE_SAMPLING_TIER=daily is present for all three arches (gfx942 line 1361, gfx950 line 1380, gfx1201 line 1394) and gemm_universal_results.json is restored for gfx942 (line 1363) in the current code.

Comment on lines +1436 to +1446
def runBuildInstancesOnly(String compiler) {
buildHipClangJobAndReboot(
setup_cmd: "",
build_cmd: "",
build_type: 'Release',
execute_cmd: """
cmake -G Ninja -D CMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_CXX_COMPILER="${compiler}" \
-DCMAKE_HIP_COMPILER="${compiler}" \
-D CMAKE_BUILD_TYPE=Release .. && ninja -j64"""
)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Not a regression — develop's "Build CK instances for all supported targets" stage at line 2047 also calls buildHipClangJobAndReboot without setup_args: "NO_CK_BUILD". Our runBuildInstancesOnly is a faithful extraction.

The observation about the smart-build logic is technically correct — on PR runs where RUN_ALL_UNIT_TESTS is false, execute_cmd would get blanked at line 739. But this is a pre-existing bug in develop, not something we introduced. Our refactor is supposed to be a pure code movement with no behavior changes.

Worth noting for a follow-up fix, but not something to address in this PR.

…ineGemmTests

Restore TILE_ENGINE_SAMPLING_TIER=daily cmake flag for all three arch
branches (gfx942, gfx950, gfx1201) and correct gfx942's benchmark
output filename from results.json to gemm_universal_results.json,
matching develop.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants