Add CCCL workflow skills and helper agents by alliepiper · Pull Request #8948 · NVIDIA/cccl

alliepiper · 2026-05-12T20:18:48Z

Overview

The cccl-* skills and agents wrap CCCL's build, test, CI, benchmarking, commit/PR, and release
infrastructure into named entry points navigated by intent. Top-level skills (cccl-build,
cccl-triage, cccl-commit, cccl-bench, cccl-infra, …) drive user-facing workflows;
cccl_detail-* skills hold shared reference material; read-only agents handle mechanical work like
fetching failed jobs or summarizing logs. Each repeated workflow is encoded once, so every task
starts from a known entry point with relevant project-specific details in context.

End-to-end prompt examples

"PR #8965 is failing in CI on the libcudacxx jobs for cuda13.2/gcc14 — figure
out why, fix it, commit with override tags so we don't re-run the green half of
the matrix, push, mark ready"

cccl-triage (fetch + cluster + summarize) → engineer fix → cccl-ci-overrides
(generate the override) → cccl-commit (test gate + commit message) →
cccl-pr (push + ready + retrigger CI). End-to-end automation of the most
expensive recurring workflow in this repo.

"device_radix_sort was 1.4x faster on tag 3.0. Bisect, validate the regression
isn't a SASS-level codegen surprise, fix it, commit, PR, request a bench run."

cccl-bisect → cccl-sass-diff (confirm it's a real algorithmic regression rather
than codegen drift) → engineer fix → cccl-bench (verify locally) →
cccl-commit → cccl-pr → cccl-bench (CI bench request with [bench-only]).

"Resplit this branch — it has 14 messy WIP commits, I want 3 clean ones split by
library, rebased on current main"

cccl-resplit-branch → cccl-commit. Backs up the tip to refs/backup/<branch>-<ts>,
rebases onto main (escalating conflicts via cccl-clarify), collapses the commits into
working-tree changes with git reset --mixed main, and hands off to cccl-commit with
the original commit subjects as starters.
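A minimal Python sketch of the backup, rebase, and collapse steps, assuming the base is `main` and a UTC `YYYYmmddTHHMMSSZ` format for `<ts>` (the skill's real timestamp format is not stated here). It only builds the git command lines and does not run them; `backup_ref` and `resplit_plan` are hypothetical helper names, not part of the skill.

```python
from datetime import datetime, timezone


def backup_ref(branch: str, now: datetime) -> str:
    """Name a backup ref as refs/backup/<branch>-<ts> (timestamp format assumed)."""
    ts = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"refs/backup/{branch}-{ts}"


def resplit_plan(branch: str, base: str, now: datetime) -> list[list[str]]:
    """Return (but do not run) the git commands for backup, rebase, and collapse."""
    return [
        # 1. Point a backup ref at the current tip so the old history stays reachable.
        ["git", "update-ref", backup_ref(branch, now), "HEAD"],
        # 2. Replay the branch onto the current base; conflicts stop here for escalation.
        ["git", "rebase", base],
        # 3. Record the original subjects (oldest first) to seed the new commit messages.
        ["git", "log", "--reverse", "--format=%s", f"{base}..HEAD"],
        # 4. Drop the commits but keep all of their changes in the working tree.
        ["git", "reset", "--mixed", base],
    ]


if __name__ == "__main__":
    for cmd in resplit_plan("my-feature", "main", datetime.now(timezone.utc)):
        print(" ".join(cmd))
```

Collecting the subjects before the reset matters: once `git reset --mixed` runs, the only remaining pointer to the old commits is the backup ref.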

"I'm onboarding a contributor today. They want to land a small CUB algorithm
change. Hand them the doc."

cccl (entry router) → walks them through: cccl-devcontainer → cccl-cub
(orientation) → cccl-build + cccl-test → cccl-commit → cccl-pr.

Approval gates remain. Skills handle the research, drafting, splitting, and
message composition. Every git add / commit / push, every gh pr write
action, and every /ok to test still waits for explicit user approval.

Full Example Prompts

1. Daily inner loop — build, test, iterate

"Build cub for sm90, then run the device_radix_sort tests"

cccl-build → cccl-test. Picks the right preset, runs the targeted build, filters CTest
by regex to the requested suite, and reports pass/fail. Fast iteration path: single preset, no matrix.

"I just touched cub/cub/device/dispatch/dispatch_reduce.cuh. Build cub fast and run
only the device_reduce tests."

cccl-build → cccl-test. Targeted incremental build via build_and_test_targets.sh;
filters CTest by regex.

"Run the libcudacxx lit tests for cuda/std/__type_traits/scalar_type.h under sm90"

cccl-test. Picks libcudacxx preset, points lit at the right test directory.

"Open a shell in a devcontainer with CUDA 13.2 and gcc 14"

cccl-devcontainer. Wraps .devcontainer/launch.sh --cuda 13.2 --host gcc14.
Detects whether you're already inside a container.

"Build cudax with the cu13 nightly toolkit in a headless container, then run all
cudax tests"

cccl-devcontainer → cccl-build → cccl-test. Headless launch (-d) with
-- ./ci/build_cudax.sh, followed by ./ci/test_cudax.sh.

"What CMake presets are available and which one builds everything for native arch?"

cccl-cmake. Tabulates presets; recommends all-dev.

2. CI firefighting

"Triage PR #8963"

cccl-triage. Resolves the PR's latest CI run, dispatches cccl-ci-fetch-failures
to list failures, clusters them by toolchain/library/variant, dispatches
cccl-ci-summarize-job-log (haiku) in parallel on one representative job per cluster,
then returns a compact failure-cluster table and asks which clusters to dig into.

"What's failing on the nightly?"

cccl-triage (nightly mode). Same flow, run-id resolved from nightly.yml. Especially
useful for the matrix-sized failure sets where you need clustering, not 200 raw logs.

"Just give me the failed jobs for the current branch -- I want to grep the list myself"

cccl-ci-fetch-failures, invoked directly. Returns TSV rows: <job-id>\t<full-name>\t<grouping-hint>.
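Because the output is plain TSV, it is easy to post-process. A Python sketch that parses the rows and clusters them by grouping hint, taking the first job of each cluster as its representative (the same "one log per cluster" shape of work that cccl-triage hands to cccl-ci-summarize-job-log). The helper names and the sample job ids, names, and hint values are hypothetical; the real hint vocabulary may differ.

```python
from collections import defaultdict


def parse_failures(tsv: str) -> list[tuple[str, str, str]]:
    """Parse tab-separated (job-id, full-name, grouping-hint) lines, skipping malformed ones."""
    rows = []
    for line in tsv.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def cluster(rows: list[tuple[str, str, str]]) -> dict[str, dict]:
    """Group jobs by grouping hint; the first job seen becomes the representative."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for job_id, name, hint in rows:
        groups[hint].append((job_id, name))
    return {
        hint: {"representative": jobs[0], "count": len(jobs)}
        for hint, jobs in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical job ids, names, and hints for illustration only.
    sample = (
        "101\tBuild cub (gcc14)\tgcc14/cub\n"
        "102\tTest cub (gcc14)\tgcc14/cub\n"
        "103\tBuild thrust (clang19)\tclang19/thrust\n"
    )
    for hint, info in cluster(parse_failures(sample)).items():
        print(f"{hint}: {info['count']} job(s), summarize job {info['representative'][0]}")
```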

"Summarize this CI job log: https://github.com/NVIDIA/cccl/actions/runs/.../job/..."

cccl-ci-summarize-job-log. Fetches the log, returns failing step, exact command line,
5–20 lines of raw error, and a code/infra/flaky verdict.

"Generate a workflows.override so this PR only re-runs the cub and libcudacxx jobs
on gcc 14"

cccl-ci-overrides. Reads ci/matrix.yaml schema, emits the minimum override matrix
snippet plus recommended skip tags, with rationale.

"Why did the cuda12.6/clang14 job run for this PR? I didn't touch anything that
needs clang."

cccl-ci + cccl-ci-overrides. Explains matrix expansion via
ci/inspect_changes.py and project_files_and_dependencies.yaml and identifies the
changed path that triggered the job.
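The underlying idea, reduced to a sketch: map each changed path to the project that owns it, skip ignored paths, then mark every project that transitively depends on a dirty one. The directory prefixes and `deps` edges below are illustrative assumptions, not the contents of project_files_and_dependencies.yaml, and treating an unowned path as "run everything" is an assumed conservative default. The `.claude/` exclusion mirrors this PR's "Ignore .claude/ in CI dep classification" commit; `dirty_projects` is a hypothetical name.

```python
def dirty_projects(
    changed: list[str],
    project_dirs: dict[str, str],
    deps: dict[str, list[str]],
    ignored: tuple[str, ...] = (".claude/",),
) -> set[str]:
    """Return the projects that need CI for a set of changed paths."""
    dirty: set[str] = set()
    for path in changed:
        if path.startswith(ignored):
            continue  # agent tooling cannot affect any build
        owner = next((p for p, d in project_dirs.items() if path.startswith(d)), None)
        if owner is None:
            return set(project_dirs)  # assumed conservative default: run everything
        dirty.add(owner)
    # Propagate to dependents until nothing changes (transitive closure).
    grew = True
    while grew:
        grew = False
        for proj, reqs in deps.items():
            if proj not in dirty and dirty.intersection(reqs):
                dirty.add(proj)
                grew = True
    return dirty


if __name__ == "__main__":
    # Illustrative layout and dependency edges, not the real YAML contents.
    dirs = {"libcudacxx": "libcudacxx/", "cub": "cub/", "thrust": "thrust/", "cudax": "cudax/"}
    deps = {"libcudacxx": [], "cub": ["libcudacxx"], "thrust": ["cub"], "cudax": ["libcudacxx"]}
    print(sorted(dirty_projects(["cub/cub/device/device_scan.cuh"], dirs, deps)))
```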

"Walk me through how PR CI is structured — what's the difference between the
pull_request and nightly workflows?"

cccl-ci. Reference skill — flow diagram, sources of truth, skip-tag mechanics.

3. Regression hunting

"device_scan was 1.2x faster a week ago. Find the commit that regressed it."

cccl-bisect (cloud route). Dispatches the git-bisect.yml workflow with the right
runner label, build/test targets, and good/bad refs. Returns the bad commit hash together
with the distinguishing command line, which serves as a local reproducer.
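For intuition, the search git bisect performs is a binary search over the commit range, assuming the build/test predicate flips from good to bad exactly once. A Python sketch in which `is_bad` stands in for the build/test command run at each step; `first_bad` is a hypothetical name, and the workflow itself relies on git bisect rather than code like this.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in an oldest-to-newest range.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    predicate flips from good to bad exactly once (the git bisect premise).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # stands in for "build + run the failing test"
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With n commits between the good and bad refs this takes about log2(n) build/test runs, so even a long range needs only a handful of runner jobs.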

"Bisect this segfault on the cuda13.2/gcc14 config — it definitely worked on the
3.0 release."

cccl-bisect. Resolves 3.0 to a tag, runs cloud bisect, returns the bad commit
with a reproducer command.

"Bisect locally in a devcontainer — I don't want to wait for the cloud queue"

cccl-bisect (local route). Wraps ci/util/git_bisect.sh inside
.devcontainer/launch.sh.

"Did my recent CUB tuning change affect codegen for DeviceRadixSort?"

cccl-sass-diff. Builds both refs, dumps SASS via cuobjdump, normalizes addresses
and register renames, reports the top 5 non-trivial diffs by kernel.
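A sketch of the normalization step in Python, assuming the usual cuobjdump -sass line shape (`/*addr*/ INSTR operands ; /* encoding */`) and a listing for a single kernel. It strips the address and encoding comments and renames general-purpose `R` registers in order of first use, so pure register-allocation churn disappears from the diff. Branch targets, predicates, and uniform registers are left alone, and per-kernel splitting plus top-5 ranking are out of scope; the skill's actual rules may differ, and the function names are hypothetical.

```python
import difflib
import re

_COMMENT = re.compile(r"/\*.*?\*/")  # /*0010*/ addresses and /* 0x... */ encodings
_GPR = re.compile(r"\bR\d+\b")       # R0, R17, ... (RZ and UR registers are not matched)


def normalize_sass(lines: list[str]) -> list[str]:
    """Drop comments, collapse whitespace, and rename R registers by first use."""
    mapping: dict[str, str] = {}

    def rename(match: re.Match) -> str:
        reg = match.group(0)
        if reg not in mapping:
            mapping[reg] = f"%r{len(mapping)}"
        return mapping[reg]

    out = []
    for line in lines:
        text = " ".join(_COMMENT.sub("", line).split())
        if text:
            out.append(_GPR.sub(rename, text))
    return out


def sass_diff(old: list[str], new: list[str]) -> list[str]:
    """Return only the added/removed instructions after normalization."""
    diff = difflib.unified_diff(normalize_sass(old), normalize_sass(new), lineterm="")
    return [d for d in diff if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]
```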

4. Commit / PR endgame

"Commit these changes"

cccl-commit. Component selection → optional split → interactive chunk walkthrough
→ optional test gate → commit message draft (Trivial/Standard/Detailed) → git commit -F.
Refuses on main.

"Wrap this up — I want three separate commits split by library (cub, thrust,
libcudacxx). Run the precommit gate first."

cccl-commit. Plans three commit groups, walks chunks, runs pre-commit, drafts per-group
messages, executes each commit.
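Planning the commit groups starts from the changed paths (for example, the output of `git diff --name-only HEAD`). A minimal sketch, assuming library ownership can be read from the top-level directory; files outside the three libraries land in an `other` bucket that would need a manual decision. `group_by_library` is a hypothetical helper, not the skill's code.

```python
from collections import defaultdict

LIBRARIES = ("cub", "thrust", "libcudacxx")


def group_by_library(paths: list[str]) -> dict[str, list[str]]:
    """Bucket changed paths by top-level library directory; the rest go to 'other'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        top = path.split("/", 1)[0]
        groups[top if top in LIBRARIES else "other"].append(path)
    return dict(groups)


if __name__ == "__main__":
    changed = [
        "cub/cub/device/dispatch/dispatch_reduce.cuh",
        "thrust/thrust/sort.h",
        "libcudacxx/include/cuda/std/__type_traits/scalar_type.h",
        "ci/matrix.yaml",
    ]
    for lib, files in group_by_library(changed).items():
        print(lib, files)
```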

"Push and open a draft PR titled [Tile] Reenable seed_seq tests"

cccl-pr (open new draft). Runs sanity checks, detects the push remote, pushes the branch,
and opens a draft PR with the given title and body.

"Update the PR body to mention the SASS-diff results"

cccl-pr (edit existing). gh pr edit --body-file -.

"Mark PR #9001 ready for review"

cccl-pr (draft→ready transition).

"Trigger CI on this PR"

cccl-pr (push + trigger). SHA verification gate, then /ok to test <SHA> comment.
Never posts without verification.
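A sketch of the SHA verification gate in Python: the trigger comment is only built when the local HEAD matches the PR head that CI would test. It assumes the `git` and `gh` CLIs are on PATH; `local_head`, `pr_head`, and `ok_to_test_comment` are hypothetical helpers, and actually posting the comment stays a separate, user-approved step.

```python
import re
import subprocess

_SHA = re.compile(r"[0-9a-f]{40}")


def local_head() -> str:
    """Full SHA of the local HEAD."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"], check=True, capture_output=True, text=True
    ).stdout.strip()


def pr_head(pr: int) -> str:
    """Full SHA of the PR's head commit as GitHub reports it."""
    return subprocess.run(
        ["gh", "pr", "view", str(pr), "--json", "headRefOid", "--jq", ".headRefOid"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def ok_to_test_comment(local_sha: str, remote_sha: str) -> str:
    """Build the trigger comment only if the pushed head is what was tested locally."""
    if not _SHA.fullmatch(remote_sha):
        raise ValueError(f"not a full commit SHA: {remote_sha!r}")
    if local_sha != remote_sha:
        raise ValueError(f"local HEAD {local_sha[:7]} != PR head {remote_sha[:7]}; push first")
    return f"/ok to test {remote_sha}"
```

In use, `ok_to_test_comment(local_head(), pr_head(8948))` yields the text to post (for instance with `gh pr comment 8948 --body ...`), and a mismatch raises before anything reaches the PR.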

5. Library development

"Add a CUB device-scope algorithm cub::DeviceMode that returns the most-frequent
value. Tour me through the directory layout and tuning policy conventions."

cccl-cub (orientation) → manual implementation → cccl-build + cccl-test to
verify. Covers block/warp/device/agent scopes, the tuning-policy selector pattern,
and Catch2 vs legacy test layout.

"Make this cudax change libcudacxx-style compliant"

cccl-libcudacxx (style references — headers.md, macros.md, naming.md,
templates.md, testing.md, visibility.md). Style enforcement applies to
libcudacxx/include/ AND cudax/include/.

"Where do I add a new Thrust algorithm with CUDA + cpp + omp + tbb backends?"

cccl-thrust. Explains the per-backend directory layout (thrust/system/{cuda,cpp,omp,tbb}/),
the ADL dispatch via execution policies, and the typical pattern of thrust::sort →
cub::DeviceRadixSort for the CUDA backend.

"What's the C ABI pattern for adding a new algorithm to the C Parallel Library?"

cccl-c. Three-call pattern (_build, _run, _cleanup), stable C ABI layer,
JIT-backed cubins via NVRTC, custom iterator/operator types via template strings.

"What's in cudax that's stable enough to graduate to libcudacxx?"

cccl-cudax + cccl-libcudacxx. On the cudax side, covers the zero-stability contract
and the CCCL_ENABLE_UNSTABLE flag; on the libcudacxx side, the upstream-tracking
model and where CCCL extensions live.

"Test cuda.compute against the cu13 install"

cccl-python. pip install -e python/cuda_cccl[test-cu13] then
ci/test_cuda_compute_python.sh.

"I added a new Numba CUDA cooperative primitive under cuda.coop._experimental.
How do I wire up the tests?"

cccl-python. Explains the cuda_coop test pattern, points at
ci/test_cuda_coop_python.sh.

6. Performance

"Write a CUB benchmark for the new DeviceThreeWayPartition algorithm using
nvbench, with %RANGE% tuning annotations for items-per-thread"

cccl-bench (nvbench-template reference). Generates per-variant .cu files with
the shared base.cuh pattern.

"Request a CI bench run for this PR — focus on device_reduce and device_scan,
sm90 + sm120 GPUs only"

cccl-bench (ci-bench-request reference). Edits ci/bench.yaml with the filters and
appends [bench-only] to the commit message. The ci/bench.yaml edit must be reset to the template before merge.

"Compare perf of this branch vs main for thrust::sort on 1M..256M element keys"

cccl-bench (local-run reference). Wraps ci/bench/compare_git_refs.sh.

"Sweep CUB's BlockScan tuning space for sm120 and pick a new policy"

cccl-bench (tuning reference). Wraps the cccl.bench harness with
CUB_ENABLE_TUNING=ON, generates .variant targets, sweeps, picks the optimum.

"Write a Python benchmark using cuda.bench for the new cuda.compute.sort_pairs
binding"

cccl-bench + cccl-python. Python path uses cuda.bench with axis registration
and bench.run_all_benchmarks(sys.argv).

7. Infrastructure & release

"Bump the supported CUDA toolkit to 13.3"

cccl-infra (ctk-bump playbook). Edits ci/matrix.yaml (ctk_versions,
devcontainer_version, workflow rows), regenerates .devcontainer/ via the
matrix-aware generator, verifies the workflow expansion. Refuses to hand-edit
individual devcontainer.json files.

"Add support for gcc 15 to the host compiler matrix"

cccl-infra (compiler-bump playbook). Adds the compiler to host_compilers and the
CUDA-specific version table, adds workflow rows, and regenerates devcontainers.

"Cut a 3.2.0 release"

cccl-infra (release-cut playbook). Drives ci/update_version.sh, version files
per library (cub, thrust, libcudacxx, cudax), cccl-version.json,
docs/VERSION.md, Python package, workflows. Never hand-edits version files.

"Add a new project under c/parallel/ called cccl-async and wire it into CI"

cccl-infra (project-add playbook). Adds workflow rows and jobs: entries to ci/matrix.yaml,
a new key and deps to ci/project_files_and_dependencies.yaml, presets to CMakePresets.json,
and build/test scripts. Touches every infra file the project needs.

"Pre-commit is failing — fix the formatting"

cccl-precommit. Runs the suite, reviews diffs, stages fixed files, re-runs.
Knows the auto-fix subset (clang-format, ruff, gersemi, end-of-file) vs the
non-auto-fix subset (codespell, mypy, shellcheck).
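The split matters because auto-fixers rewrite files, so a re-run usually passes once the fixed files are reviewed and re-staged, whereas the other hooks need a real edit. A Python sketch of that triage; the hook ids are assumptions based on the names above plus the in-place fixers listed in this PR's commit notes, so check .pre-commit-config.yaml for the real ids. Unknown hooks default to manual so nothing is staged blindly, and `plan_fixes` is a hypothetical name.

```python
# Hook ids are assumptions; the real ids live in .pre-commit-config.yaml.
AUTO_FIX = {
    "clang-format", "ruff", "ruff-format", "gersemi",
    "end-of-file-fixer", "trailing-whitespace", "pretty-format-json",
}


def plan_fixes(failed_hooks: list[str]) -> dict[str, list[str]]:
    """Split failed hooks into 'review diff, re-stage, re-run' vs 'needs a manual edit'."""
    plan: dict[str, list[str]] = {"review_and_rerun": [], "manual": []}
    for hook in failed_hooks:
        # Anything not known to auto-fix (codespell, mypy, shellcheck, new hooks) is manual.
        plan["review_and_rerun" if hook in AUTO_FIX else "manual"].append(hook)
    return plan
```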

"Build the docs locally"

cccl-docs. Runs ./docs/gen_docs.bash (Linux-only; builds Doxygen 1.9.6 on the first
run, creates a venv, runs Sphinx).

"My new header isn't showing up in the API docs"

cccl-docs (doxygen-breathe-gotchas reference). Per-library Doxyfile inclusion
patterns, Breathe bridge config, custom _ext/auto_api_generator.py.

8. Decision-point prompts

"I'm stuck — should I cherry-pick this fix onto branch/3.1.x or wait for the
next 3.2 release?"

cccl-clarify. Three-step ladder: default reasoning from project conventions →
check the release cadence and the bug severity → ask the user with framed
options (cherry-pick / wait / hotfix release / break this down).
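The ladder reads naturally as a chain of fallbacks, where each step runs only if the cheaper one before it produced no answer. A Python sketch with the three steps passed in as callables; `escalate` and the stand-in callables in the test are hypothetical, not the skill's implementation.

```python
from typing import Callable, Optional, Sequence

Step = Callable[[str], Optional[str]]


def escalate(
    question: str,
    default_rule: Step,
    research: Step,
    ask_user: Callable[[str, Sequence[str]], str],
    options: Sequence[str],
) -> tuple[str, str]:
    """Resolve a decision with the cheapest step that yields an answer."""
    answer = default_rule(question)  # 1. default reasoning from project conventions
    if answer is not None:
        return answer, "default"
    answer = research(question)  # 2. self-research (release cadence, bug severity, ...)
    if answer is not None:
        return answer, "research"
    return ask_user(question, options), "user"  # 3. ask with framed options
```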

"I have a clang-format diff but also a real code change in the same hunk —
separate them?"

cccl-commit + cccl-clarify. Surfaces the choice as part of the interactive
chunk walkthrough.

Architecture & layout

Everything lives under .agent/:

.agent/
├── agents/
│   ├── cccl-ci-fetch-failures.md      # haiku, read-only
│   ├── cccl-ci-overrides.md           # sonnet, read-only
│   └── cccl-ci-summarize-job-log.md   # haiku, read-only
└── skills/
    ├── cccl/                          # entry router
    │   └── SKILL.md
    ├── cccl-build/                    # workflow skill (top-level, user-facing)
    │   ├── SKILL.md                   # always-loaded summary
    │   └── references/
    │       ├── tools.md               # wrapped-command inventory
    │       ├── docs.md                # canonical doc pointers
    │       └── <topic>.md             # on-demand detail
    └── cccl_detail-ci/                # internal reference skill
        ├── SKILL.md
        └── references/

AGENTS.md slims to a routing README; CLAUDE.md symlinks to it.
.claude/{skills,agents} symlink into .agent/ so Claude Code and Codex resolve
the same files. A SessionStart hook surfaces the cccl entry skill at
session start.

Two skill tiers:

cccl-* — user-facing workflow entry points, triggered by intent
("triage PR #X", "build cub", "commit these changes"). Each owns a workflow.
cccl_detail-* — internal reference material composed by top-level
skills, not invoked directly by users. Loaded when a workflow skill needs
the underlying mechanics (CI matrix expansion, CMake module internals,
release version mechanics).

Each skill follows a progressive-disclosure pattern: SKILL.md (frontmatter
description + workflow body) is the always-loaded summary; references/<topic>.md
files load on demand.
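A sketch of that loading pattern in Python: frontmatter and body are read eagerly, while each `references/<topic>.md` is read only on first request and then cached. The parser handles only single-line `key: value` pairs (real frontmatter is YAML and may be richer), and `split_frontmatter` / `Skill` are hypothetical names rather than how Claude Code or Codex actually load skills.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter (single-line key: value pairs only) from the body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            meta = {}
            for line in lines[1:end]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[end + 1:])
    return {}, text  # unterminated frontmatter: treat everything as body


class Skill:
    """SKILL.md is loaded eagerly; references/<topic>.md only on first use."""

    def __init__(self, skill_dir: str) -> None:
        self.dir = Path(skill_dir)
        self.meta, self.body = split_frontmatter((self.dir / "SKILL.md").read_text())
        self._refs: dict[str, str] = {}

    def reference(self, topic: str) -> str:
        if topic not in self._refs:  # deferred read, then cached
            self._refs[topic] = (self.dir / "references" / f"{topic}.md").read_text()
        return self._refs[topic]
```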

Agents are non-interactive, read-only subagents dispatched by skills. All three
current agents serve cccl-ci / cccl-triage / cccl-commit; they exist because
the work is mechanical and parallelizable (one log per cluster, one override per
diff).

Permissions model

.claude/settings.json adds a read-only allow-list scoped to what the skills
need: gh read forms (pr view/checks/list/diff, run view/list, workflow list/view, issue view/list, search, api for repos/NVIDIA/cccl/actions/{jobs,runs}/*),
git read forms (status, log, diff, show, blame, …), text inspection
(rg, grep, jq, sed -n, ls, cat, head, tail, wc, file,
stat), and mkdir -p /tmp/claude/* for scratch.

Mutating operations (git add, git commit, git push, gh pr create,
gh pr comment, gh workflow run, …) are intentionally not allow-listed —
every mutation prompts for explicit user approval.
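To make the split concrete, a Python sketch that classifies a command line as allow-listed or approval-required by argv prefix. This is a simplification for illustration, not Claude Code's permission-matching semantics: the prefix list is abridged from the allow-list above (the `gh api`, `gh workflow`, `gh issue`, and `gh search` forms are omitted), and anything containing shell operators is conservatively sent for approval. `needs_approval` is a hypothetical name.

```python
import posixpath
import shlex

# Abridged read-only forms from the allow-list above, as argv prefixes.
READ_ONLY = [
    ("gh", "pr", "view"), ("gh", "pr", "checks"), ("gh", "pr", "list"), ("gh", "pr", "diff"),
    ("gh", "run", "view"), ("gh", "run", "list"),
    ("git", "status"), ("git", "log"), ("git", "diff"), ("git", "show"), ("git", "blame"),
    ("rg",), ("grep",), ("jq",), ("ls",), ("cat",), ("head",), ("tail",),
    ("wc",), ("file",), ("stat",), ("sed", "-n"),
]


def needs_approval(command: str) -> bool:
    """True unless the command matches a read-only form."""
    if set(command) & set(";|&<>`$"):
        return True  # pipes, redirects, substitutions: conservatively ask
    argv = shlex.split(command)
    if argv[:2] == ["mkdir", "-p"] and len(argv) == 3:
        # Scratch dirs only, after resolving any '..' components.
        return not posixpath.normpath(argv[2]).startswith("/tmp/claude/")
    if argv[:1] == ["sed"] and any(a.startswith(("-i", "--in-place")) for a in argv[1:]):
        return True  # 'sed -n -i' would edit files in place
    return not any(tuple(argv[: len(p)]) == p for p in READ_ONLY)
```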

Top-level skills

User-facing entry points under .agent/skills/cccl-*/. Triggered by intent;
/<skill> is the explicit fallback.

| Skill | Purpose |
| --- | --- |
| `cccl` | Entry router — directs to the right workflow tool by intent |
| `cccl-build` | Single-preset or full-matrix C++ builds (CUB / Thrust / libcudacxx / cudax / cccl-c) |
| `cccl-test` | CTest / lit / compute-sanitizer test runners, matched to `cccl-build` paths |
| `cccl-bisect` | Git bisect on cloud GPU runners or locally in a devcontainer |
| `cccl-devcontainer` | Launch Docker containers with chosen CTK + host compiler |
| `cccl-cmake` | CMake preset and option reference |
| `cccl-precommit` | Pre-commit hook suite reference and auto-fix patterns |
| `cccl-ci` | CI matrix overview, PR-run flow, skip-tag and override mechanics |
| `cccl-triage` | Diagnose CI failures (PR or nightly) — fetch / cluster / summarize / fix |
| `cccl-commit` | Interactive commit prep — split / walk / test gate / draft / commit |
| `cccl-pr` | PR lifecycle — open / edit / comment / push / `/ok to test` |
| `cccl-resplit-branch` | Rebase + resplit a feature branch into a clean commit series |
| `cccl-clarify` | Decision-escalation ladder (default reasoning → self-research → ask) |
| `cccl-bench` | nvbench / `cuda.bench` / CI bench requests / `cccl.bench` tuning |
| `cccl-sass-diff` | Codegen comparison (SASS / PTX) between two builds |
| `cccl-cub` | CUB orientation — scopes, tuning policies, tests |
| `cccl-thrust` | Thrust orientation — backends, execution policies, CUB integration |
| `cccl-libcudacxx` | libcudacxx orientation — LLVM tracking, CCCL extensions, style enforcement |
| `cccl-cudax` | cudax orientation — experimental features, stability contract |
| `cccl-c` | C Parallel Library orientation — stable C ABI, JIT, FFI |
| `cccl-python` | `cuda-cccl` Python package — modules, build/test, install extras |
| `cccl-docs` | Sphinx + Doxygen build, deploy, layout |
| `cccl-infra` | Cross-cutting infra — CTK bump, compiler bump, release cut, project add |

Internal cccl_detail-* skills

Composed by the top-level skills above; not invoked directly by users.

| Skill | Loaded by |
| --- | --- |
| `cccl_detail-ci` | `cccl-ci`, `cccl-triage`, `cccl-ci-overrides` — matrix expansion, copy-pr-bot, inspect-changes |
| `cccl_detail-cmake` | `cccl-cmake`, `cccl-build` — module internals, arch-flag mechanics |
| `cccl_detail-cpp-macros` | `cccl-libcudacxx` — compiler detection, diagnostics, visibility/ABI |
| `cccl_detail-devcontainer-matrix` | `cccl-infra`, `cccl-devcontainer` — devcontainer generation from `ci/matrix.yaml` |
| `cccl_detail-examples` | `cccl-cub`, `cccl-thrust`, `cccl-libcudacxx` — examples layout, CMake test setup |
| `cccl_detail-github` | `cccl-ci` — workflow templates, action structures |
| `cccl_detail-release` | `cccl-infra` — version management, release cycle internals |
| `cccl_detail-test-params` | `cccl-test`, `cccl-cub`, `cccl-thrust` — CTest / lit parameter expansion |

Agents

Read-only, non-interactive subagents dispatched by skills.

| Agent | Model | Role |
| --- | --- | --- |
| `cccl-ci-fetch-failures` | haiku | Pull failed jobs from a CCCL CI run; return TSV with grouping hints |
| `cccl-ci-summarize-job-log` | haiku | Digest one job log — failing step, exact command line, raw error, classification |
| `cccl-ci-overrides` | sonnet | Generate minimum `workflows.override` matrix + skip tags from failures or diff |

Composed by cccl-triage (parent workflow that handles user dialogue) and
cccl-commit (consumes override output during the test-gate step).

Establishes a single source-of-truth bootstrap (AGENTS.md) and a catalogue of 14 skills + 4 agents under `.agent/{skills,agents}/` that route by user intent. Both Claude Code and Codex resolve the same files via the `.claude/{skills,agents}` symlinks.

Skills:
- cccl, cccl-agent-impl - orientation + concept primer
- cccl-clarify - decision-point escalation
- cccl-commit - interactive commit prep
- cccl-pr - PR lifecycle (open / edit / comment / push + CI)
- cccl-resplit-branch - rebase + resplit commit history
- cccl-triage-pr - diagnose CI failures on a PR
- cccl-triage-nightly - diagnose CI failures in the latest nightly
- cccl-ci, cccl-ci-benchmarks, cccl-bisect, cccl-devcontainers, cccl-build-and-test-targets, cccl-cpp-builds, cccl-python, cccl-sass-diff, cccl-libcudacxx-style - CI / build / test references

Agents (haiku, non-interactive):
- cccl-ok-to-test - SHA-verified `/ok to test` poster
- cccl-fetch-ci-failures - paginated job-failure TSV
- cccl-summarize-job-log - 5-10 line log digest
- cccl-ci-overrides - matrix-override YAML + skip-tag generation

Bootstrap:
- AGENTS.md - minimal routing README pointing at the `cccl` skill
- CLAUDE.md - symlink to AGENTS.md
- .claude/settings.json - read-only allow-list (gh / git read forms, rg / grep / jq / sed -n, ls / cat / head / tail / wc / file / stat, mkdir -p /tmp/claude/*) plus SessionStart hook surfacing `cccl`. Mutating ops intentionally not allow-listed - they prompt every use.

Also renames `.agent/skills/libcudacxx-style/` to `.agent/skills/cccl-libcudacxx-style/` to match the cccl-* prefix convention across the rest of the catalogue.


Pre-commit hooks like pretty-format-json, end-of-file-fixer, trim-trailing-whitespace, and ruff format rewrite files in place. On failure with auto-fixes applied, the skill now routes each fixed file through cccl-clarify (re-stage / revert / discuss) - the same flow as the per-chunk action menu - rather than bulk-staging the fixes. Also notes the venv-install fallback for when pre-commit is absent from the host.

copy-pr-bot · 2026-05-12T20:18:52Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

alliepiper · 2026-05-12T20:26:24Z

/ok to test 11b0173

github-actions · 2026-05-13T02:35:35Z

😬 CI Workflow Results

🟥 Finished in 6h 02m: Pass: 99%/500 | Total: 7d 15h | Max: 1h 13m | Hits: 69%/893309

See results here.

Top-level overview of the cccl-* skill and agent framework: purpose, end-to-end prompt examples, approval gates, and detailed example prompts per workflow area. Sits at .agent/skills/cccl-README.md as a sibling to the cccl/ entry skill. [skip-matrix][skip-vdc][skip-docs][skip-tpt]

alliepiper added 5 commits May 12, 2026 15:50

Ignore .venv/

ab6256b

Generated when the agent venv-installs pre-commit per AGENTS.md's "Pre-commit" section. Untracked venvs noise up `git status` and risk accidental staging.

Add CI-scoping reminders to cccl-commit and cccl-pr

434c795

Ignore .claude/ in CI dep classification

11b0173

github-project-automation Bot added this to CCCL May 12, 2026

github-project-automation Bot moved this to Todo in CCCL May 12, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 12, 2026

alliepiper marked this pull request as ready for review May 12, 2026 20:28

alliepiper requested a review from a team as a code owner May 12, 2026 20:28

alliepiper requested a review from jrhemstad May 12, 2026 20:28

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 12, 2026

alliepiper mentioned this pull request May 13, 2026

[FEA]: [DevEx] Parse, dedup, summarize, and format a report of CI errors per PR run #5757 (Open, 1 task)

alliepiper added 2 commits May 14, 2026 10:58

Mk II

a539995

alliepiper requested a review from tpn May 14, 2026 17:17

alliepiper wants to merge 7 commits into NVIDIA:main from alliepiper:ci_skills
