Add CCCL workflow skills and helper agents #8948
Open
alliepiper wants to merge 7 commits into
Establishes a single source-of-truth bootstrap (AGENTS.md) and a
catalogue of 14 skills + 4 agents under `.agent/{skills,agents}/`
that route by user intent. Both Claude Code and Codex resolve the
same files via the `.claude/{skills,agents}` symlinks.
Skills:
- cccl, cccl-agent-impl - orientation + concept primer
- cccl-clarify - decision-point escalation
- cccl-commit - interactive commit prep
- cccl-pr - PR lifecycle (open / edit / comment / push + CI)
- cccl-resplit-branch - rebase + resplit commit history
- cccl-triage-pr - diagnose CI failures on a PR
- cccl-triage-nightly - diagnose CI failures in the latest nightly
- cccl-ci, cccl-ci-benchmarks, cccl-bisect, cccl-devcontainers,
cccl-build-and-test-targets, cccl-cpp-builds, cccl-python,
cccl-sass-diff, cccl-libcudacxx-style - CI / build / test references
Agents (haiku, non-interactive):
- cccl-ok-to-test - SHA-verified `/ok to test` poster
- cccl-fetch-ci-failures - paginated job-failure TSV
- cccl-summarize-job-log - 5-10 line log digest
- cccl-ci-overrides - matrix-override YAML + skip-tag generation
Bootstrap:
- AGENTS.md - minimal routing README pointing at the `cccl` skill
- CLAUDE.md - symlink to AGENTS.md
- .claude/settings.json - read-only allow-list (gh / git read forms,
rg / grep / jq / sed -n, ls / cat / head / tail / wc / file / stat,
mkdir -p /tmp/claude/*) plus SessionStart hook surfacing `cccl`.
Mutating ops intentionally not allow-listed - they prompt every use.
Also renames `.agent/skills/libcudacxx-style/` to
`.agent/skills/cccl-libcudacxx-style/` to match the cccl-* prefix
convention across the rest of the catalogue.
Generated when the agent venv-installs pre-commit per AGENTS.md's "Pre-commit" section. Untracked venvs noise up `git status` and risk accidental staging.
Pre-commit hooks like pretty-format-json, end-of-file-fixer, trim-trailing-whitespace, and ruff format rewrite files in place. On failure with auto-fixes applied, the skill now routes each fixed file through cccl-clarify (re-stage / revert / discuss) - the same flow as the per-chunk action menu - rather than bulk-staging the fixes. Also notes the venv-install fallback for when pre-commit is absent from the host.
Contributor
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributor, Author
/ok to test 11b0173

Contributor
😬 CI Workflow Results
🟥 Finished in 6h 02m: Pass: 99%/500 | Total: 7d 15h | Max: 1h 13m | Hits: 69%/893309
Top-level overview of the cccl-* skill and agent framework: purpose, end-to-end prompt examples, approval gates, and detailed example prompts per workflow area. Sits at .agent/skills/cccl-README.md as a sibling to the cccl/ entry skill. [skip-matrix][skip-vdc][skip-docs][skip-tpt]
Overview
The `cccl-*` skills and agents wrap CCCL's build, test, CI, benchmarking, commit/PR, and release infrastructure into named entry points navigated by intent. Top-level skills (`cccl-build`, `cccl-triage`, `cccl-commit`, `cccl-bench`, `cccl-infra`, …) drive user-facing workflows; `cccl_detail-*` skills hold shared reference material; read-only agents handle mechanical work like fetching failed jobs or summarizing logs. Each repeated workflow is encoded once, so every task starts from a known entry point with relevant project-specific details in context.
End-to-end prompt examples
- `cccl-triage` (fetch + cluster + summarize) → engineer fix → `cccl-ci-overrides` (generate the override) → `cccl-commit` (test gate + commit message) → `cccl-pr` (push + ready + retrigger CI). End-to-end automation of the most expensive recurring workflow in this repo.
- `cccl-bisect` → `cccl-sass-diff` (validate it's a real algorithmic regression, not codegen drift) → engineer fix → `cccl-bench` (verify locally) → `cccl-commit` → `cccl-pr` → `cccl-bench` (CI bench request with `[bench-only]`).
- `cccl-resplit-branch` → `cccl-commit`. Backs up tip to `refs/backup/<branch>-<ts>`, rebases (escalates conflicts via `cccl-clarify`), collapses to working-tree via `git reset --mixed main`, hands off to `cccl-commit` with the original commit subjects as starters.
- `cccl` (entry router) → walks them through: `cccl-devcontainer` → `cccl-cub` (orientation) → `cccl-build` + `cccl-test` → `cccl-commit` → `cccl-pr`.

Full Example Prompts
1. Daily inner loop — build, test, iterate
- `cccl-build` → `cccl-test`. Picks the right preset, runs the targeted build, ctest-regexes the requested suite, reports pass/fail. Fast iteration path, single preset, no matrix.
- `cccl-build` → `cccl-test`. Targeted incremental build via `build_and_test_targets.sh`; filters CTest by regex.
- `cccl-test`. Picks libcudacxx preset, points lit at the right test directory.
- `cccl-devcontainer`. Wraps `.devcontainer/launch.sh --cuda 13.2 --host gcc14`. Detects whether you're already inside a container.
- `cccl-devcontainer` → `cccl-build` → `cccl-test`. `-d` headless launch with `-- ./ci/build_cudax.sh` then `./ci/test_cudax.sh`.
- `cccl-cmake`. Tabulates presets; recommends `all-dev`.

2. CI firefighting
- `cccl-triage`. Resolves the PR's latest CI run, dispatches `cccl-ci-fetch-failures` to list failures, clusters by toolchain/library/variant, dispatches `cccl-ci-summarize-job-log` in parallel (haiku) on representatives, returns a compact failure-cluster table and asks which clusters to dig into.
- `cccl-triage` (nightly mode). Same flow, run-id resolved from `nightly.yml`. Especially useful for the matrix-sized failure sets where you need clustering, not 200 raw logs.
- `cccl-ci-fetch-failures` direct. Returns TSV: `<job-id>\t<full-name>\t<grouping-hint>`.
- `cccl-ci-summarize-job-log`. Fetches the log, returns failing step, exact command line, 5–20 lines of raw error, and a code/infra/flaky verdict.
- `cccl-ci-overrides`. Reads `ci/matrix.yaml` schema, emits the minimum override matrix snippet plus recommended skip tags, with rationale.
- `cccl-ci` + `cccl-ci-overrides`. Explains matrix expansion via `ci/inspect_changes.py` and `project_files_and_dependencies.yaml`, identifies the trigger path.
- `cccl-ci`. Reference skill: flow diagram, sources of truth, skip-tag mechanics.

3. Regression hunting
- `cccl-bisect` (cloud route). Dispatches `git-bisect.yml` workflow with the right runner label, build/test targets, and good/bad refs. Returns the bad commit hash with the distinguishing command line, which serves as a local reproducer.
- `cccl-bisect`. Resolves `3.0` to a tag, runs cloud bisect, returns the bad commit with a reproducer command.
- `cccl-bisect` (local route). Wraps `ci/util/git_bisect.sh` inside `.devcontainer/launch.sh`.
- `cccl-sass-diff`. Builds both refs, dumps SASS via `cuobjdump`, normalizes addresses and register renames, reports the top 5 non-trivial diffs by kernel.
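The normalization step mentioned for `cccl-sass-diff` can be illustrated with a minimal Python sketch. It assumes disassembly lines that carry a `/*0010*/`-style address prefix and `R<n>` register names; the regexes, the `normalize_sass` helper, and the sample instructions are illustrative assumptions, not the skill's actual implementation.

```python
import re

# Assumed line shape: "/*0010*/ IADD3 R4, R2, R3, RZ ;" (illustrative only).
ADDRESS = re.compile(r"/\*[0-9a-fA-F]+\*/")
REGISTER = re.compile(r"\bR(\d+)\b")


def normalize_sass(lines):
    """Strip address comments and rename registers by order of first use."""
    names = {}
    out = []
    for line in lines:
        text = ADDRESS.sub("", line).strip()
        if not text:
            continue

        def rename(match):
            # The first distinct register seen becomes r0, the next r1, and so on.
            return names.setdefault(match.group(0), f"r{len(names)}")

        out.append(REGISTER.sub(rename, text))
    return out


# Hypothetical dumps: same instruction stream, different addresses and registers.
before = ["/*0000*/ MOV R1, R7 ;", "/*0010*/ IADD3 R4, R1, R7, RZ ;"]
after = ["/*0020*/ MOV R2, R9 ;", "/*0030*/ IADD3 R5, R2, R9, RZ ;"]

same = normalize_sass(before) == normalize_sass(after)
print(same)
```

Two dumps that differ only in addresses and register allocation compare equal after normalization, so whatever remains in a diff is more likely to reflect a real change in the generated code.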
4. Commit / PR endgame
- `cccl-commit`. Component selection → optional split → interactive chunk walkthrough → optional test gate → commit message draft (Trivial/Standard/Detailed) → `git commit -F`. Refuses on `main`.
- `cccl-commit`. Plans three commit groups, walks chunks, runs pre-commit, drafts per-group messages, executes each commit.
- `cccl-pr` (open new draft). Sanity-check, detect push remote, push branch, open draft PR with the title and body.
- `cccl-pr` (edit existing). `gh pr edit --body-file -`.
- `cccl-pr` (draft→ready transition).
- `cccl-pr` (push + trigger). SHA verification gate, then `/ok to test <SHA>` comment. Never posts without verification.
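The SHA verification gate can be sketched as a pure function that refuses to produce the comment body when the local and remote heads disagree. This is a hypothetical illustration: the `ok_to_test_comment` helper and its prefix-matching rule are assumptions, and the two SHAs would have to come from elsewhere (for example `git rev-parse HEAD` and `gh pr view --json headRefOid`).

```python
def ok_to_test_comment(local_head: str, pr_head: str, min_len: int = 7) -> str:
    """Return the '/ok to test <SHA>' comment body, or raise if the SHAs disagree."""
    local, remote = local_head.strip().lower(), pr_head.strip().lower()
    if len(local) < min_len or len(remote) < min_len:
        raise ValueError("SHA too short to verify")
    # Accept an abbreviated SHA only when it is a prefix of the longer one.
    if not (local.startswith(remote) or remote.startswith(local)):
        raise ValueError(f"local HEAD {local[:7]} does not match PR head {remote[:7]}")
    return f"/ok to test {max(local, remote, key=len)}"


print(ok_to_test_comment("11b0173", "11b0173"))  # prints "/ok to test 11b0173"
```

Keeping the check separate from the posting step means the mutating `gh pr comment` call is only ever reached with a verified SHA.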
5. Library development
- `cccl-cub` (orientation) → manual implementation → `cccl-build` + `cccl-test` to verify. Covers block/warp/device/agent scopes, the tuning-policy selector pattern, and Catch2 vs legacy test layout.
- `cccl-libcudacxx` (style references: `headers.md`, `macros.md`, `naming.md`, `templates.md`, `testing.md`, `visibility.md`). Style enforcement applies to `libcudacxx/include/` AND `cudax/include/`.
- `cccl-thrust`. Explains the per-backend directory layout (`thrust/system/{cuda,cpp,omp,tbb}/`), the ADL dispatch via execution policies, and the typical pattern of `thrust::sort` → `cub::DeviceRadixSort` for the CUDA backend.
- `cccl-c`. Three-call pattern (`_build`, `_run`, `_cleanup`), stable C ABI layer, JIT-backed cubins via NVRTC, custom iterator/operator types via template strings.
- `cccl-cudax` + `cccl-libcudacxx`. Covers the zero-stability contract and `CCCL_ENABLE_UNSTABLE` flag on the cudax side; the upstream-tracking model and where CCCL extensions live on the libcudacxx side.
- `cccl-python`. `pip install -e python/cuda_cccl[test-cu13]` then `ci/test_cuda_compute_python.sh`.
- `cccl-python`. Explains the `cuda_coop` test pattern, points at `ci/test_cuda_coop_python.sh`.

6. Performance
- `cccl-bench` (nvbench-template reference). Generates per-variant `.cu` files with the shared `base.cuh` pattern.
- `cccl-bench` (ci-bench-request reference). Edits `ci/bench.yaml` with the filters, appends `[bench-only]` to the commit message. Requires reset to template before merge.
- `cccl-bench` (local-run reference). Wraps `ci/bench/compare_git_refs.sh`.
- `cccl-bench` (tuning reference). Wraps the `cccl.bench` harness with `CUB_ENABLE_TUNING=ON`, generates `.variant` targets, sweeps, picks the optimum.
- `cccl-bench` + `cccl-python`. Python path uses `cuda.bench` with axis registration and `bench.run_all_benchmarks(sys.argv)`.

7. Infrastructure & release
- `cccl-infra` (ctk-bump playbook). Edits `ci/matrix.yaml` (`ctk_versions`, `devcontainer_version`, workflow rows), regenerates `.devcontainer/` via the matrix-aware generator, verifies the workflow expansion. Refuses to hand-edit individual `devcontainer.json` files.
- `cccl-infra` (compiler-bump playbook). Adds to `host_compilers`, cuda-specific version table, workflow rows, regenerates devcontainers.
- `cccl-infra` (release-cut playbook). Drives `ci/update_version.sh`, version files per library (cub, thrust, libcudacxx, cudax), `cccl-version.json`, `docs/VERSION.md`, Python package, workflows. Never hand-edits version files.
- `cccl-infra` (project-add playbook). `ci/matrix.yaml` workflow rows + `jobs:`, `ci/project_files_and_dependencies.yaml` new key + deps, `CMakePresets.json`, build/test scripts. Touches every infra file the project needs.
- `cccl-precommit`. Runs the suite, reviews diffs, stages fixed files, re-runs. Knows the auto-fix subset (clang-format, ruff, gersemi, end-of-file) vs the non-auto-fix subset (codespell, mypy, shellcheck).
- `cccl-docs`. Runs `./docs/gen_docs.bash` (Linux-only, builds Doxygen 1.9.6 first run, creates venv, runs Sphinx).
- `cccl-docs` (doxygen-breathe-gotchas reference). Per-library Doxyfile inclusion patterns, Breathe bridge config, custom `_ext/auto_api_generator.py`.

8. Decision-point prompts
- `cccl-clarify`. Three-step ladder: default reasoning from project conventions → check the release cadence and the bug severity → ask the user with framed options (cherry-pick / wait / hotfix release / break this down).
- `cccl-commit` + `cccl-clarify`. Surfaces the choice as part of the interactive chunk walkthrough.
Architecture & layout
Everything lives under `.agent/`. `AGENTS.md` slims to a routing README; `CLAUDE.md` symlinks to it. `.claude/{skills,agents}` symlink into `.agent/` so Claude Code and Codex resolve the same files. A `SessionStart` hook surfaces the `cccl` entry skill at session start.
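A small Python sketch can show why the symlinks make both tools see the same files. It builds a throwaway tree in a temporary directory; the exact link targets, the location of `AGENTS.md` at the repository root, and the file contents are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Create the assumed .agent/ tree and the symlinks that point into it."""
    for sub in ("skills", "agents"):
        (root / ".agent" / sub).mkdir(parents=True)
    (root / ".claude").mkdir()
    for sub in ("skills", "agents"):
        # Relative targets keep the links valid wherever the repo is cloned.
        os.symlink(os.path.join("..", ".agent", sub), root / ".claude" / sub)
    (root / "AGENTS.md").write_text("routing README\n")
    os.symlink("AGENTS.md", root / "CLAUDE.md")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    skill = root / ".agent" / "skills" / "cccl" / "SKILL.md"
    skill.parent.mkdir()
    skill.write_text("entry skill\n")
    # The same file is reachable through the .claude/ path.
    via_claude = root / ".claude" / "skills" / "cccl" / "SKILL.md"
    same_file = via_claude.resolve() == skill.resolve()
    same_readme = (root / "CLAUDE.md").read_text() == (root / "AGENTS.md").read_text()
    print(same_file, same_readme)
```

Because the links are directory-level, a skill added under `.agent/skills/` is visible through `.claude/skills/` without any further bookkeeping.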
Two skill tiers:
- `cccl-*` - user-facing workflow entry points, triggered by intent ("triage PR #X", "build cub", "commit these changes"). Each owns a workflow.
- `cccl_detail-*` - internal reference material composed by top-level skills, not invoked directly by users. Loaded when a workflow skill needs the underlying mechanics (CI matrix expansion, CMake module internals, release version mechanics).
Each skill follows a progressive-disclosure pattern: `SKILL.md` (frontmatter description + workflow body) is the always-loaded summary; `references/<topic>.md` files load on demand.
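The progressive-disclosure pattern can be sketched as eager loading of the frontmatter description plus lazy loading of reference files. The `Skill` class, the minimal `---`-delimited frontmatter parser, and the sample `tuning.md` file below are assumptions for illustration; the real loading is performed by the agent runtime.

```python
import tempfile
from pathlib import Path


def read_description(skill_md: Path) -> str:
    """Return the description value from a '---' delimited frontmatter block."""
    lines = skill_md.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, _, value = line.partition(":")
        if key.strip() == "description":
            return value.strip()
    return ""


class Skill:
    def __init__(self, root: Path):
        self.root = root
        self.description = read_description(root / "SKILL.md")  # always loaded
        self._refs = {}  # filled lazily, one topic at a time

    def reference(self, topic: str) -> str:
        """Read references/<topic>.md the first time it is requested."""
        if topic not in self._refs:
            self._refs[topic] = (self.root / "references" / f"{topic}.md").read_text()
        return self._refs[topic]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cccl-bench"
    (root / "references").mkdir(parents=True)
    (root / "SKILL.md").write_text("---\ndescription: benchmark workflows\n---\nbody\n")
    (root / "references" / "tuning.md").write_text("tuning notes\n")
    skill = Skill(root)
    loaded_before = list(skill._refs)  # nothing read yet
    text = skill.reference("tuning")
    loaded_after = list(skill._refs)   # only the requested topic
    print(skill.description, loaded_before, loaded_after)
```

Only the short description is paid for up front; each reference file costs context only when a workflow actually needs it.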
Agents are non-interactive, read-only subagents dispatched by skills. All three current agents serve `cccl-ci` / `cccl-triage` / `cccl-commit`; they exist because the work is mechanical and parallelizable (one log per cluster, one override per diff).
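The "one log per cluster" idea can be sketched by grouping a failure TSV in the `<job-id>\t<full-name>\t<grouping-hint>` shape and summarizing one representative per group concurrently. The sample rows, the `summarize` stub, and the choice of the first job as representative are assumptions; the real agents fetch and read actual CI logs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Made-up sample rows in the <job-id>\t<full-name>\t<grouping-hint> shape.
TSV = (
    "101\tCUB / gcc14 / build\tcub-gcc14\n"
    "102\tCUB / gcc14 / test\tcub-gcc14\n"
    "103\tThrust / clang19 / test\tthrust-clang19\n"
)


def cluster(tsv: str) -> dict:
    """Group job ids by their grouping hint."""
    clusters = defaultdict(list)
    for row in tsv.strip().splitlines():
        job_id, _full_name, hint = row.split("\t")
        clusters[hint].append(job_id)
    return dict(clusters)


def summarize(job_id: str) -> str:
    """Stand-in for a log-summarizing agent; a real one would fetch the log."""
    return f"summary for job {job_id}"


def triage(tsv: str) -> dict:
    """Summarize one representative job per cluster, in parallel."""
    clusters = cluster(tsv)
    representatives = {hint: jobs[0] for hint, jobs in clusters.items()}
    with ThreadPoolExecutor(max_workers=4) as pool:
        summaries = list(pool.map(summarize, representatives.values()))
    return dict(zip(representatives, summaries))


print(triage(TSV))
```

With three failing jobs in two clusters, only two summaries are requested, which is the saving that matters when a nightly run produces hundreds of failures.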
Permissions model
`.claude/settings.json` adds a read-only allow-list scoped to what the skills need: `gh` read forms (`pr view/checks/list/diff`, `run view/list`, `workflow list/view`, `issue view/list`, `search`, `api` for `repos/NVIDIA/cccl/actions/{jobs,runs}/*`), `git` read forms (`status`, `log`, `diff`, `show`, `blame`, …), text inspection (`rg`, `grep`, `jq`, `sed -n`, `ls`, `cat`, `head`, `tail`, `wc`, `file`, `stat`), and `mkdir -p /tmp/claude/*` for scratch.

Mutating operations (`git add`, `git commit`, `git push`, `gh pr create`, `gh pr comment`, `gh workflow run`, …) are intentionally not allow-listed: every mutation prompts for explicit user approval.
Top-level skills
User-facing entry points under `.agent/skills/cccl-*/`. Triggered by intent; `/<skill>` is the explicit fallback.

- `cccl`
- `cccl-build`
- `cccl-test` - `cccl-build` paths
- `cccl-bisect`
- `cccl-devcontainer`
- `cccl-cmake`
- `cccl-precommit`
- `cccl-ci`
- `cccl-triage`
- `cccl-commit`
- `cccl-pr` - `/ok to test`
- `cccl-resplit-branch`
- `cccl-clarify`
- `cccl-bench` - `cuda.bench` / CI bench requests / `cccl.bench` tuning
- `cccl-sass-diff`
- `cccl-cub`
- `cccl-thrust`
- `cccl-libcudacxx`
- `cccl-cudax`
- `cccl-c`
- `cccl-python` - `cuda-cccl` Python package: modules, build/test, install extras
- `cccl-docs`
- `cccl-infra`

Internal cccl_detail-* skills
Composed by the top-level skills above; not invoked directly by users.

- `cccl_detail-ci` (`cccl-ci`, `cccl-triage`, `cccl-ci-overrides`) - matrix expansion, copy-pr-bot, inspect-changes
- `cccl_detail-cmake` (`cccl-cmake`, `cccl-build`) - module internals, arch-flag mechanics
- `cccl_detail-cpp-macros` (`cccl-libcudacxx`) - compiler detection, diagnostics, visibility/ABI
- `cccl_detail-devcontainer-matrix` (`cccl-infra`, `cccl-devcontainer`) - devcontainer generation from `ci/matrix.yaml`
- `cccl_detail-examples` (`cccl-cub`, `cccl-thrust`, `cccl-libcudacxx`) - examples layout, CMake test setup
- `cccl_detail-github` (`cccl-ci`) - workflow templates, action structures
- `cccl_detail-release` (`cccl-infra`) - version management, release cycle internals
- `cccl_detail-test-params` (`cccl-test`, `cccl-cub`, `cccl-thrust`) - CTest / lit parameter expansion

Agents
Read-only, non-interactive subagents dispatched by skills.

- `cccl-ci-fetch-failures`
- `cccl-ci-summarize-job-log`
- `cccl-ci-overrides` - `workflows.override` matrix + skip tags from failures or diff

Composed by `cccl-triage` (parent workflow that handles user dialogue) and `cccl-commit` (consumes override output during the test-gate step).