From 340520581453c33cce4c76f51ee00220d9ee26e9 Mon Sep 17 00:00:00 2001
From: Allison Piper
Date: Tue, 12 May 2026 15:50:03 -0400
Subject: [PATCH 1/7] Add repo-local agent infrastructure: skills, agents, and bootstrap

Establishes a single source-of-truth bootstrap (AGENTS.md) and a catalogue
of 17 skills + 4 agents under `.agent/{skills,agents}/` that route by user
intent. Claude Code resolves the files through the `.claude/{skills,agents}`
symlinks; Codex reads `.agent/` directly.

Skills:
- cccl, cccl-agent-impl - orientation + concept primer
- cccl-clarify - decision-point escalation
- cccl-commit - interactive commit prep
- cccl-pr - PR lifecycle (open / edit / comment / push + CI)
- cccl-resplit-branch - rebase + resplit commit history
- cccl-triage-pr - diagnose CI failures on a PR
- cccl-triage-nightly - diagnose CI failures in the latest nightly
- cccl-ci, cccl-ci-benchmarks, cccl-bisect, cccl-devcontainers,
  cccl-build-and-test-targets, cccl-cpp-builds, cccl-python, cccl-sass-diff,
  cccl-libcudacxx-style - CI / build / test references

Agents (non-interactive; haiku unless noted):
- cccl-ok-to-test - SHA-verified `/ok to test` poster
- cccl-fetch-ci-failures - paginated job-failure TSV
- cccl-summarize-job-log - 5-10 line log digest
- cccl-ci-overrides - matrix-override YAML + skip-tag generation (sonnet)

Bootstrap:
- AGENTS.md - minimal routing README pointing at the `cccl` skill
- CLAUDE.md - symlink to AGENTS.md
- .claude/settings.json - read-only allow-list (gh / git read forms,
  rg / grep / jq / sed -n, ls / cat / head / tail / wc / file / stat,
  mkdir -p /tmp/claude/*) plus a SessionStart hook surfacing `cccl`.
  Mutating ops are intentionally not allow-listed, so they prompt on every use.

Also renames `.agent/skills/libcudacxx-style/` to
`.agent/skills/cccl-libcudacxx-style/` to match the cccl-* prefix convention
used across the rest of the catalogue.
--- .agent/agents/cccl-ci-overrides.md | 131 +++++ .agent/agents/cccl-fetch-ci-failures.md | 53 ++ .agent/agents/cccl-ok-to-test.md | 53 ++ .agent/agents/cccl-summarize-job-log.md | 59 ++ .agent/skills/cccl-agent-impl/SKILL.md | 54 ++ .agent/skills/cccl-bisect/SKILL.md | 73 +++ .../cccl-build-and-test-targets/SKILL.md | 73 +++ .agent/skills/cccl-ci-benchmarks/SKILL.md | 55 ++ .agent/skills/cccl-ci/SKILL.md | 54 ++ .agent/skills/cccl-clarify/SKILL.md | 43 ++ .agent/skills/cccl-commit/SKILL.md | 120 ++++ .agent/skills/cccl-cpp-builds/SKILL.md | 53 ++ .agent/skills/cccl-devcontainers/SKILL.md | 58 ++ .../SKILL.md | 2 +- .agent/skills/cccl-pr/SKILL.md | 100 ++++ .agent/skills/cccl-python/SKILL.md | 46 ++ .agent/skills/cccl-resplit-branch/SKILL.md | 111 ++++ .agent/skills/cccl-sass-diff/SKILL.md | 32 ++ .agent/skills/cccl-triage-nightly/SKILL.md | 42 ++ .agent/skills/cccl-triage-pr/SKILL.md | 85 +++ .agent/skills/cccl/SKILL.md | 39 ++ .claude/agents | 1 + .claude/settings.json | 69 +++ .claude/skills | 1 + .claude/skills/libcudacxx-style/SKILL.md | 6 - AGENTS.md | 515 +++--------------- CLAUDE.md | 7 +- 27 files changed, 1481 insertions(+), 454 deletions(-) create mode 100644 .agent/agents/cccl-ci-overrides.md create mode 100644 .agent/agents/cccl-fetch-ci-failures.md create mode 100644 .agent/agents/cccl-ok-to-test.md create mode 100644 .agent/agents/cccl-summarize-job-log.md create mode 100644 .agent/skills/cccl-agent-impl/SKILL.md create mode 100644 .agent/skills/cccl-bisect/SKILL.md create mode 100644 .agent/skills/cccl-build-and-test-targets/SKILL.md create mode 100644 .agent/skills/cccl-ci-benchmarks/SKILL.md create mode 100644 .agent/skills/cccl-ci/SKILL.md create mode 100644 .agent/skills/cccl-clarify/SKILL.md create mode 100644 .agent/skills/cccl-commit/SKILL.md create mode 100644 .agent/skills/cccl-cpp-builds/SKILL.md create mode 100644 .agent/skills/cccl-devcontainers/SKILL.md rename .agent/skills/{libcudacxx-style => cccl-libcudacxx-style}/SKILL.md (99%) 
create mode 100644 .agent/skills/cccl-pr/SKILL.md create mode 100644 .agent/skills/cccl-python/SKILL.md create mode 100644 .agent/skills/cccl-resplit-branch/SKILL.md create mode 100644 .agent/skills/cccl-sass-diff/SKILL.md create mode 100644 .agent/skills/cccl-triage-nightly/SKILL.md create mode 100644 .agent/skills/cccl-triage-pr/SKILL.md create mode 100644 .agent/skills/cccl/SKILL.md create mode 120000 .claude/agents create mode 100644 .claude/settings.json create mode 120000 .claude/skills delete mode 100644 .claude/skills/libcudacxx-style/SKILL.md mode change 100644 => 120000 CLAUDE.md diff --git a/.agent/agents/cccl-ci-overrides.md b/.agent/agents/cccl-ci-overrides.md new file mode 100644 index 00000000000..7ab63386ee1 --- /dev/null +++ b/.agent/agents/cccl-ci-overrides.md @@ -0,0 +1,131 @@ +--- +name: cccl-ci-overrides +description: "Use this agent when a caller skill wants to limit CCCL CI cost on a PR via `workflows.override` matrix entries and/or `[skip-*]` commit tags. Typical triggers include cccl-triage-pr building a targeted-repro override after diagnosing failures, cccl-triage-nightly building one with `for_workflow: nightly`, and commit-prep flows asking \"what override + skip tags fit this diff?\". Takes working changes (paths or diff range) and/or a list of failed-job names; returns override snippet + skip tags + per-decision rationale. Knows `ci/project_files_and_dependencies.yaml`, `ci/matrix.yaml`, and `ci-overview.md`. Non-interactive. See \"When to invoke\" in the agent body for worked scenarios." +model: sonnet +color: magenta +tools: Bash, Read, Grep +--- + +# cccl-ci-overrides + +Advise on CI cost-limiting measures — override matrix entries and skip tags. + +## When to invoke + +- **Targeted repro from failed jobs.** Triage skill diagnosed failures and wants the minimum override matrix that + reproduces them on a subsequent CI run. 
+- **Diff-driven override.** Commit-prep flow has a set of changed paths (or a diff range) and wants to know which + matrix entries are needed and which `[skip-*]` tags are safe. +- **Combined input.** Both failed-job list and changed paths; the agent unions and de-dupes the entries. + +## Sources of truth + +- `ci/project_files_and_dependencies.yaml` — project definitions, `include_regexes`, `exclude_regexes`, + `exclude_project_files`, `lite_dependencies`, `full_dependencies`, global `ignore_regexes`. `core` is special: + any unmatched non-ignored file marks `core` dirty → full rebuild. +- `ci/matrix.yaml` — `workflows.override` schema (see top-of-file examples). Workflow sections: `pull_request`, + `pull_request_lite`, `nightly`, `weekly`, `python-wheels`, `devcontainers`. Plus `exclude:` rules, `jobs:` + catalogue (job-key → `name:`), `projects:` catalogue, `tags:` defaults (notably + `project: { default: ['libcudacxx', 'cub', 'thrust'] }`). +- `ci-overview.md` — canonical `[skip-*]` tokens. + +## Tool to lean on + +`ci/inspect_changes.py --refs ` (or `--file`, `--stdin`) already implements the dep-graph trace and +honors `ignore_regexes` + `exclude_*` rules. Prefer it over re-implementing. + +## Inputs + +Any combination of: + +- `paths:` (newline-separated changed paths) OR `diff_range: ..` — drives override + skip-tag + analysis. +- `failed_jobs:` (path to file with failed-job names, one per line) — drives direct-reproduction override. +- `for_workflow:` — `pull_request` (default) | `pull_request_lite` | `nightly` | `weekly`. + +At least one of `paths`/`diff_range`/`failed_jobs` required. + +## Override matrix — from changes + +1. Run `ci/inspect_changes.py` to classify dirty projects. +2. From `for_workflow`'s section, pull entries that name a dirty project (or omit `project:` and the default set + intersects dirty). +3. Subtract `exclude:` matches. +4. Emit as override entries. + +## Override matrix — from failed jobs + +1. Parse each name: `[CTK C++] ()`. 
Cross-reference `jobs:` in + matrix.yaml to map `` (e.g. `BuildHostLaunch`, `TestNoLaunch`, `NVRTC`) → job key (e.g. `build_lid0`, + `test_nolid`, `nvrtc`). +2. Build the minimum override entry per name — `{jobs: [], project: , std: , ctk: , + cxx: , gpu: }`. +3. Merge entries sharing `(project, jobs)`; combine `std`/`ctk`/`cxx` into lists. + +## Combining inputs + +If caller provides both, union the entries. De-dupe. + +## Snippet format + +```yaml +# Targeted repro of . Reset before merging. +- {jobs: ['build'], project: 'libcudacxx', std: 'all', ctk: ['12.0', '12.X'], cxx: ['gcc8', 'gcc9', 'gcc10']} +- {jobs: ['build'], project: 'cub', std: 17, ctk: ['12.0', '12.X'], cxx: ['gcc8']} +``` + +`` = nightly run ID / PR check context / `` / "manual triage". + +For targeted repro via `build_and_test_targets.sh`, prefer the `target` project pattern from matrix.yaml's +top-of-file example: + +```yaml +- { jobs: ['run_gpu'], project: 'target', ctk: ['13.X'], cxx: 'gcc', gpu: 'rtxa6000', + args: '--preset cub-cpp20 --build-targets "cub.cpp20.test.iterator" --ctest-targets "cub.cpp20.test.iterator"' } +``` + +If `workflows.override:` is already non-empty, emit as **additions** — caller decides whether to append or +replace. + +## Skip tags (path-based) + +For each `[skip-*]` token in `ci-overview.md`, suggest if no changed path matches the area it protects: + +| Tag | Suggest when no changed path matches | +|------------------|-----------------------------------------------| +| `[skip-docs]` | `docs/`, `*.rst` | +| `[skip-vdc]` | `.devcontainer/`, `ci/`, `.github/workflows/` | +| `[skip-tpt]` | third-party canary triggers | +| `[skip-rapids]` | RAPIDS paths (subset of tpt) | +| `[skip-matx]` | MatX paths (subset of tpt) | +| `[skip-pytorch]` | PyTorch paths (subset of tpt) | +| `[skip-matrix]` | no CCCL build/test code (rare — docs/CI-only) | + +Changes purely within `workflows.override:` target CI scope, not CI infra — don't withhold `[skip-vdc]` for them. 
+Paths matching `ignore_regexes` already don't trigger CI — exclude in both directions. + +Note that the skip tags only apply to the last commit in a branch; save them until the end if making multiple +commits. + +## Output + +``` +## Override matrix snippet (insert under `workflows.override:`) + +```yaml +# . Reset before merging. + +``` + +## Skip tags + +`[skip-vdc][skip-docs][skip-tpt]` + +## Rationale + +- Override: +- Skip tags: +- Inputs: +``` + +Omit "Override matrix snippet" if no entries; omit "Skip tags" if no `paths`/`diff_range` given. diff --git a/.agent/agents/cccl-fetch-ci-failures.md b/.agent/agents/cccl-fetch-ci-failures.md new file mode 100644 index 00000000000..73fbf267f94 --- /dev/null +++ b/.agent/agents/cccl-fetch-ci-failures.md @@ -0,0 +1,53 @@ +--- +name: cccl-fetch-ci-failures +description: "Use this agent when a caller skill needs the list of failed jobs from a CCCL CI run, given either a PR number or a workflow run ID. Typical triggers include cccl-triage-pr collecting failures for the current branch's PR, cccl-triage-nightly collecting failures for the latest scheduled nightly run, and any other skill that needs failed-job TSV output for downstream summarization or override-matrix generation. Output is a TSV at a caller-specified path with one row per failed job: `\\t\\t`. Handles `gh api --paginate` and the `jq -s` slurp gotcha. Non-interactive. See \"When to invoke\" in the agent body for worked scenarios." +model: haiku +color: cyan +tools: Bash, Read +--- + +# cccl-fetch-ci-failures + +Return failed jobs from a CCCL CI run as TSV. + +## When to invoke + +- **Triage-PR fetch.** A PR-triage skill has the PR number and needs a TSV of failed jobs to pick representatives + for log-fetching. Caller hands over PR#, output path, scratch dir. +- **Triage-nightly fetch.** A nightly-triage skill has the workflow run ID (resolved from + `gh run list --workflow=ci-workflow-nightly.yml`) and needs the same TSV. 
Caller hands over run ID, output path, + scratch dir. + +## Inputs + +One of: +- `pr: ` — latest run on the PR. +- `run: ` — specific workflow run. + +Plus `output: ` and `scratch: `. Missing any → abort. + +## Steps + +1. **Resolve the run ID.** If `pr:` given: + - `gh pr view --repo NVIDIA/cccl --json headRefName,headRefOid` → `BRANCH`, `HEAD_SHA`. + - `gh run list --repo NVIDIA/cccl --branch --limit 5 --json databaseId,headSha,conclusion` → pick the + latest entry whose `headSha == HEAD_SHA`. No match → abort. + - `RUN_ID = databaseId` from that entry. + + Avoid `gh pr view --json statusCheckRollup` — it returns 100k+ tokens on CCCL PRs. +2. **Fetch jobs.** `gh api repos/NVIDIA/cccl/actions/runs//jobs?per_page=100 --paginate` into + `/jobs_raw.json`. `--paginate` concatenates objects; subsequent `jq` needs `-s`. +3. **Extract failures.** `jq -s -r '[.[].jobs[] | select(.conclusion == "failure")] | .[] | [.id, .name] | @tsv'` + into `/failed_jobs_raw.tsv`. Empty → return zero-failures. +4. **Append grouping hints.** Per row, parse the name and append `||`: + - Toolchain: `[CTK C++]` substring. + - Project: CUB / libcudacxx / Thrust / cudax / Python. + - Variant: Build / Test / HostLaunch / DeviceLaunch / TestNoLaunch / etc. + + Example row: + ``` + 74849038365 [CTK13.2 GCC15 C++20] cudax TestNoLaunch(amd64) CTK13.2 GCC15 C++20|cudax|TestNoLaunch + ``` + + Write to ``. +5. **Return summary** — count + tally of the third column. diff --git a/.agent/agents/cccl-ok-to-test.md b/.agent/agents/cccl-ok-to-test.md new file mode 100644 index 00000000000..be831cc13e0 --- /dev/null +++ b/.agent/agents/cccl-ok-to-test.md @@ -0,0 +1,53 @@ +--- +name: cccl-ok-to-test +description: "Use this agent when a caller skill has pushed a commit to a CCCL PR's branch and wants to trigger CI by posting the copy-pr-bot `/ok to test ` comment. 
Typical triggers include cccl-triage-pr after a fix commit lands on an existing PR, cccl-triage-nightly after opening a new draft PR for a nightly fix, and any caller that needs the SHA-verification gate (local HEAD vs remote PR head) before posting. The agent verifies the local SHA matches the remote head, aborts on mismatch, posts the comment, and suggests the caller schedule a 20-minute polling loop. Non-interactive. Never pushes, never creates PRs, never force-pushes — the caller owns all of those decisions. See \"When to invoke\" in the agent body for worked scenarios." +model: haiku +color: yellow +tools: Bash, Read +--- + +# cccl-ok-to-test + +Verify local-vs-remote SHA for a CCCL PR; post `/ok to test `. + +## When to invoke + +- **PR-triage CI restart.** Caller has just pushed a fix commit to the existing PR's branch. Agent verifies local + HEAD matches remote head, posts `/ok to test `, returns the SHA + a polling reminder. +- **Nightly-triage first CI run.** Caller just created a draft PR for a nightly fix and needs the initial + `/ok to test`. Same flow. +- **Mismatch gate.** Caller (or user) suspects local and remote may have diverged. Agent's first job is to + refuse-and-report on mismatch. + +## Inputs + +1. `` +2. `` (typically `NVIDIA/cccl`, always explicit) +3. `` + +Missing → abort naming the field. + +## Steps + +1. `git rev-parse HEAD` → `LOCAL_SHA`. The only SHA used in the comment; never derived elsewhere. +2. `gh pr view --repo --json headRefOid,isDraft,headRefName` → `REMOTE_SHA`, `isDraft`, + `headRefName`. +3. `headRefName != ` → abort showing both. +4. `LOCAL_SHA != REMOTE_SHA` → abort: + ``` + ERROR: local HEAD does not match remote PR head. + local: + remote: + Likely: unpushed commits, or someone else pushed after you. + Aborting without posting `/ok to test`. + ``` +5. `gh pr comment --repo --body "/ok to test "`. +6. Return: + ``` + Posted `/ok to test ` on PR #. Draft: . 
+ Caller: consider `ScheduleWakeup(delaySeconds=1200)` polling on + `gh pr checks `. + ``` + +Local SHA is the contract — the caller just pushed it. Remote SHA is checked only as a sync gate against +concurrent pushes. diff --git a/.agent/agents/cccl-summarize-job-log.md b/.agent/agents/cccl-summarize-job-log.md new file mode 100644 index 00000000000..77259d7632e --- /dev/null +++ b/.agent/agents/cccl-summarize-job-log.md @@ -0,0 +1,59 @@ +--- +name: cccl-summarize-job-log +description: "Use this agent when a caller skill has downloaded a single CCCL CI job log and needs a 5–10 line summary. Typical triggers include cccl-triage-pr or cccl-triage-nightly summarizing one representative log per failure cluster (dispatched in parallel — one agent per log), and any other workflow that wants to digest a job log without loading the full output into orchestrator context. Input is a path to a downloaded job log (typically `/tmp/claude//job_.log`). Output covers first real error, failing command/step, stack trace, infra-vs-code classification, and anything CCCL-specific worth flagging. Non-interactive. See \"When to invoke\" in the agent body for worked scenarios." +model: haiku +color: cyan +tools: Bash, Read, Grep +--- + +# cccl-summarize-job-log + +Read one CCCL CI job log; return a tight summary. + +## When to invoke + +- **Cluster-representative summarization.** A triage skill picked one representative job per failure cluster, + fetched logs to `/tmp/claude//job_.log`, and dispatches one summarize agent per log in parallel. + Each returns first-error, failing-step, infra-vs-code classification. +- **One-off log digest.** A skill needs to know what's in a single job log (whose path it already has) without + reading the full text into orchestrator context. + +## Inputs + +- `log: ` — full path to a downloaded job log. +- `context: ` (optional) — e.g. job name + toolchain. + +Missing `log:` → abort. + +## Steps + +1. 
**Find the first real error.** Grep for `error|FAIL|exit code|##[error]` (case-insensitive) and read context + around the hits. Ignore retries of the same error — pick the underlying cause. +2. **Identify the failing step.** GHA logs prefix each step with a `##[group]` banner; the command appears just + below (often with `+` from `set -x`). +3. **Capture the diagnostic.** File:line + 1–2 lines of context for compiler/linker/test failures; step name for + infra failures. +4. **Classify.** `code` (real failure) / `infra` (network, artifact, container pull, runner crash, OOM, timeout) / + `flaky` (known-flaky test, rest of run succeeded) / `unknown`. +5. **CCCL-specific flags.** Specific toolchain combo (useful for `cccl-ci-overrides`), cluster of related + failures, path naming a recently-introduced change. + +## Output + +``` +**Job:** `> +**Class:** code | infra | flaky | unknown + +**First real error** (log line ): + + +**Failing step:** + +**Diagnostic:** + <2-4 lines with file:line> + +**CCCL flags:** + - +``` + +≤10 lines of body text. diff --git a/.agent/skills/cccl-agent-impl/SKILL.md b/.agent/skills/cccl-agent-impl/SKILL.md new file mode 100644 index 00000000000..67d9aeb56c3 --- /dev/null +++ b/.agent/skills/cccl-agent-impl/SKILL.md @@ -0,0 +1,54 @@ +--- +name: cccl-agent-impl +description: "How skills and agents work in the CCCL repository. Filesystem layout, invocation, frontmatter, allow-list semantics, intent-driven auto-discovery. Load this skill when you land in the CCCL repo cold and don't know what skills or agents are, when you see references to `.agent/skills` or `.agent/agents` and want to understand them, or when authoring a new CCCL skill or agent." +--- + +# cccl-agent-impl + +## Filesystem + +``` +/.agent/ + skills//SKILL.md + agents/.md + +/.claude/ + skills -> ../.agent/skills (directory symlink) + agents -> ../.agent/agents (directory symlink) + settings.json +``` + +Canonical files live under `.agent/`. 
Claude Code reads `.claude/skills/` and `.claude/agents/`; Codex reads +`.agent/`. + +## Skills + +`.agent/skills//SKILL.md`. Frontmatter: + +```yaml +--- +name: +description: "" +--- +``` + +Invoke via the **Skill tool** with `skill: `. Not reentrant. + +## Agents + +`.agent/agents/.md`. Frontmatter: + +```yaml +--- +name: +description: "" +model: haiku +tools: Read, Grep, Bash +--- +``` + +CCCL agents are **non-interactive** — no `AskUserQuestion`. User dialogue belongs in the calling skill (often via +`cccl-clarify`). Pick `model:` per workload: `haiku` for mechanical tasks (log parsing, jq munging, SHA +verification); `sonnet` for multi-file reasoning or judgment (e.g. `cccl-ci-overrides`). + +Dispatch via the **Agent tool** with `subagent_type: `. The agent runs to completion and returns one message. diff --git a/.agent/skills/cccl-bisect/SKILL.md b/.agent/skills/cccl-bisect/SKILL.md new file mode 100644 index 00000000000..4da48132993 --- /dev/null +++ b/.agent/skills/cccl-bisect/SKILL.md @@ -0,0 +1,73 @@ +--- +name: cccl-bisect +description: "Run a git bisect on CCCL to identify which commit introduced a regression. Two routes: cloud (dispatch `.github/workflows/git-bisect.yml` via `gh workflow run`, runs in CCCL CI infrastructure on a GPU runner) or local (invoke `ci/util/git_bisect.sh` via `.devcontainer/launch.sh`). Walks the user through preset / build-targets / ctest-targets / lit-tests / good-ref / bad-ref selection. Use when the user has a regression and wants to find the introducing commit. Trigger phrases: \"bisect this regression\", \"find when X broke\", \"git bisect\"." +--- + +# cccl-bisect + +Bisects are slow. Restrict build/test targets to the smallest set that reliably reproduces the regression. + +## Sources of truth + +- `.github/workflows/git-bisect.yml` — cloud-dispatch workflow. +- `ci/util/git_bisect.sh` — local script wrapped by the workflow. +- `ci/util/build_and_test_targets.sh` — per-commit configure/build/test driver. 
+- `docs/cccl/development/build_and_bisect_tools.rst` — full docs. + +## Inputs needed + +- **`preset`** — CMake preset (e.g. `cub-cpp20`, `thrust-cpp17`, `libcudacxx`, `cudax`). `cmake --list-presets` + enumerates them. +- **`build_targets`** — space-separated ninja targets. +- **`ctest_targets`** — space-separated CTest `-R` regexes. Optional. +- **`lit_precompile_tests` / `lit_tests`** — space-separated libcudacxx lit paths relative to + `libcudacxx/test/libcudacxx/`. Optional. +- **`good_ref`** / **`bad_ref`** — commit/tag/branch, or `-Nd` ("N days ago on main", e.g. `-7d`), or empty + (defaults: latest release tag / `main`). +- **`cmake_options`** — extra `-D…=…` flags. Optional. +- **`launch_args`** — extra `--cuda X` / `--host Y` for devcontainer. Optional. + +Route ambiguous inputs through `cccl-clarify`. + +## Route 1 — cloud dispatch + +``` +gh workflow run git-bisect.yml --repo NVIDIA/cccl --ref \ + -f runner='' \ + -f preset='' \ + -f build_targets='' \ + -f ctest_targets='' \ + -f good_ref='' \ + -f bad_ref='' +``` + +Runner labels: + +- `linux-amd64-cpu16` — 16-core CPU box (build-only bisects). +- `linux-amd64-gpu-rtxa6000-latest-1` — RTX A6000, 1 GPU (test bisects). +- Others: see the workflow file inputs. + +Return the run URL. + +## Route 2 — local + +Requires Docker. + +``` +.devcontainer/launch.sh -d --gpus all \ + -- ./ci/util/git_bisect.sh \ + --summary-file /tmp/shared/summary.md \ + --good-ref '' \ + --bad-ref '' \ + --preset '' \ + --build-targets '' \ + --ctest-targets '' +``` + +Single long Bash invocation — no `&&` chains. + +## Output + +Both routes write a `summary.md` capturing the found-bad commit (hash, author, message), the build/test command +that distinguishes good from bad, and the bisect log. Cloud route surfaces a "Bisection Results" URL in the GHA +step summary. 
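Reviewer sketch (not part of the patch): the Route 1 cloud dispatch in the `cccl-bisect` skill above could be assembled programmatically, roughly as follows. This is a minimal illustration under stated assumptions. The helper name `build_bisect_dispatch` and its parameters are hypothetical, the `-f` field names are copied from the Route 1 command in the skill text, and dropping empty optional inputs so that the workflow's own defaults apply is an assumption rather than documented behaviour. `.github/workflows/git-bisect.yml` remains the source of truth for the accepted inputs.

```python
import shlex


def build_bisect_dispatch(ref, runner, preset, build_targets,
                          ctest_targets="", good_ref="", bad_ref="",
                          repo="NVIDIA/cccl"):
    """Hypothetical helper: build argv for `gh workflow run git-bisect.yml`."""
    argv = ["gh", "workflow", "run", "git-bisect.yml",
            "--repo", repo, "--ref", ref]
    fields = {
        "runner": runner,
        "preset": preset,
        "build_targets": build_targets,
        "ctest_targets": ctest_targets,
        "good_ref": good_ref,
        "bad_ref": bad_ref,
    }
    for key, value in fields.items():
        # Assumption: an omitted input falls back to the workflow default,
        # so empty optional values are skipped instead of sent as `key=`.
        if value:
            argv += ["-f", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    cmd = build_bisect_dispatch(
        ref="main",
        runner="linux-amd64-gpu-rtxa6000-latest-1",
        preset="cub-cpp20",
        build_targets="cub.cpp20.test.iterator",
        ctest_targets="cub.cpp20.test.iterator",
        good_ref="-7d",
    )
    # Print a shell-quoted command for review; nothing is dispatched here.
    print(shlex.join(cmd))
```

The sketch only prints the command. Because `gh workflow run` is a mutating operation, actually running it is left to the caller, which matches the allow-list policy described in the commit message.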
diff --git a/.agent/skills/cccl-build-and-test-targets/SKILL.md b/.agent/skills/cccl-build-and-test-targets/SKILL.md new file mode 100644 index 00000000000..13a32e58c90 --- /dev/null +++ b/.agent/skills/cccl-build-and-test-targets/SKILL.md @@ -0,0 +1,73 @@ +--- +name: cccl-build-and-test-targets +description: "Reference for `ci/util/build_and_test_targets.sh` — CCCL's preset-driven configure/build/test driver used by CI, the bisect workflow, and ad-hoc local runs. Covers `--preset`, `--cmake-options`, `--configure-override`, `--build-targets`, `--ctest-targets`, `--lit-precompile-tests`, `--lit-tests`, `--custom-test-cmd`. Use when the user wants to build or test a specific target without running the full CI matrix. Trigger phrases: \"build just X\", \"run test Y\", \"targeted build\", \"how do I run the cub tests\"." +--- + +# cccl-build-and-test-targets + +`ci/util/build_and_test_targets.sh` configures, builds, and tests a CMake preset with the targets you specify. +Run it from the repo root, inside the devcontainer (or anywhere the preset's compiler is available). + +## Flags + +| Flag | Effect | +|------------------------------------|-------------------------------------------------------------------------------------------------| +| `--preset ` | CMake preset (or use `--configure-override` instead) | +| `--cmake-options ""` | Extra `-D…=…` flags appended to preset configure | +| `--configure-override ""` | Custom configure command (overrides `--preset` and `--cmake-options`) | +| `--build-targets ""` | Space-separated ninja targets. Omit to skip build (`"all"` for everything) | +| `--ctest-targets ""` | Space-separated CTest `-R` regexes. 
Omit to skip tests (`"."` for all) | +| `--lit-precompile-tests ""` | libcudacxx lit paths to compile without execution (relative to `libcudacxx/test/libcudacxx/`) | +| `--lit-tests ""` | libcudacxx lit paths to compile AND execute | +| `--custom-test-cmd ""` | Arbitrary command after tests | + +`--build-targets` and `--ctest-targets` are opt-in. Omit → nothing builds or tests; the script just configures. + +## Common patterns + +Most cases: pick the preset and pass the target as both `--build-targets` and `--ctest-targets`: + +``` +ci/util/build_and_test_targets.sh \ + --preset \ + --build-targets "" \ + --ctest-targets "" +``` + +| Project | Preset(s) | Target example | +|------------|----------------------------------|-------------------------------| +| CUB | `cub-cpp17`, `cub-cpp20` | `cub.cpp20.test.iterator` | +| Thrust | `thrust-cpp17`, `thrust-cpp20` | `thrust.cpp20.test.reduce` | +| cudax | `cudax` | `cudax.cpp20.test.async_buffer` | +| C Parallel | `cccl-c-parallel` | `cccl.c.test.reduce` | + +libcudacxx is lit-driven — use `--lit-precompile-tests` and `--lit-tests` instead of `--build-targets`: + +``` +ci/util/build_and_test_targets.sh \ + --preset libcudacxx \ + --lit-precompile-tests "std/algorithms/alg.nonmodifying/alg.any_of/any_of.pass.cpp" \ + --lit-tests "std/algorithms/alg.nonmodifying/alg.any_of/any_of.pass.cpp" +``` + +Avoid `--build-targets "libcudacxx.cpp20.precompile.lit"` — it precompiles the entire test suite. + +## Output + +Build dir at `build/${CCCL_BUILD_INFIX}/${PRESET}/` (parsed from the cmake configure log line +`-- Build files have been written to:`). Phase-by-phase elapsed time printed with emoji status markers. + +## Wrapping in the devcontainer + +``` +.devcontainer/launch.sh -d --cuda 13.2 --host gcc14 -- \ + ./ci/util/build_and_test_targets.sh \ + --preset cub-cpp20 \ + --build-targets "cub.cpp20.test.iterator" +``` + +## vs full-matrix scripts + +- `build_and_test_targets.sh` — single preset, named targets. Fast iteration. 
+- `./ci/build_.sh` / `./ci/test_.sh` — full build/test cycles across host/std/arch matrix. Slow. + See `cccl-cpp-builds`. diff --git a/.agent/skills/cccl-ci-benchmarks/SKILL.md b/.agent/skills/cccl-ci-benchmarks/SKILL.md new file mode 100644 index 00000000000..956eccc8798 --- /dev/null +++ b/.agent/skills/cccl-ci-benchmarks/SKILL.md @@ -0,0 +1,55 @@ +--- +name: cccl-ci-benchmarks +description: "Request CCCL benchmark runs in PR CI by editing `ci/bench.yaml`, or launch benchmark workflows directly via `gh workflow run`. Walks the user through filter selection (CUB ninja-target regex / Python path regex), GPU selection, and the `[bench-only]` commit-tag convention. Use when the user wants to benchmark a change on PR CI, or trigger a one-off benchmark workflow. Trigger phrases: \"benchmark this PR\", \"request a perf run\", \"compare benchmarks before/after\"." +--- + +# cccl-ci-benchmarks + +Two routes: PR-driven (edit `ci/bench.yaml`, push) and direct dispatch (`gh workflow run`). + +`ci/bench.yaml` holds the request; `ci/bench.template.yaml` is the empty template CI checks against. Both must +match to merge. + +## Route 1 — PR-driven + +1. **Edit `ci/bench.yaml`:** + - Add CUB benchmark regexes under `benchmarks.filters.cub` (matched against ninja target names, e.g. + `^cub\.bench\.for_each\.base`). + - Add Python benchmark path regexes under `benchmarks.filters.python` (matched against paths under + `benchmarks/`, e.g. `compute/reduce/sum\.py`). + - Uncomment at least one GPU under `benchmarks.gpus`: `t4`, `rtx2080`, `rtxa6000`, `l4`, `rtx4090`, `h100`, + `rtxpro6000`. Pools are shared — pick conservatively. + - Optionally adjust `launch_args` (e.g. `"--cuda 13.2 --host gcc14"`). + +2. **Append `[bench-only]`** to the commit message — skips non-benchmark CI (equivalent to + `[skip-matrix][skip-vdc][skip-docs][skip-tpt]`). + +3. **Push.** Inspect dispatched jobs via `gh run view `. + +4. 
**Reset before final merge.** Restore `ci/bench.yaml` to match `ci/bench.template.yaml` (empty filters, no GPUs + uncommented). + +## Route 2 — direct dispatch + +If a benchmark workflow exists for direct dispatch (`gh workflow list --repo NVIDIA/cccl`): + +``` +gh workflow run .yml --repo NVIDIA/cccl --ref -f = +``` + +Return the run URL. `gh workflow run` is mutating; prompts every use. + +## Defaults + +From `ci/bench.yaml`'s `Advanced` block: + +- `base_ref: "origin/main"` — what to compare against. +- `test_ref: "HEAD"` — what to test. +- `arch: "native"` — usually fine; can be a list like `"80;90"`. +- `nvbench_args` — preset with timeout / skip-time / stopping criterion / throttle handling. + +## Pitfalls + +- Forgetting to uncomment a GPU → no jobs run. +- Forgetting `[bench-only]` → wasteful full-CI run alongside. +- Not resetting `ci/bench.yaml` before merge → merge blocked. diff --git a/.agent/skills/cccl-ci/SKILL.md b/.agent/skills/cccl-ci/SKILL.md new file mode 100644 index 00000000000..4fbfd5faa5e --- /dev/null +++ b/.agent/skills/cccl-ci/SKILL.md @@ -0,0 +1,54 @@ +--- +name: cccl-ci +description: "Orientation for CCCL's GitHub Actions CI. Pointers to the sources of truth (`ci/matrix.yaml`, `ci-overview.md`, workflow files) and a map of the moving parts. Use when the user asks how CI works here, where a CI behavior is defined, why a job ran or didn't, or what `[skip-*]` tags exist. Trigger phrases: \"how does CI work\", \"where is X CI defined\", \"why did this job run\", \"explain the matrix\". For TRIAGING a CI failure, use `cccl-triage-pr` or `cccl-triage-nightly` instead." 
+--- + +# cccl-ci + +## Sources of truth + +| Topic | File | +|-----------------------------------------------|-------------------------------------------------------------------| +| Job matrix (PR / nightly / weekly + override) | `ci/matrix.yaml` | +| Skip tags, override rules, troubleshooting | `ci-overview.md` | +| Workflow entry points | `.github/workflows/ci-workflow-{pull-request,nightly,weekly}.yml` | +| `/ok to test` policy + trustees | `.github/copy-pr-bot.yaml`, `CONTRIBUTING.md` § CI | +| Per-job runner setup | `.github/actions/workflow-run-job-{linux,windows}/` | +| Matrix expansion → dispatchable jobs | `.github/actions/workflow-build/` running `build-workflow.py` | +| Job pruning by changed paths | `ci/inspect_changes.py` | +| Result aggregation | `.github/actions/workflow-results/` | +| Bench-request config | `ci/bench.yaml` | +| Git-bisect cloud dispatch | `.github/workflows/git-bisect.yml` | + +## PR run flow + +`ci-workflow-pull-request.yml` → `build-workflow.py` reads `ci/matrix.yaml`. Non-empty `workflows.override` wins; +otherwise `inspect_changes.py` prunes by dirty projects from changed paths. Jobs run through +`workflow-run-job-{linux,windows}/` in a devcontainer. `workflow-results/` aggregates; marks failed if any job +failed OR if override is non-empty. + +## Scoping a PR's CI (both block merging) + +- **`[skip-*]` tags** on the last commit. Tokens in `ci-overview.md`. +- **`workflows.override` in `ci/matrix.yaml`** — replaces the `pull_request` matrix with a targeted subset: + + ```yaml + workflows: + override: + - {jobs: ['build'], project: 'cudax', ctk: '12.0', std: 'all', cxx: ['msvc14.39', 'gcc10', 'clang14']} + ``` + +`cccl-ci-overrides` generates both from failed-job names and/or changed-path lists. + +## `/ok to test` policy + +Draft PRs need `/ok to test ` from a maintainer to start CI. Route all such requests through the +`cccl-ok-to-test` agent (SHA-gated). + +## Gotchas + +- Non-empty `workflows.override` blocks merge. 
Reset to empty before final merge (don't remove the key). +- Any `[skip-*]` tag blocks merge. +- `ci/bench.yaml` must match `ci/bench.template.yaml` to merge. +- `gh pr view --json statusCheckRollup` returns 100k+ tokens for 500-job PRs. Use `gh pr checks`. +- `gh run view --log-failed` errors mid-run. Use `gh api repos/NVIDIA/cccl/actions/jobs//logs`. diff --git a/.agent/skills/cccl-clarify/SKILL.md b/.agent/skills/cccl-clarify/SKILL.md new file mode 100644 index 00000000000..8a6b25ee1c7 --- /dev/null +++ b/.agent/skills/cccl-clarify/SKILL.md @@ -0,0 +1,43 @@ +--- +name: cccl-clarify +description: "Decision-point escalation. Use when you cannot resolve a question through default reasoning — tricky tradeoffs, scarce evidence, ambiguous user intent, or a fork in the road that needs human judgment. Triggered by phrases like \"I'm stuck\", \"not sure how to proceed\", \"should I X or Y\", \"help me decide\". Also invoked by other cccl-* skills when they need to surface a question to the user. Walks the three-step escalation (default reasoning → self-research → ask the user) and the \"how to ask well\" rules — print context in chat, AskUserQuestion with breakdown branch, point-by-point dialogue." +--- + +# cccl-clarify + +## Escalation ladder + +Stop at the first level that produces a confident answer. + +1. **Default reasoning** — resolve from existing context: prompt, conversation, files read, `AGENTS.md`, `cccl` + skill, memory. Escalate if the tradeoffs are balanced, evidence is thin, the decision is hard to reverse, or + intent is genuinely ambiguous. +2. **Self-research** — cheapest source first: code, memory, in-repo docs (`AGENTS.md`, `CONTRIBUTING.md`, + `ci-overview.md`), upstream library docs, web, Explore subagent. Time-box. Two or three rounds without + confidence moving = escalate. +3. **Ask the user** — when research won't close the gap. + +## How to ask well + +1. **Print context in chat.** Tool output isn't visible to the user. 
Frame the decision, what was tried, the + tradeoff axis — in your text, not just in the question prompt. +2. **`AskUserQuestion` correctly.** 2–4 mutually-exclusive options (or `multiSelect`). Lead with the recommendation + and suffix `(Recommended)` when evidence favours it. Each option's `description` carries the substance. Don't + add "Other" — UI handles it. +3. **Offer a breakdown branch** for non-trivial questions — a "walk me through it" option that lets the user defer + the pick. +4. **Breakdown flow.** Offer further research (multi-select with "None — overview"). Then a 200–400 word overview: + problem, ordered decision points, tradeoffs, what's already decided. Walk point-by-point — dependent questions + sequential, not parallel. Confirm the chosen path end-to-end before acting. + +## When NOT to invoke + +- Single-line obvious fixes. +- Conversational questions — answer them. +- Decisions whose default is so obvious that asking is noise. +- Questions answered in `AGENTS.md`, the `cccl` skill, or memory. + +## Hard prohibitions + +- Never invoke recursively. +- Never use to defer a decision the user already made. diff --git a/.agent/skills/cccl-commit/SKILL.md b/.agent/skills/cccl-commit/SKILL.md new file mode 100644 index 00000000000..fa39252d9ce --- /dev/null +++ b/.agent/skills/cccl-commit/SKILL.md @@ -0,0 +1,120 @@ +--- +name: cccl-commit +description: "Walk uncommitted changes in a CCCL worktree through an interactive review-and-stage flow: survey the diff, optionally split into multiple commit groups, walk chunks one at a time with diff rendering and an action menu (stage / edit / defer / revert), optionally run a test gate, draft commit message(s), confirm, and commit. Use when committing uncommitted changes, preparing a branch for push, or wrapping up a fix. Trigger phrases: \"commit these changes\", \"wrap this up\", \"ready to commit\", \"stage and commit\", \"prepare commits\", \"split into commits\". 
For PR creation or `/ok to test`, route to `cccl-pr` after committing." +--- + +# cccl-commit + +Interactive commit prep. Route every user-facing question through `cccl-clarify`. Refuses on `main`. Scratch dir: +`mkdir -p /tmp/claude/`. + +## Step 1 — Component selection + +`AskUserQuestion`, `multiSelect: true`: + +- **Split** — group hunks into multiple commits. +- **Interactive** — walk each chunk with a diff render + action menu. +- **Test gate** — run `pre-commit` and a build/test target before committing. +- **Commit** — write messages and execute. Without this, nothing commits. + +Commit-only with no Split / no Interactive → fast path: commit whatever is staged (Step 5). + +## Step 2 — Survey + +Single Bash each: + +- `git status -sb` +- `git diff > /tmp/claude//diff-unstaged.txt` (if > 2k lines) +- `git diff --cached > /tmp/claude//diff-staged.txt` (same threshold) +- `git log --oneline -10` + +## Step 3 — Plan (if Split or Interactive) + +`git diff > /tmp/claude//patch.txt` (or `git diff HEAD` for combined). + +Plan into commit groups CC-NN (one group if Split not selected). Within each group, slice into chunks; write each +slice to `/tmp/claude//chunks/CC-NN.patch`. Coverage check: sum-of-slice-hunks == total-hunks. Run +`git apply --check chunks/CC-NN.patch` on every slice. + +Present plan summary (groups, chunks/group, total lines). `cccl-clarify` → approve / reorder / discuss. + +## Step 4 — Walk chunks (if Interactive) + +For each chunk in planned order: + +1. Read `chunks/CC-NN.patch`. +2. Render the diff verbatim in chat as a ` ```diff ` fenced block, per-hunk headers naming file:line range. + Never use Bash output for diffs. Pattern dedup is fine for repetition — show pattern once, list other + occurrences and locations. +3. Suggest improvements (numbered, with file:line refs) or note "No suggested changes". +4. `AskUserQuestion`: + - **Stage as-is** — `git apply --cached chunks/CC-NN.patch`. 
Verify with `git diff --cached --stat`; STOP if + the staged file list doesn't match the expected set. + - **Apply suggested edits, re-review** — `Edit`, regenerate diff with `git diff -- `, loop. + - **Apply custom edits, re-review** — user describes, `Edit`, loop. + - **Leave unstaged** — defer. + - **Revert** — `git apply -R chunks/CC-NN.patch` (or `git checkout -- ` for whole-file). + - **Discuss** — open conversation; loop. + +Track: current group, staged/deferred/reverted chunks. + +Split selected, Interactive not → auto-stage each slice in order. Verify the staged set grows monotonically into +the per-group expected set. STOP on divergence. + +## Step 5 — Test gate (if selected) + commit + +### 5.0 Fast path + +Commit-only with no Split / no Interactive: confirm staged set via `git diff --cached --stat` (empty → exit), +skip the test gate unless asked, go to 5.2. + +### 5.1 Tests + +`cccl-clarify` → skip / `pre-commit run --files ` / dispatch `cccl-build-and-test-targets`. On failure: +investigate / commit anyway / abort. + +### 5.2 Commit message + +`cccl-clarify` for detail tier — **Trivial** (subject only) / **Standard** (subject + 1–6 body lines) / +**Detailed** (subject + multi-paragraph). + +Rules: +- Subject ≤ 72 chars, imperative, no trailing period. +- Match CCCL's prefix convention from `git log --oneline -20`. +- Body wraps ~72 chars. +- No co-author / tool-attribution footers. +- `[skip-*]` tags apply to a single push and must appear on the LAST commit's last line only. + +Draft. `cccl-clarify` → use / revise / cancel. + +### 5.3 Commit + +Write final message to `/tmp/claude//commit-msg-CC.txt`. Then `git commit -F ` (mutating; expect +prompt). Verify with `git show -p HEAD`: SHA, subject, file list match expectations. + +## Step 6 — Inter-group transition (if Split) + +After each commit, `cccl-clarify` → continue / pause / end. 
On continue, verify remaining slices still apply +(`git apply --check` per remaining slice); regenerate the patch and re-plan if any fail. + +Remind caller to use `cccl-ci-overrides` to set up a minimal CI run if needed. + +Last group → final summary (all SHAs, deferred, reverted) and exit. + +## Hard prohibitions + +Unless explicitly approved by the user in `cccl-clarify` at the moment of action, never do any of the following: + +- Never edit on `main`. +- Never `--no-verify`. +- Never `--amend` a published commit. +- Never co-author / tool-attribution footers. + +In any circumstance: + +- Never fabricate diff content — every line shown comes from the patch or `git diff`. +- Never `git add` without explicit per-chunk user approval. + +## Handoff + +After commits land: route to `cccl-pr` for push / open / update / `/ok to test`. diff --git a/.agent/skills/cccl-cpp-builds/SKILL.md b/.agent/skills/cccl-cpp-builds/SKILL.md new file mode 100644 index 00000000000..348244c5272 --- /dev/null +++ b/.agent/skills/cccl-cpp-builds/SKILL.md @@ -0,0 +1,53 @@ +--- +name: cccl-cpp-builds +description: "Build and test CCCL's C++ libraries (libcudacxx, CUB, Thrust, cudax, C Parallel) — per-project `ci/build_*.sh` and `ci/test_*.sh` full-matrix scripts, architecture conventions, and pointers to the targeted-build alternative. Use when the user wants to build or test a CCCL C++ library across a full host/std/arch matrix, or asks about architecture flag syntax. Trigger phrases: \"build cub\", \"test libcudacxx\", \"build thrust\", \"full matrix build\", \"compile cudax\", \"cuda architectures\". For SINGLE-target fast iteration use `cccl-build-and-test-targets` instead." +--- + +# cccl-cpp-builds + +Per-project full-matrix build + test scripts under `ci/`. Flags: host compiler, C++ standard, GPU architectures. + +Full builds: 60+ min build, 30+ min test — never cancel. For single targets, use `cccl-build-and-test-targets`.
+ +## Scripts + +``` +./ci/build_.sh [-cxx ] [-std ] [-arch ""] # no GPU +./ci/test_.sh -cxx -std -arch "" # GPU required +``` + +| Project | Build / test scripts | Stds | +|-------------------|-----------------------------|-----------| +| CUB | `build_cub`, `test_cub` | 17, 20 | +| Thrust | `build_thrust`, `test_thrust` | 17, 20 | +| libcudacxx | `build_libcudacxx`, `test_libcudacxx` | 17, 20 | +| cudax | `build_cudax`, `test_cudax` | 20 only | +| C Parallel | `build_cccl_c_parallel` | 17 only | + +Test scripts build implicitly if the tree is missing. CTest preset form (e.g. `ctest --preset=cub-cpp17`) also +works. + +Compute-sanitizer variants: append `-compute-sanitizer-{memcheck,racecheck,initcheck,synccheck}`. Not all +projects support all tools — check `--help`. + +## Flags + +- **`-cxx`** — host compiler (`g++`, `clang++`, `msvc14.39`). +- **`-std`** — C++ standard (`17` or `20`, subject to project limits above). +- **`-arch`** — semicolon-separated CUDA architecture list (CMake `CUDA_ARCHITECTURES`): + + | Form | Generates | + |------------------|-----------------------| + | `` | PTX + SASS for SM XX | + | `` | SASS only | + | `` | PTX only | + | `native` | Detect host GPU | + | `all-major-cccl` | Default for PR builds | + + Examples: `"native"`, `"80"`, `"70;75;80-virtual"`. + +## Performance + +- `sccache` is enabled in the devcontainer (CCCL-team bucket auth). +- Limit `-arch` — `"native"` or `"80"` is much faster than `"all-major-cccl"`. +- Build scripts already parallelize via ninja. diff --git a/.agent/skills/cccl-devcontainers/SKILL.md b/.agent/skills/cccl-devcontainers/SKILL.md new file mode 100644 index 00000000000..4920e18417c --- /dev/null +++ b/.agent/skills/cccl-devcontainers/SKILL.md @@ -0,0 +1,58 @@ +--- +name: cccl-devcontainers +description: "Use CCCL's `.devcontainer/launch.sh` to run one-off bash sessions, builds, or tests inside a CCCL-configured container with a chosen CUDA toolkit and host compiler. 
Covers the `-d` / `--cuda` / `--host` / `--gpus` / `--env` / `--volume` argument conventions and the `CCCL_BUILD_INFIX` already-in-container check. Use when the user wants to build/test in a clean, reproducible environment, run a quick experiment with a specific toolchain, or escape from host environment problems. Trigger phrases: \"run in devcontainer\", \"launch the container\", \"build with cuda 13.2\", \"open a shell with gcc 14\"." +--- + +# cccl-devcontainers + +`.devcontainer/launch.sh` boots a Docker container preconfigured with a chosen CUDA toolkit and host compiler, +mounts the repo, and either drops into a shell or runs a script. **Linux-only** — Linux host, Linux container. +Windows / MSVC builds run outside the devcontainer. + +## Flags + +| Flag | Purpose | +|--------------------------|------------------------------------------| +| `-d`, `--docker` | Run without VSCode (required for agents) | +| `--cuda ` | CUDA toolkit (e.g. `13.2`, `12.9`) | +| `--cuda-ext` | Image with extended CTK libraries | +| `--host ` | Host compiler (`gcc14`, `clang17`) | +| `--gpus ` | GPU passthrough (`all` for everything) | +| `-e`, `--env KEY=VAL` | Inject env var | +| `-v`, `--volume SRC:DST` | Mount additional path | +| `--