Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -41,5 +41,6 @@ nsys_analysis_build/
 **/NeMo

 # AI agent files
+CLAUDE.local.md
 GEMINI.md
 plans/
37 changes: 37 additions & 0 deletions CHANGELOG
@@ -1,3 +1,40 @@
+## [v26.05.00] - 2026-05-19
+
+### Added
+
+- Kimi K2 MXFP8 pretrain support.
+- Nemotron 3 Nano (30B) and Super (120B) pretrain recipes.
+- Slurm topology checks and CPU governor reporting in the system info microbenchmark.
+- `llmb-run` job history and log handling.
+- `llmb-run` flags: `--env` for container env overrides, additional Slurm pass-through flags, and `dump-env` Megatron-Bridge mode.
+
+### Changed
+
+- Updated recipes to NeMo 26.04.00 where applicable.
+- Refreshed DeepSeek V3, Nemotron 3, and Qwen3 configurations.
+
+### Fixed
+
+- Legacy-parser grad-norm NaN handling.
+- Archive exclusion for `nsys_profile` and PyTorch profiling output directories.
+- Torchtitan container compatibility.
+
+### Removed
+
+- Deprecated Grok1 and Nemotron4 recipes.
+- Legacy `setup_script` installer path and Conda support.
+- Deprecated `llmb-run` commands.
+
+### Known Issues
+
+- DeepSeek V3 Megatron-Bridge on H100 requires `uv <=0.9.28` during setup.
+- EFA limitations remain for DeepSeek V3 (Megatron-Bridge H100, TorchTitan) and Qwen3 (30B H100, 235B H100); see Known Issues section of README for details.
+- Optional PCT fixed-core CPU binding may improve select workloads on Granite Rapids systems where PCT is enabled. See the README Known Issues section before applying the patch.
+
+### End of Support
+
+- LLMB `v25.12.x` and earlier are no longer supported as of `v26.05.00`. These release lines will not receive further updates, fixes, or support.
+
 ## [v26.02.01] - 2026-04-24

 ### Added
72 changes: 37 additions & 35 deletions Exemplar_validation.md
@@ -33,8 +33,13 @@ While the benchmarks can be run independently, we recommend looping in your NVID

 ### **Run benchmark recipes via llmb-run**

-1. "llmb-run" is a tool that automates execution of the test suite, and is the recommended way to launch the suite.
-2. For an installed GPU type, executing `llmb-run exemplar` will launch the full Exemplar test suite (including running each test three times). See the [llmb-run README](cli/llmb-run/README.md) for more info.
+```bash
+llmb-run exemplar
+```
+
+This launches the full Exemplar test suite for the installed GPU type. `llmb-run` is the recommended tool for executing the suite; the `exemplar` subcommand is a convenience that launches every required workload in one go.
+
+If individual workloads fail, you can re-run them on their own — Exemplar requires a passing run for each workload in the suite, not a single end-to-end execution. See the [llmb-run README](cli/llmb-run/README.md) for repeat and profiling behavior.

 ### **Verify results**

@@ -44,12 +49,11 @@ While the benchmarks can be run independently, we recommend looping in your NVID

 ### **Optimize with NVIDIA**

-1. Work with your NVIDIA account team to investigate any tuning opportunities with NVIDIA performance experts.
+Work with your NVIDIA account team to investigate any tuning opportunities with NVIDIA performance experts.

 ### **Qualify for Exemplar**

-1. If approved, your cloud is recognized as an [NVIDIA Exemplar Cloud](https://www.nvidia.com/en-us/data-center/ai-cloud-performance/) for the selected platform(s).
-2. NVIDIA is happy to collaborate to support downstream efforts highlighting your achievement.
+If approved, your cloud is recognized as an [NVIDIA Exemplar Cloud](https://www.nvidia.com/en-us/data-center/ai-cloud-performance/) for the selected platform(s). NVIDIA is happy to collaborate to support downstream efforts highlighting your achievement.

 ## **Ongoing Expectations**

@@ -60,56 +64,56 @@ To start, contact your NVIDIA account team and reference this DGX Cloud Benchmar

 ## Exemplar Workload Recipes

-Scale: **512 GPUs** | Repeats: **3x** | Profiling: enabled for 1 of the 3 total runs
+Scale: **512 GPUs** | Repeats: **1** | Profiling: **disabled**

 ### GB300

-| Model | Size | Dtypes |
-| :---------- | :--- | :--------- |
-| DeepSeek-V3 | 671B | BF16, FP8 |
-| GPT (OSS) | 120B | BF16 |
-| Grok-1 | 314B | BF16, FP8 |
-| Llama 3.1 | 405B | FP8, NVFP4 |
-| Llama 3.1 | 70B | FP8, NVFP4 |
-| Nemotron-H | 56B | FP8 |
-| Nemotron-4 | 340B | BF16, FP8 |
-| Qwen3 | 235B | BF16 |
+| Model | Size | Dtypes |
+| :---------- | :--- | :--------------- |
+| DeepSeek-V3 | 671B | BF16, FP8, NVFP4 |
+| GPT (OSS) | 120B | BF16 |
+| Kimi-K2 | 1T | FP8 |
+| Llama 3.1 | 405B | FP8, NVFP4 |
+| Llama 3.1 | 70B | FP8, NVFP4 |
+| Nemotron-H | 56B | FP8 |
+| Nemotron 3 | 120B | BF16, FP8, NVFP4 |
+| Qwen3 | 235B | BF16 |

 ### GB200

+| Model | Size | Dtypes |
+| :---------- | :--- | :--------------- |
+| DeepSeek-V3 | 671B | BF16, FP8, NVFP4 |
+| GPT (OSS) | 120B | BF16 |
+| Kimi-K2 | 1T | FP8 |
+| Llama 3.1 | 405B | FP8, NVFP4 |
+| Llama 3.1 | 70B | FP8, NVFP4 |
+| Nemotron-H | 56B | FP8 |
+| Qwen3 | 235B | BF16 |
+
+### B300
+
 | Model | Size | Dtypes |
 | :---------- | :--- | :--------- |
-| DeepSeek-V3 | 671B | BF16, FP8 |
+| DeepSeek-V3 | 671B | BF16 |
 | GPT (OSS) | 120B | BF16 |
-| Grok-1 | 314B | BF16, FP8 |
 | Llama 3.1 | 405B | FP8, NVFP4 |
-| Llama 3.1 | 70B | FP8 |
+| Llama 3.1 | 70B | FP8, NVFP4 |
 | Nemotron-H | 56B | FP8 |
-| Nemotron-4 | 340B | BF16, FP8 |
+| Nemotron 3 | 120B | BF16 |
 | Qwen3 | 235B | BF16 |

-### B300
-
-| Model | Size | Dtypes |
-| :---------- | :--- | :----- |
-| DeepSeek-V3 | 671B | BF16 |
-| GPT (OSS) | 120B | BF16 |
-| Llama 3.1 | 405B | FP8 |
-| Llama 3.1 | 70B | FP8 |
-| Nemotron-H | 56B | FP8 |
-| Qwen3 | 235B | BF16 |
-
 ### B200

 | Model | Size | Dtypes |
 | :---------- | :--- | :--------- |
 | DeepSeek-V3 | 671B | BF16, FP8 |
 | GPT (OSS) | 120B | BF16 |
-| Grok-1 | 314B | BF16, FP8 |
+| Kimi-K2 | 1T | FP8 |
 | Llama 3.1 | 405B | FP8, NVFP4 |
 | Llama 3.1 | 70B | FP8, NVFP4 |
 | Nemotron-H | 56B | FP8 |
-| Nemotron-4 | 340B | BF16, FP8 |
+| Nemotron 3 | 120B | BF16, FP8 |
 | Qwen3 | 235B | BF16 |

 ### H100
@@ -118,8 +122,6 @@ Scale: **512 GPUs** | Repeats: **3x** | Profiling: enabled for 1 of the 3 total
 | :---------- | :--- | :-------- |
 | DeepSeek-V3 | 671B | FP8 |
 | GPT (OSS) | 120B | BF16 |
-| Grok-1 | 314B | BF16, FP8 |
 | Llama 3.1 | 70B | BF16, FP8 |
 | Nemotron-H | 56B | FP8 |
-| Nemotron-4 | 340B | BF16, FP8 |
 | Qwen3 | 235B | BF16 |
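
Since the updated text says Exemplar requires a passing run for each workload rather than one end-to-end execution, a per-GPU checklist can help track what is still outstanding. The sketch below is illustrative only and is not part of `llmb-run`: `EXEMPLAR_MATRIX`, `required_runs`, and `remaining_runs` are hypothetical names. The data is transcribed from the tables above on the assumption that the newer version of each table in this diff is the one that applies, and that each listed dtype counts as a separate workload.

```python
# Hypothetical encoding of the post-change Exemplar tables (not an llmb-run API):
# GPU type -> list of (model, size, dtypes), transcribed from the diff above.
EXEMPLAR_MATRIX = {
    "GB300": [
        ("DeepSeek-V3", "671B", ["BF16", "FP8", "NVFP4"]),
        ("GPT (OSS)", "120B", ["BF16"]),
        ("Kimi-K2", "1T", ["FP8"]),
        ("Llama 3.1", "405B", ["FP8", "NVFP4"]),
        ("Llama 3.1", "70B", ["FP8", "NVFP4"]),
        ("Nemotron-H", "56B", ["FP8"]),
        ("Nemotron 3", "120B", ["BF16", "FP8", "NVFP4"]),
        ("Qwen3", "235B", ["BF16"]),
    ],
    "GB200": [
        ("DeepSeek-V3", "671B", ["BF16", "FP8", "NVFP4"]),
        ("GPT (OSS)", "120B", ["BF16"]),
        ("Kimi-K2", "1T", ["FP8"]),
        ("Llama 3.1", "405B", ["FP8", "NVFP4"]),
        ("Llama 3.1", "70B", ["FP8", "NVFP4"]),
        ("Nemotron-H", "56B", ["FP8"]),
        ("Qwen3", "235B", ["BF16"]),
    ],
    "B300": [
        ("DeepSeek-V3", "671B", ["BF16"]),
        ("GPT (OSS)", "120B", ["BF16"]),
        ("Llama 3.1", "405B", ["FP8", "NVFP4"]),
        ("Llama 3.1", "70B", ["FP8", "NVFP4"]),
        ("Nemotron-H", "56B", ["FP8"]),
        ("Nemotron 3", "120B", ["BF16"]),
        ("Qwen3", "235B", ["BF16"]),
    ],
    "B200": [
        ("DeepSeek-V3", "671B", ["BF16", "FP8"]),
        ("GPT (OSS)", "120B", ["BF16"]),
        ("Kimi-K2", "1T", ["FP8"]),
        ("Llama 3.1", "405B", ["FP8", "NVFP4"]),
        ("Llama 3.1", "70B", ["FP8", "NVFP4"]),
        ("Nemotron-H", "56B", ["FP8"]),
        ("Nemotron 3", "120B", ["BF16", "FP8"]),
        ("Qwen3", "235B", ["BF16"]),
    ],
    "H100": [
        ("DeepSeek-V3", "671B", ["FP8"]),
        ("GPT (OSS)", "120B", ["BF16"]),
        ("Llama 3.1", "70B", ["BF16", "FP8"]),
        ("Nemotron-H", "56B", ["FP8"]),
        ("Qwen3", "235B", ["BF16"]),
    ],
}


def required_runs(gpu_type):
    """Expand a GPU type's table into one (model, size, dtype) tuple per workload."""
    return [
        (model, size, dtype)
        for model, size, dtypes in EXEMPLAR_MATRIX[gpu_type]
        for dtype in dtypes
    ]


def remaining_runs(gpu_type, passed):
    """Return the workloads for gpu_type that are not yet in the passed collection."""
    done = set(passed)
    return [run for run in required_runs(gpu_type) if run not in done]


if __name__ == "__main__":
    # Print how many individual passing runs each GPU type would need.
    for gpu in EXEMPLAR_MATRIX:
        print(gpu, len(required_runs(gpu)))
```

For example, `remaining_runs("H100", [("GPT (OSS)", "120B", "BF16")])` would list every H100 workload except the GPT (OSS) BF16 run. How a passing run is actually recorded and verified is defined by the Exemplar process and `llmb-run`, not by this sketch.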