[https://nvbugs/6193854][fix] PR #14851 already removed the bad `is_sliding_window`/`mMaxSeqLenKv` logic on… by tensorrt-cicd · Pull Request #15321 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-06-12T20:55:25Z

Summary

Root cause: PR [None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup #13505 (ebf19a49) introduced an is_sliding_window heuristic in convertMMHAParamsToXQAParams that flipped the XQA dispatch path to SlidingOrChunkedCausal for any RoPE model whose max_attention_window_size < mRotaryEmbeddingMaxPositions — true for the Qwen3-235B-A22B-FP8 benchmark (window 3000 < RoPE 40960). Combined with the mMaxSeqLenKv pinning, this caused JIT-kernel-bucket churn during warmup/ramp/tail (steady-state was unchanged at ~0.7%, far below the 16.16% bench gap), since warmup and runtime no longer hit the same kernel bucket.
Fix: PR [https://nvbugs/6185446][fix] Add warmup for trtllm-gen fmha JIT kernels #14851 already removed the bad is_sliding_window/mMaxSeqLenKv logic on origin/main and replaced it with an explicit FMHA JIT warmup grid (runJITWarmupGridIfRequested in fmhaKernels.h). On top of that, I add intermediate seq-len candidates 256 and 3072 to kDefaultWarmupSeqLenQkvCandidates so the warmup grid exactly covers the typical decode-window sizes for benchmarks like Qwen3-235B (max_attention_window=3000, decode kv-len ramp 1k→3k) — closing the largest relative gaps (128→512 and 2048→4096) without disabling any optimization or modifying any test infrastructure.
Perf metric: total_token_throughput (higher-is-better)
Bad perf: 1.855e+04
Good perf: 2.207e+04
Perf after fix: 2.157e+04
Automated fix generated by repair-bot

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6193854

Summary by CodeRabbit

Chores
- Enhanced JIT compilation warmup sequence coverage to improve initialization performance across various model configurations.

…`/`mMaxSeqLenKv` logic on `o Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

coderabbitai · 2026-06-12T20:57:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db5e11f0-8f13-410b-95a7-5da2629f9c9c

📥 Commits

Reviewing files that changed from the base of the PR and between b03b78f and cdaed2e.

📒 Files selected for processing (1)

cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h

📝 Walkthrough

Walkthrough

This PR updates the default candidate sequence-length list (kDefaultWarmupSeqLenQkvCandidates) used for FMHA kernel JIT warmup compilation, adding smaller sizes (256, 3072) while retaining existing larger candidates to expand kernel compilation coverage.

Changes

FMHA Warmup Sequence Length Candidates

Layer / File(s)	Summary
Default warmup sequence-length candidates `cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h`	`kDefaultWarmupSeqLenQkvCandidates` is updated to include additional sequence-length sizes (256, 3072) alongside existing larger candidates (8192, 16384, 32768) for improved JIT warmup kernel compilation coverage.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

yunruis
chzblych
yuxianq

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is partially related to the changeset. It references PR `#14851` and the removal of is_sliding_window/mMaxSeqLenKv logic, but appears truncated/incomplete and doesn't clearly describe the main change (adding seq-len candidates to FMHA warmup).	Complete and clarify the title to describe the primary change clearly, e.g., 'Add 256 and 3072 seq-len candidates to FMHA JIT warmup grid' or continue the truncated text to convey the full intent.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description provides comprehensive context including root cause, fix details, performance metrics, and test plan, but does not follow the required template structure with explicit Description, Test Coverage, and PR Checklist sections.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

fix(perf): PR NVIDIA#14851 already removed the bad `is_sliding_window…

cdaed2e

…`/`mMaxSeqLenKv` logic on `o Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>

github-actions Bot assigned tensorrt-cicd Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[https://nvbugs/6193854][fix] PR #14851 already removed the bad `is_sliding_window`/`mMaxSeqLenKv` logic on…#15321

[https://nvbugs/6193854][fix] PR #14851 already removed the bad `is_sliding_window`/`mMaxSeqLenKv` logic on…#15321
tensorrt-cicd wants to merge 1 commit into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6193854-eeacda

tensorrt-cicd commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 inconclusive)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tensorrt-cicd commented Jun 12, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Links

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 inconclusive)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tensorrt-cicd commented Jun 12, 2026 •

edited by coderabbitai Bot

Loading