Skip to content

[https://nvbugs/6193854][fix] PR #14851 already removed the bad is_sliding_window/mMaxSeqLenKv logic on…#15321

Open
tensorrt-cicd wants to merge 1 commit into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6193854-eeacda
Open

[https://nvbugs/6193854][fix] PR #14851 already removed the bad is_sliding_window/mMaxSeqLenKv logic on…#15321
tensorrt-cicd wants to merge 1 commit into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6193854-eeacda

Conversation

@tensorrt-cicd

@tensorrt-cicd tensorrt-cicd commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Root cause: PR [None][perf] Drop cubin and Eliminate ~6s FMHA JIT recompile in eager generation by aligning kernel selection with CUDA graph warmup #13505 (ebf19a49) introduced an is_sliding_window heuristic in convertMMHAParamsToXQAParams that flipped the XQA dispatch path to SlidingOrChunkedCausal for any RoPE model whose max_attention_window_size < mRotaryEmbeddingMaxPositions — true for the Qwen3-235B-A22B-FP8 benchmark (window 3000 < RoPE 40960). Combined with the mMaxSeqLenKv pinning, this caused JIT-kernel-bucket churn during warmup/ramp/tail (steady-state was unchanged at ~0.7%, far below the 16.16% bench gap), since warmup and runtime no longer hit the same kernel bucket.
  • Fix: PR [https://nvbugs/6185446][fix] Add warmup for trtllm-gen fmha JIT kernels #14851 already removed the bad is_sliding_window/mMaxSeqLenKv logic on origin/main and replaced it with an explicit FMHA JIT warmup grid (runJITWarmupGridIfRequested in fmhaKernels.h). On top of that, I add intermediate seq-len candidates 256 and 3072 to kDefaultWarmupSeqLenQkvCandidates so the warmup grid exactly covers the typical decode-window sizes for benchmarks like Qwen3-235B (max_attention_window=3000, decode kv-len ramp 1k→3k) — closing the largest relative gaps (128→512 and 2048→4096) without disabling any optimization or modifying any test infrastructure.
  • Perf metric: total_token_throughput (higher-is-better)
  • Bad perf: 1.855e+04
  • Good perf: 2.207e+04
  • Perf after fix: 2.157e+04
  • Automated fix generated by repair-bot

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

Summary by CodeRabbit

  • Chores
    • Enhanced JIT compilation warmup sequence coverage to improve initialization performance across various model configurations.

…`/`mMaxSeqLenKv` logic on `o

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db5e11f0-8f13-410b-95a7-5da2629f9c9c

📥 Commits

Reviewing files that changed from the base of the PR and between b03b78f and cdaed2e.

📒 Files selected for processing (1)
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h

📝 Walkthrough

Walkthrough

This PR updates the default candidate sequence-length list (kDefaultWarmupSeqLenQkvCandidates) used for FMHA kernel JIT warmup compilation, adding smaller sizes (256, 3072) while retaining existing larger candidates to expand kernel compilation coverage.

Changes

FMHA Warmup Sequence Length Candidates

Layer / File(s) Summary
Default warmup sequence-length candidates
cpp/tensorrt_llm/kernels/trtllmGenKernels/fmha/fmhaKernels.h
kDefaultWarmupSeqLenQkvCandidates is updated to include additional sequence-length sizes (256, 3072) alongside existing larger candidates (8192, 16384, 32768) for improved JIT warmup kernel compilation coverage.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • yunruis
  • chzblych
  • yuxianq
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title is partially related to the changeset. It references PR #14851 and the removal of is_sliding_window/mMaxSeqLenKv logic, but appears truncated/incomplete and doesn't clearly describe the main change (adding seq-len candidates to FMHA warmup). Complete and clarify the title to describe the primary change clearly, e.g., 'Add 256 and 3072 seq-len candidates to FMHA JIT warmup grid' or continue the truncated text to convey the full intent.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description provides comprehensive context including root cause, fix details, performance metrics, and test plan, but does not follow the required template structure with explicit Description, Test Coverage, and PR Checklist sections.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants