[TRTLLM-12339][feat] enable TRTLLM cross attention backend by cascade812 · Pull Request #15345 · NVIDIA/TensorRT-LLM

cascade812 · 2026-06-14T04:02:10Z

Description

Split out the attention operator and TRTLLM attention backend changes from #13919 to reduce frequent conflicts with main and make CI validation easier for this smaller, self-contained scope.

This PR intentionally keeps the change self-contained:

- wires thop.attention and its nanobind signature for cross-attention and relative-attention-bias inputs
- enables the TRTLLM backend path for cross-attention metadata, including Q padding, cross K/V forwarding, and beam-width handling
- makes trtllm-gen decline cross-attention so cross requests use the THOP path
- adds only the small backend forward-args fields required by the TRTLLM backend
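
For readers unfamiliar with the dispatch behavior described above, the following is a minimal sketch of how a preferred path can decline a request so that it falls through to another path. It is not the code in this PR: `AttentionRequest`, `Backend`, `select_backend`, the field names, and the assumption that trtllm-gen is tried before the THOP path are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttentionRequest:
    """Hypothetical request descriptor; field names are illustrative only."""
    is_cross_attention: bool
    beam_width: int = 1


@dataclass
class Backend:
    """Hypothetical backend entry: a name plus a predicate that accepts or declines."""
    name: str
    supports: Callable[[AttentionRequest], bool]


def trtllm_gen_supports(req: AttentionRequest) -> bool:
    # Assumption: the trtllm-gen path declines every cross-attention request.
    return not req.is_cross_attention


def thop_supports(req: AttentionRequest) -> bool:
    # Assumption: the THOP path acts as a general fallback in this sketch.
    return True


# Assumed preference order: try trtllm-gen first, then fall back to THOP.
BACKENDS: List[Backend] = [
    Backend("trtllm-gen", trtllm_gen_supports),
    Backend("thop", thop_supports),
]


def select_backend(req: AttentionRequest, backends: List[Backend] = BACKENDS) -> str:
    """Return the name of the first backend that accepts the request."""
    for backend in backends:
        if backend.supports(req):
            return backend.name
    raise RuntimeError("no backend accepts this request")
```

Under these assumptions, `select_backend(AttentionRequest(is_cross_attention=True))` resolves to the THOP entry, while a self-attention request stays on the trtllm-gen entry.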

No module, executor, model, or LLM API caller changes are included in this split.

Signed-off-by: Guiju Zhang <guijuz@nvidia.com>

cascade812 · 2026-06-14T04:04:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-14T04:11:03Z

PR_Github #54082 [ run ] triggered by Bot. Commit: 46ef1af Link to invocation

tensorrt-cicd · 2026-06-14T08:39:55Z

PR_Github #54082 [ run ] completed with state SUCCESS. Commit: 46ef1af
/LLM/main/L0_MergeRequest_PR pipeline #43166 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned cascade812 Jun 14, 2026

[TRTLLM-12339][feat] enable TRTLLM cross attention backend

46ef1af

Signed-off-by: Guiju Zhang <guijuz@nvidia.com>

cascade812 force-pushed the codex/split-attention-op-trtllm branch from 7537f51 to 46ef1af on June 14, 2026 04:03

cascade812 wants to merge 1 commit into NVIDIA:main from cascade812:codex/split-attention-op-trtllm
