Pull requests: vllm-project/vllm
[Benchmark] Add iteration benchmark with server-side step stats, trac…
frontend
needs-rebase
performance
Performance-related issues
tpu
Related to Google TPUs
v1
[MoE] Introduce Fp8MoEState and class-based dispatch for DeepGemm
#37249
opened Mar 17, 2026 by
yzong-rh
5 tasks
refactor: standardize kimi_linear and minimax_text_01 model weight loading to use AutoWeightsLoader
#37248
opened Mar 17, 2026 by
XLiu-2000
[Model] Implement LoRA support for Qwen3ASRForConditionalGeneration
documentation
Improvements or additions to documentation
qwen
Related to Qwen models
#37247
opened Mar 17, 2026 by
petern48
5 tasks
[Bugfix] dtype mismatch in ngram gpu propose
bug
Something isn't working
speculative-decoding
v1
#37246
opened Mar 17, 2026 by
PatchouliTIS
3 of 5 tasks
[Bugfix] Fix incorrect int8 dtype cast for kv_c_normed in MLA prefill
bug
Something isn't working
#37245
opened Mar 17, 2026 by
jacob-crux
3 of 5 tasks
[Perf] Support Flashinfer trtllm tinygemm_bf16 router gemm for GPT-OSS
gpt-oss
Related to GPT-OSS models
#37244
opened Mar 17, 2026 by
elvischenv
5 tasks
[ROCm][CI] Refine gating tests
ci/build
rocm
Related to AMD ROCm
#37243
opened Mar 17, 2026 by
AndreasKaratzas
Draft
[Refactor] Relocate responses API tests
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#37241
opened Mar 17, 2026 by
sfeng33
[Models][GDN] Prevent D2H sync in ChunkGatedDeltaRule
qwen
Related to Qwen models
v1
#37239
opened Mar 16, 2026 by
lgeiger
[Model Runner V2] Spec decode rejection sampler greedy support
v1
#37238
opened Mar 16, 2026 by
TheEpicDolphin
[Model Runner V2] Spec decode rejection sampler logprobs support
v1
#37237
opened Mar 16, 2026 by
TheEpicDolphin
Fix ambiguous num_blocks for hybrid attn mamba
v1
#37236
opened Mar 16, 2026 by
collinmccarthy
[Bugfix] Fix for builtins (forward fix of pytorch/177558)
bug
Something isn't working
#37234
opened Mar 16, 2026 by
Lucaskabela
Draft
5 tasks
[UX] Add flashinfer-cubin as CUDA default dep
ci/build
nvidia
#37233
opened Mar 16, 2026 by
mgoin
5 tasks
[Bugfix] Expand quantization method support in perf metrics
bug
Something isn't working
v1
#37231
opened Mar 16, 2026 by
thillai-c
3 of 5 tasks
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__
#37230
opened Mar 16, 2026 by
AndreasKaratzas
1 task done
Fix Qwen3.5-Next RMSNormGated Initialization Error on TPU
qwen
Related to Qwen models
#37229
opened Mar 16, 2026 by
jrplatin
5 tasks
[ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228)
bug
Something isn't working
fb-exported
meta-exported
rocm
Related to AMD ROCm
v1
#37228
opened Mar 16, 2026 by
jennyyyyzhen
[Perf] Use list.extend() over append loops in FlatLogprobs + minor hot-path cleanups
v1
#37227
opened Mar 16, 2026 by
vaibhavhariram
[Perf] Optimize top-k search in apply_top_k_top_p_triton sampler
performance
Performance-related issues
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#37225
opened Mar 16, 2026 by
mgoin
2 of 7 tasks
[3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI
ci/build
nvidia
#37221
opened Mar 16, 2026 by
mikaylagawarecki
5 tasks
[Bugfix] Consolidate Gemma2/3 GGUF fixes for correctness on Blackwell
bug
Something isn't working
#37220
opened Mar 16, 2026 by
kitaekatt