
Bump sglang from 0.5.3.post3 to 0.5.10rc0#7

Open
dependabot[bot] wants to merge 1 commit into
main from dependabot/pip/sglang-0.5.10rc0

Conversation


@dependabot dependabot Bot commented on behalf of github May 8, 2026

Bumps sglang from 0.5.3.post3 to 0.5.10rc0.

Release notes

Sourced from sglang's releases.

v0.5.10rc0

Highlights

  • Piecewise CUDA Graph Enabled by Default: Piecewise CUDA graph capture is now the default execution mode, reducing memory overhead and improving throughput for models with complex control flow patterns: #16331

  • Elastic EP for Partial Failure Tolerance: Integrate Elastic NIXL-EP into SGLang, enabling partial failure tolerance for DeepSeek MoE deployments — when a GPU fails, the system redistributes expert weights and continues serving without full restart: #19248, #17374, #12068 blog
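The failure-tolerance idea above can be sketched in plain Python. This is a hypothetical illustration of the general pattern, not the Elastic NIXL-EP implementation: when one rank fails, its experts are reassigned round-robin over the surviving ranks so serving continues without a full restart.

```python
# Hypothetical sketch (not the SGLang API): rebalance expert-to-GPU
# assignments after one rank fails, so serving continues without restart.
def rebalance_experts(num_experts, ranks, failed_rank):
    """Round-robin experts over the surviving ranks."""
    survivors = [r for r in ranks if r != failed_rank]
    assert survivors, "at least one healthy rank is required"
    return {e: survivors[e % len(survivors)] for e in range(num_experts)}

# 8 experts on 4 GPUs; GPU 2 fails, and its experts move to the survivors.
placement = rebalance_experts(8, ranks=[0, 1, 2, 3], failed_rank=2)
```

In the real system the rebalance also involves transferring expert weights between GPUs; the sketch only shows the placement side of the problem.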

  • HiSparse for Sparse Attention: Integrate HiSparse sparse attention backend for efficient long-context inference with reduced compute through sparsity-aware attention: #20343

  • SGLang-Diffusion Update:

    • Model support: LTX-2, Hunyuan3D-2, Helios
    • Performance improvements: Qwen-image and Z-image generation sped up by 1.5x
    • New platform: macOS
    • New feature: improved performance of the diffusers backend by integrating all optimizations from Cache-DiT
    • Skills: feel free to explore the curated skills for developing and optimizing sglang-diffusion!
  • FlashInfer MXFP8 Kernel Support: Integrate FlashInfer mxfp8 kernels for GEMM and MoE operations, enabling mixed-precision FP8 inference with higher accuracy through microscaling for RL and general workloads: #19537
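Microscaling (MX) formats share one power-of-two scale per small block of values rather than one scale per tensor, which is where the accuracy gain comes from. The NumPy sketch below simulates this idea for MXFP8-style blocks of 32 elements clipped to the FP8 E4M3 maximum of 448; it is a numerical illustration only, not the FlashInfer kernels.

```python
import numpy as np

# Simulated MX-style block quantization: each block of 32 values shares
# one power-of-two scale, and quantized values are clipped to the E4M3
# maximum magnitude of 448 to mimic FP8. Illustrative only.
def mx_quantize(x, block=32):
    x = x.reshape(-1, block)
    amax = np.maximum(np.abs(x).max(axis=1, keepdims=True), 1e-30)
    # Shared per-block power-of-two scale (E8M0-style exponent); ceil
    # keeps amax / scale within the representable range.
    scale = 2.0 ** np.ceil(np.log2(amax / 448.0))
    q = np.clip(np.round(x / scale), -448, 448)
    return q, scale

def mx_dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32)).astype(np.float32)
q, s = mx_quantize(x)
err = np.abs(mx_dequantize(q, s) - x).max()
```

Because the scale is chosen per 32-element block, outliers in one block no longer inflate the quantization step for the rest of the tensor.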

  • Transformers 5.3.0 Upgrade: Major upgrade from transformers 4.57.1 to 5.3.0, unlocking support for the latest model architectures and features from HuggingFace. GLM-5 is now supported in this image instead of the custom-built image: #17784

  • DeepSeek V3.2 / GLM-5 Optimization: GLM-5 runnable on main branch (with upgraded transformers). Fused Triton kernel for prefill KV cache fetching, NSA fuse store indexer for K cache, and configurable KV length threshold for sparse MLA attention at prefill — boosting throughput for long-context DeepSeek V3.2 and GLM-5 serving: #19319, #19148, #20062

  • Qwen3.5 GDN/KDA Optimization: Transpose linear attention state layout from [N, HV, K, V] to [N, HV, V, K] and fuse split/reshape/cat ops in GDN projection with Triton kernel, plus CuTeDSL KDA decode kernel support for improved Qwen3.5 performance: #20283, #21019, #21203
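The layout transpose above is purely a memory-ordering change: the decode readout reduces over the K axis, so storing the state as [N, HV, V, K] makes that reduction run over the contiguous last dimension. A small NumPy sketch (my illustration, not the SGLang kernels) shows the two layouts produce identical results:

```python
import numpy as np

# Linear-attention state in the old [N, HV, K, V] layout vs. the new
# [N, HV, V, K] layout. The decode readout y = S(q) reduces over K, so
# placing K last makes the reduction stride-1 in memory.
N, HV, K, V = 2, 4, 8, 16
rng = np.random.default_rng(0)
state_kv = rng.standard_normal((N, HV, K, V))      # old layout
state_vk = state_kv.transpose(0, 1, 3, 2).copy()   # new layout, contiguous

q = rng.standard_normal((N, HV, K))
# Same readout in each layout; only the memory access pattern differs.
y_old = np.einsum('nhkv,nhk->nhv', state_kv, q)
y_new = np.einsum('nhvk,nhk->nhv', state_vk, q)
```

On GPU the fused Triton kernel additionally avoids materializing the intermediate split/reshape/cat tensors in the GDN projection.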

  • LoRA Support for MoE Layers: Add LoRA fine-tuning support for Mixture-of-Experts layers with JIT alignment kernels, fused Triton kernels, TP support, and auto-detection of LoRA target modules — enabling efficient adapter-based tuning on MoE models like DeepSeek: #19710, #19711, #14105, #21439
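The adapter math per expert is the standard LoRA update. The NumPy sketch below shows it for a single expert with illustrative shapes (not the SGLang fused kernels): the base weight stays frozen while a low-rank pair A, B adds a scaled correction, and zero-initializing B makes the adapter start as an exact no-op.

```python
import numpy as np

# Toy LoRA update on one MoE expert (illustrative shapes): tokens routed
# to this expert pass through the frozen base weight W plus a low-rank
# adapter path scaled by alpha / r.
d, h, r, alpha = 16, 32, 4, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, h))            # frozen base expert weight
A = rng.standard_normal((d, r)) * 0.01     # LoRA down-projection
B = np.zeros((r, h))                       # LoRA up-projection (zero init)

x = rng.standard_normal((5, d))            # tokens routed to this expert
y = x @ W + (alpha / r) * (x @ A @ B)      # base path + adapter path
# With B zero-initialized, training starts from the base model's output.
```

The MoE-specific work in the PRs above is applying this per expert efficiently: batching adapter matmuls across experts and aligning them with the token-routing layout.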

  • Prefill Context Parallel for MHA (Qwen3): Enable context parallelism during prefill for multi-head attention models like Qwen3 MoE, distributing long sequences across GPUs to reduce per-GPU memory and accelerate prefill: #18233
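The sharding step of context parallelism can be sketched in a few lines. This is a hypothetical illustration of the partitioning only (names are mine, and the real implementation must also exchange KV between ranks during attention):

```python
# Hypothetical sketch of prefill context parallelism: a long prompt is
# split into contiguous, near-equal shards, one per GPU rank, so each
# rank stores and processes only its slice of the sequence.
def shard_sequence(seq_len, cp_size):
    """Split [0, seq_len) into cp_size contiguous, near-equal shards."""
    base, rem = divmod(seq_len, cp_size)
    shards, start = [], 0
    for rank in range(cp_size):
        length = base + (1 if rank < rem else 0)
        shards.append((start, start + length))
        start += length
    return shards

# A 10k-token prompt over 4 ranks: each GPU prefills ~2.5k tokens.
shards = shard_sequence(10_000, 4)
```

Each rank then holds roughly seq_len / cp_size of the KV cache, which is what reduces per-GPU memory for very long prompts.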

  • Flash Attention 4 Official Library Support: Upgrade to the official FlashAttention 4 package, bringing the latest attention optimizations and Blackwell GPU support: #20303

  • sglang-kernel 0.4.0: Major kernel package release with renamed package (sgl-kernel → sglang-kernel), consolidated kernels, and cleanup of deprecated ops: #20440

  • Native MLX Backend for Apple Silicon: Add native MLX execution backend enabling SGLang to run inference directly on Apple Silicon Macs without CUDA: #20342

New Model Support

  • Nemotron-3-Super (bf16/fp8/nvfp4): #20407, cookbook
  • Mistral Small 4 (Pixtral): #20708
  • GLM-5: Supported on main branch with transformers 5.3.0
  • Helios (Diffusion - Real-Time Long Video Generation): #19782
  • Hunyuan3D-2 (Diffusion): #18170
  • LTX-2 (Diffusion): #19295
  • MOVA (Diffusion): #19489, #20430
  • FireRed-Image-Edit (Diffusion): #20862

DeepSeek V3.2 / GLM-5 Optimization

  • Fused get_k_and_s Triton kernel for prefill KV cache fetching: #19319
  • Support NSA fuse store indexer K cache: #19148
  • SGLANG_NSA_DENSE_ATTN_KV_LEN_THRESHOLD environ for controlling KV length threshold of applying sparse MLA attention kernel at prefill: #20062
  • Change default setting of V3.2 nvfp4 on TP4: #20086
  • Fix NSA topk_indices_offset when prefill flashmla_sparse is used with FP8 KV cache: #20606
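Only the environment-variable name in the bullet above comes from the release notes; the dispatch logic and default value in this sketch are my illustration of how such a threshold gate typically behaves, with dense attention used for short KV lengths at prefill:

```python
import os

# Illustrative threshold gate (variable name from the release notes;
# the default value and the >= comparison here are assumptions).
threshold = int(os.environ.get("SGLANG_NSA_DENSE_ATTN_KV_LEN_THRESHOLD", "2048"))

def pick_prefill_kernel(kv_len):
    """Use the sparse MLA kernel only once the KV length is long enough."""
    return "sparse_mla" if kv_len >= threshold else "dense"
```

The intuition is that sparsity only pays off once the KV cache is long enough for the indexer overhead to be amortized.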

... (truncated)

Commits
  • 9b0c470 fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-39...
  • 192ba2c [Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remo...
  • e3a4a09 [CI] Fix nemotron nvfp4 test estimated time (#21516)
  • 46e3811 Remove redundant DeepSeek V3 FP4 PCG test (#21485)
  • 7738d2b Revert "bugfix for weight loading for qwen3-next" (#21496)
  • 2e65c27 Api add flush cache timeout (#21413)
  • 8c3ccef Fix Kimi K2.5 dp attention+ spec decoding launch crash (#21391)
  • be0cca5 Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (...
  • e59ea4f fix: torch-native LoRA for multi-adapter case (#20564)
  • fb90c9d [Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)
  • Additional commits viewable in compare view
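The `torch.addmm` change in commit be0cca5 folds a matmul and an addition into one kernel launch. Expressed in NumPy for clarity, `torch.addmm(C, A, B, beta=beta, alpha=alpha)` computes:

```python
import numpy as np

# NumPy sketch of torch.addmm semantics: one fused call replaces a
# separate matmul followed by an in-place add.
def addmm(C, A, B, beta=1.0, alpha=1.0):
    return beta * C + alpha * (A @ B)

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 8)), rng.standard_normal((8, 3))
C = rng.standard_normal((4, 3))

fused = addmm(C, A, B)
two_step = A @ B      # separate matmul...
two_step = C + two_step  # ...then add, as in the old LoRA torch-native path
```

For the LoRA torch-native backend this saves a kernel launch and an intermediate tensor per adapter application.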

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [sglang](https://github.com/sgl-project/sglang) from 0.5.3.post3 to 0.5.10rc0.
- [Release notes](https://github.com/sgl-project/sglang/releases)
- [Commits](sgl-project/sglang@v0.5.3.post3...v0.5.10rc0)

---
updated-dependencies:
- dependency-name: sglang
  dependency-version: 0.5.10rc0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels May 8, 2026