
Bump sglang from 0.5.3.post3 to 0.5.10rc0#7

Open
dependabot[bot] wants to merge 1 commit into
main from dependabot/pip/sglang-0.5.10rc0

Conversation


@dependabot dependabot Bot commented on behalf of github May 8, 2026

Bumps sglang from 0.5.3.post3 to 0.5.10rc0.

Release notes

Sourced from sglang's releases.

v0.5.10rc0

Highlights

  • Piecewise CUDA Graph Enabled by Default: Piecewise CUDA graph capture is now the default execution mode, reducing memory overhead and improving throughput for models with complex control flow patterns: #16331

  • Elastic EP for Partial Failure Tolerance: Integrate Elastic NIXL-EP into SGLang, enabling partial failure tolerance for DeepSeek MoE deployments — when a GPU fails, the system redistributes expert weights and continues serving without full restart: #19248, #17374, #12068 blog
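The failure-tolerance idea above can be sketched in plain Python. This is a hypothetical illustration of the general pattern, not the Elastic NIXL-EP implementation: when one rank fails, its experts are reassigned round-robin over the surviving ranks so serving continues without a full restart.

```python
# Hypothetical sketch (not the SGLang API): rebalance expert-to-GPU
# assignments after one rank fails, so serving continues without restart.
def rebalance_experts(num_experts, ranks, failed_rank):
    """Round-robin experts over the surviving ranks."""
    survivors = [r for r in ranks if r != failed_rank]
    assert survivors, "at least one healthy rank is required"
    return {e: survivors[e % len(survivors)] for e in range(num_experts)}

# 8 experts on 4 GPUs; GPU 2 fails, and its experts move to the survivors.
placement = rebalance_experts(8, ranks=[0, 1, 2, 3], failed_rank=2)
```

In the real system the rebalance also involves transferring expert weights between GPUs; the sketch only shows the placement side of the problem.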

  • HiSparse for Sparse Attention: Integrate HiSparse sparse attention backend for efficient long-context inference with reduced compute through sparsity-aware attention: #20343

  • SGLang-Diffusion Update:

    • Model support: LTX-2, Hunyuan3D-2, Helios
    • Performance improvements: Qwen-image and Z-image generation sped up by 1.5x
    • New platform: macOS
    • New feature: improved performance of the diffusers backend by integrating all optimizations from Cache-DiT
    • Skills: feel free to explore the curated skills for developing and optimizing sglang-diffusion!
  • FlashInfer MXFP8 Kernel Support: Integrate FlashInfer mxfp8 kernels for GEMM and MoE operations, enabling mixed-precision FP8 inference with higher accuracy through microscaling for RL and general workloads: #19537
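Microscaling (MX) formats share one power-of-two scale per small block of values rather than one scale per tensor, which is where the accuracy gain comes from. The NumPy sketch below simulates this idea for MXFP8-style blocks of 32 elements clipped to the FP8 E4M3 maximum of 448; it is a numerical illustration only, not the FlashInfer kernels.

```python
import numpy as np

# Simulated MX-style block quantization: each block of 32 values shares
# one power-of-two scale, and quantized values are clipped to the E4M3
# maximum magnitude of 448 to mimic FP8. Illustrative only.
def mx_quantize(x, block=32):
    x = x.reshape(-1, block)
    amax = np.maximum(np.abs(x).max(axis=1, keepdims=True), 1e-30)
    # Shared per-block power-of-two scale (E8M0-style exponent); ceil
    # keeps amax / scale within the representable range.
    scale = 2.0 ** np.ceil(np.log2(amax / 448.0))
    q = np.clip(np.round(x / scale), -448, 448)
    return q, scale

def mx_dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32)).astype(np.float32)
q, s = mx_quantize(x)
err = np.abs(mx_dequantize(q, s) - x).max()
```

Because the scale is chosen per 32-element block, outliers in one block no longer inflate the quantization step for the rest of the tensor.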

  • Transformers 5.3.0 Upgrade: Major upgrade from transformers 4.57.1 to 5.3.0, unlocking support for the latest model architectures and features from HuggingFace. GLM-5 is now supported in this image instead of the custom-built image: #17784

  • DeepSeek V3.2 / GLM-5 Optimization: GLM-5 runnable on main branch (with upgraded transformers). Fused Triton kernel for prefill KV cache fetching, NSA fuse store indexer for K cache, and configurable KV length threshold for sparse MLA attention at prefill — boosting throughput for long-context DeepSeek V3.2 and GLM-5 serving: #19319, #19148, #20062

  • Qwen3.5 GDN/KDA Optimization: Transpose linear attention state layout from [N, HV, K, V] to [N, HV, V, K] and fuse split/reshape/cat ops in GDN projection with Triton kernel, plus CuTeDSL KDA decode kernel support for improved Qwen3.5 performance: #20283, #21019, #21203
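The layout transpose above is purely a memory-ordering change: the decode readout reduces over the K axis, so storing the state as [N, HV, V, K] makes that reduction run over the contiguous last dimension. A small NumPy sketch (my illustration, not the SGLang kernels) shows the two layouts produce identical results:

```python
import numpy as np

# Linear-attention state in the old [N, HV, K, V] layout vs. the new
# [N, HV, V, K] layout. The decode readout y = S(q) reduces over K, so
# placing K last makes the reduction stride-1 in memory.
N, HV, K, V = 2, 4, 8, 16
rng = np.random.default_rng(0)
state_kv = rng.standard_normal((N, HV, K, V))      # old layout
state_vk = state_kv.transpose(0, 1, 3, 2).copy()   # new layout, contiguous

q = rng.standard_normal((N, HV, K))
# Same readout in each layout; only the memory access pattern differs.
y_old = np.einsum('nhkv,nhk->nhv', state_kv, q)
y_new = np.einsum('nhvk,nhk->nhv', state_vk, q)
```

On GPU the fused Triton kernel additionally avoids materializing the intermediate split/reshape/cat tensors in the GDN projection.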

  • LoRA Support for MoE Layers: Add LoRA fine-tuning support for Mixture-of-Experts layers with JIT alignment kernels, fused Triton kernels, TP support, and auto-detection of LoRA target modules — enabling efficient adapter-based tuning on MoE models like DeepSeek: #19710, #19711, #14105, #21439
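The adapter math per expert is the standard LoRA update. The NumPy sketch below shows it for a single expert with illustrative shapes (not the SGLang fused kernels): the base weight stays frozen while a low-rank pair A, B adds a scaled correction, and zero-initializing B makes the adapter start as an exact no-op.

```python
import numpy as np

# Toy LoRA update on one MoE expert (illustrative shapes): tokens routed
# to this expert pass through the frozen base weight W plus a low-rank
# adapter path scaled by alpha / r.
d, h, r, alpha = 16, 32, 4, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d, h))            # frozen base expert weight
A = rng.standard_normal((d, r)) * 0.01     # LoRA down-projection
B = np.zeros((r, h))                       # LoRA up-projection (zero init)

x = rng.standard_normal((5, d))            # tokens routed to this expert
y = x @ W + (alpha / r) * (x @ A @ B)      # base path + adapter path
# With B zero-initialized, training starts from the base model's output.
```

The MoE-specific work in the PRs above is applying this per expert efficiently: batching adapter matmuls across experts and aligning them with the token-routing layout.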

  • Prefill Context Parallel for MHA (Qwen3): Enable context parallelism during prefill for multi-head attention models like Qwen3 MoE, distributing long sequences across GPUs to reduce per-GPU memory and accelerate prefill: #18233
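The sharding step of context parallelism can be sketched in a few lines. This is a hypothetical illustration of the partitioning only (names are mine, and the real implementation must also exchange KV between ranks during attention):

```python
# Hypothetical sketch of prefill context parallelism: a long prompt is
# split into contiguous, near-equal shards, one per GPU rank, so each
# rank stores and processes only its slice of the sequence.
def shard_sequence(seq_len, cp_size):
    """Split [0, seq_len) into cp_size contiguous, near-equal shards."""
    base, rem = divmod(seq_len, cp_size)
    shards, start = [], 0
    for rank in range(cp_size):
        length = base + (1 if rank < rem else 0)
        shards.append((start, start + length))
        start += length
    return shards

# A 10k-token prompt over 4 ranks: each GPU prefills ~2.5k tokens.
shards = shard_sequence(10_000, 4)
```

Each rank then holds roughly seq_len / cp_size of the KV cache, which is what reduces per-GPU memory for very long prompts.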

  • Flash Attention 4 Official Library Support: Upgrade to the official FlashAttention 4 package, bringing the latest attention optimizations and Blackwell GPU support: #20303

  • sglang-kernel 0.4.0: Major kernel package release with renamed package (sgl-kernel → sglang-kernel), consolidated kernels, and cleanup of deprecated ops: #20440

  • Native MLX Backend for Apple Silicon: Add native MLX execution backend enabling SGLang to run inference directly on Apple Silicon Macs without CUDA: #20342

New Model Support

  • Nemotron-3-Super (bf16/fp8/nvfp4): #20407, cookbook
  • Mistral Small 4 (Pixtral): #20708
  • GLM-5: Supported on main branch with transformers 5.3.0
  • Helios (Diffusion - Real-Time Long Video Generation): #19782
  • Hunyuan3D-2 (Diffusion): #18170
  • LTX-2 (Diffusion): #19295
  • MOVA (Diffusion): #19489, #20430
  • FireRed-Image-Edit (Diffusion): #20862

DeepSeek V3.2 / GLM-5 Optimization

  • Fused get_k_and_s Triton kernel for prefill KV cache fetching: #19319
  • Support NSA fuse store indexer K cache: #19148
  • SGLANG_NSA_DENSE_ATTN_KV_LEN_THRESHOLD environ for controlling KV length threshold of applying sparse MLA attention kernel at prefill: #20062
  • Change default setting of V3.2 nvfp4 on TP4: #20086
  • Fix NSA topk_indices_offset when prefill flashmla_sparse is used with FP8 KV cache: #20606
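Only the environment-variable name in the bullet above comes from the release notes; the dispatch logic and default value in this sketch are my illustration of how such a threshold gate typically behaves, with dense attention used for short KV lengths at prefill:

```python
import os

# Illustrative threshold gate (variable name from the release notes;
# the default value and the >= comparison here are assumptions).
threshold = int(os.environ.get("SGLANG_NSA_DENSE_ATTN_KV_LEN_THRESHOLD", "2048"))

def pick_prefill_kernel(kv_len):
    """Use the sparse MLA kernel only once the KV length is long enough."""
    return "sparse_mla" if kv_len >= threshold else "dense"
```

The intuition is that sparsity only pays off once the KV cache is long enough for the indexer overhead to be amortized.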

... (truncated)

Commits
  • 9b0c470 fix(security): replace unsafe pickle.loads with SafeUnpickler for CVE-2026-39...
  • 192ba2c [Security] 1/N: Bind ZMQ sockets to localhost to prevent unauthenticated remo...
  • e3a4a09 [CI] Fix nemotron nvfp4 test estimated time (#21516)
  • 46e3811 Remove redundant DeepSeek V3 FP4 PCG test (#21485)
  • 7738d2b Revert "bugfix for weight loading for qwen3-next" (#21496)
  • 2e65c27 Api add flush cache timeout (#21413)
  • 8c3ccef Fix Kimi K2.5 dp attention+ spec decoding launch crash (#21391)
  • be0cca5 Use torch.addmm instead of separate mm and add_ calls for LoRA torch.native (...
  • e59ea4f fix: torch-native LoRA for multi-adapter case (#20564)
  • fb90c9d [Test] Consolidate eval accuracy test mixins into eval_accuracy_kit (#21047)
  • Additional commits viewable in compare view
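The `torch.addmm` change in commit be0cca5 folds a matmul and an addition into one kernel launch. Expressed in NumPy for clarity, `torch.addmm(C, A, B, beta=beta, alpha=alpha)` computes:

```python
import numpy as np

# NumPy sketch of torch.addmm semantics: one fused call replaces a
# separate matmul followed by an in-place add.
def addmm(C, A, B, beta=1.0, alpha=1.0):
    return beta * C + alpha * (A @ B)

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 8)), rng.standard_normal((8, 3))
C = rng.standard_normal((4, 3))

fused = addmm(C, A, B)
two_step = A @ B      # separate matmul...
two_step = C + two_step  # ...then add, as in the old LoRA torch-native path
```

For the LoRA torch-native backend this saves a kernel launch and an intermediate tensor per adapter application.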

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [sglang](https://github.com/sgl-project/sglang) from 0.5.3.post3 to 0.5.10rc0.
- [Release notes](https://github.com/sgl-project/sglang/releases)
- [Commits](sgl-project/sglang@v0.5.3.post3...v0.5.10rc0)

---
updated-dependencies:
- dependency-name: sglang
  dependency-version: 0.5.10rc0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update python code labels May 8, 2026