Pull requests: ggml-org/llama.cpp

Pull requests list

Mention ANV_SYS_MEM_LIMIT in Vulkan/Linux section of build.md
Labels: documentation
#20670 opened Mar 17, 2026 by adapt-L

ggml-webgpu: Update the RMS_NORM preprocessor and add L2_NORM
Labels: documentation, ggml, WebGPU
#20665 opened Mar 17, 2026 by yomaytk

ggml-webgpu: Add supports for DIAG and TRI
Labels: documentation, ggml, WebGPU
#20664 opened Mar 17, 2026 by yomaytk

vulkan: change gated_delta_net to shard a column across a subgroup
Labels: ggml, Vulkan
#20662 opened Mar 17, 2026 by jeffbolznv

Fix chat parser regressions: inference crashes/frozen; output backtracked
Labels: testing
#20660 opened Mar 17, 2026 by jpohhhh

vulkan: dequantize iq4_xs 4 at a time
Labels: ggml, Vulkan
#20657 opened Mar 16, 2026 by netrunnereve

Make debug build possible on Windows with HIP backend.
Labels: ggml
#20655 opened Mar 16, 2026 by Exile333

tests: set dist_sampling seq_id to 0
Labels: testing
#20645 opened Mar 16, 2026 by taronaeo

ggml-cuda: Add NVFP4 dp4a kernel
Labels: ggml, Nvidia GPU, python
#20644 opened Mar 16, 2026 by michaelw9999

[CUDA] Use a single warp per element instead of a single block per element if the K-dimension is small
Labels: ggml, Nvidia GPU
#20635 opened Mar 16, 2026 by gaugarg-nv

ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot
Labels: ggml
#20633 opened Mar 16, 2026 by rehan-10xengineer

model : add QKV weight fusion for LLaMA, Qwen2, and Qwen3
Labels: model, python, script
#20628 opened Mar 16, 2026 by JoursBleu Draft

ggml-cpu: simd_gemm implementation for riscv vector extension
Labels: ggml
#20627 opened Mar 16, 2026 by rehan-10xengineer

[OpenVINO backend] add func is_splited_model()
Labels: ggml
#20626 opened Mar 16, 2026 by zhaixuejun1993

MiroThinker tool call parser
#20624 opened Mar 16, 2026 by hksdpc255

Replace UINT64_MAX with LONG_TIMEOUT in blocking case to isolate possibility of Dawn/llvm-pipe bug
Labels: ggml
#20621 opened Mar 16, 2026 by nikhilJain17 Draft

kleidiai : fix MUL_MAT support for batched (3D) inputs
Labels: ggml
#20620 opened Mar 16, 2026 by jabr

ggml webgpu: Move to no timeout for WaitAny in graph submission to avoid deadlocks
Labels: ggml
#20618 opened Mar 16, 2026 by reeselevine

devops: add an intel mkl container
Labels: devops
#20616 opened Mar 16, 2026 by kannon92

ggml: MXFP flash attention with SoA layout (CPU scalar reference)
Labels: examples, ggml, OpenCL, SYCL, testing
#20609 opened Mar 15, 2026 by timothyeburke Draft

ggml blas: set mkl threads from thread context
Labels: devops, ggml
#20602 opened Mar 15, 2026 by kannon92

vulkan: allow graphics queue only through env var
Labels: ggml, Vulkan
#20599 opened Mar 15, 2026 by 0cc4m