chore: bump llama.cpp to b9279 #17

Open
github-actions[bot] wants to merge 1 commit into main from automation/bump-llama-cpp

github-actions bot commented May 19, 2026

llama.cpp update

Upstream changelog

Release notes for b9279

vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855)

  • vulkan: fuse snake activation (mul, sin, sqr, mul, add)

Add snake.comp shader with F32 / F16 / BF16 pipelines and
ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op
decomposition emitted by audio decoders (BigVGAN, Vocos) for snake
activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single
elementwise kernel.

test_snake_fuse from the CUDA PR now also compares CPU naive vs
Vulkan fused across F32 / F16 / BF16.
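
To make the fusion concrete, here is a minimal Python sketch of the snake formula from the note above, y = x + sin(a*x)^2 * inv_b, written once as the naive five-op chain (mul, sin, sqr, mul, add) and once as a single elementwise pass. The function names and the use of scalar `a` / `inv_b` are assumptions for illustration; this is not the Vulkan shader or the upstream test.

```python
import math

def snake_naive(x, a, inv_b):
    """Five separate elementwise ops, each producing a full temporary."""
    t0 = [a * xi for xi in x]                 # mul
    t1 = [math.sin(v) for v in t0]            # sin
    t2 = [v * v for v in t1]                  # sqr
    t3 = [v * inv_b for v in t2]              # mul
    return [xi + v for xi, v in zip(x, t3)]   # add

def snake_fused(x, a, inv_b):
    """One pass over the data, no intermediate buffers."""
    out = []
    for xi in x:
        s = math.sin(a * xi)
        out.append(xi + s * s * inv_b)
    return out

if __name__ == "__main__":
    x = [-1.5, 0.0, 0.25, 2.0]
    print(snake_naive(x, 0.7, 1.3))
    print(snake_fused(x, 0.7, 1.3))
```

Both functions compute the same values; the fused form simply avoids materializing four temporaries, which is the saving a fused kernel is after.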

  • vulkan: address jeffbolznv review for fused snake activation

Rename T / C to ne0 / ne1 in the shader and push constants to match
the standard naming convention used across the Vulkan backend.

Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous
(the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be
tightly packed on the broadcast dim (the shader reads data_a[i1]).

  • vulkan: tighten snake fusion type checks for all operands (address jeffbolznv review)

  • vulkan: reject snake fusion when ne[2] or ne[3] > 1 (address jeffbolznv review)

  • vulkan: address 0cc4m review for fused snake activation

snake.comp is renamed to follow the ggml DATA_A_* / A_TYPE convention.
A_TYPE now applies to the activation tensor data_a instead of the
broadcast multiplier, and the bindings become data_a (A_TYPE), data_b
(float), data_c (float) and data_d (D_TYPE). A header at the top of
the shader maps each buffer to its role in y = x + sin(b * x)^2 * c.

On the C++ side, ggml_vk_can_fuse_snake reuses the existing snake_pattern
constant instead of duplicating the op list, sin_node is extracted as a
named local alongside the other chain nodes, and the broadcast operands
a and inv_b are now required to be GGML_TYPE_F32 to match the hardcoded
float bindings on data_b and data_c (the previous a->type == x->type
would silently reject any future BF16 or F16 chain once the supports_op
gate for SIN / SQR is lifted). ggml_vk_snake_dispatch_fused gets an
explicit GGML_TYPE_F32 case and GGML_ABORT on default in place of the
silent f32 fallback, and a stale comment about data_a[i1] / data_inv_b[i1]
is refreshed to match the new binding names.
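
The review constraints above can be summarized in a short Python sketch: an eligibility check, a 2D kernel using the flat index idx = i0 + i1 * ne0 with per-row multipliers, and a type dispatch that fails loudly on unknown types. `TensorDesc`, the string type names, and the pipeline names are hypothetical stand-ins, not the ggml or Vulkan backend API; only the constraints themselves come from the release notes.

```python
import math
from dataclasses import dataclass

@dataclass
class TensorDesc:
    """Hypothetical tensor descriptor: element type, shape ne[0..3], contiguity."""
    dtype: str
    ne: tuple
    contiguous: bool = True

def can_fuse_snake(x, b, c, dst):
    """Illustrative version of the conditions listed in the notes."""
    if not (x.contiguous and dst.contiguous):
        return False  # the kernel indexes with idx = i0 + i1 * ne0
    if x.ne[2] > 1 or x.ne[3] > 1:
        return False  # only 2D activations are handled
    if b.dtype != "f32" or c.dtype != "f32":
        return False  # data_b / data_c are bound as float
    # Assumption: contiguity stands in for "tightly packed on the broadcast dim".
    return b.contiguous and c.contiguous

def snake_fused_2d(data_a, data_b, data_c, ne0, ne1):
    """y = x + sin(b * x)^2 * c, with b and c read once per row i1."""
    data_d = [0.0] * (ne0 * ne1)
    for i1 in range(ne1):
        for i0 in range(ne0):
            idx = i0 + i1 * ne0
            s = math.sin(data_b[i1] * data_a[idx])
            data_d[idx] = data_a[idx] + s * s * data_c[i1]
    return data_d

def pick_pipeline(dtype):
    """Explicit case per supported type; raise instead of silently using f32."""
    pipelines = {"f32": "snake_f32", "f16": "snake_f16", "bf16": "snake_bf16"}
    if dtype not in pipelines:
        raise ValueError(f"unsupported type for fused snake: {dtype}")
    return pipelines[dtype]
```

Raising on an unknown type mirrors the intent of replacing a silent f32 fallback with an abort: a mismatch surfaces immediately rather than producing wrong output.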

Commit range

Commits from b9165 to b9279 (first 80)
  • Refactor: convert_hf_to_gguf.py (#17114) (cc7200b)
  • convert : fix Qwen3 ASR conversion (#23081) (18d1717)
  • webui: fix theme from --webui-config-file not applied on first load (fresh localStorage) (#22902) (8be1786)
  • mtmd: add chunks and fix preproc for qwen3a (#23073) (72e60f5)
  • docs: document usage object in server timings response (#23110) (6831fe4)
  • tests: add BF16 non-contig coverage for MUL_MAT permutations (#22689) (cfabeb1)
  • webui: Use lowercase hash for HF checksum check (#23107) (1348f67)
  • ci : fix release symlinks (#23119) (49d1701)
  • ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064) (59778f0)
  • model : NvFP4 quantized LM head support (#23046) (42928bc)
  • fix: Add build step using build workflow to publish workflow (#23134) (1d9f99a)
  • ui: untrack settings sync in props effect to prevent reactive loop (#23127) (366c5e2)
  • webui : [ChatFormActionAdd][a11y] fix accessibility issues in add menu trigger and items (#22736) (1428004)
  • ui: Fix handling of MCP resource template parameters (#23117) (b81c2cd)
  • llama + spec: MTP Support (#22673) (2555826)
  • vendor : update cpp-httplib to 0.45.0 (#23103) (18675b6)
  • ui: Correct links in tools/ui/README.md [no ci] (#23139) (25b1bc9)
  • ggml: install ggml.pc in /pkgconfig (ggml/1480) (2eb3e6b)
  • metal : tighten input-position loop in kernel_conv_transpose_1d (ggml/1477) (560445b)
  • ggml : bump version to 0.12.0 (ggml/1494) (e6c37a1)
  • sync : ggml (3a92bc9)
  • ui: Add request timeout for MCP tool calls (#23138) (0253fb2)
  • vulkan: removed duplicate #include in headers (#23144) (6049906)
  • server: skip device enumeration in router mode to avoid creating CUDA primary context (#23137) (64b38b5)
  • server: (router) alloc tmp buffer on heap (#23159) (b64739e)
  • webui: support video files as input (#22830) (4f13cb7)
  • ngram : reduce noisy logs (#23185) (a16cce8)
  • server : honor --embd-normalize CLI arg (#23125) (1a68ec9)
  • vulkan: fuse SSM_CONV + BIAS + SILU (#22653) (3fbadb0)
  • common : enable streaming JSON argument values (#23173) (f4cc787)
  • vulkan: Support unaligned tensors for ROPE (#22637) (7ba22c6)
  • vulkan: add cpy bf16 -> f32 pipelines (#22677) (fcae601)
  • ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (#22009) (a6d6183)
  • common : delegate assistant continuation to underlying template handlers (#23089) (39cf5d6)
  • llama: avoid copying logits during prompt decode in MTP (#23198) (3e12fbd)
  • CUDA: Continue directly including cuda/iterator (#23102) (84c6782)
  • cmake : do not install conversion script (#23204) (e0de4c2)
  • cmake : fix LLAMA_BUILD_UI logic (#23190) (8758904)
  • feat: Support d_conv=15 for ssm-conv.cu (#23017) (726704a)
  • cmake : do not check for bin install dir (#23234) (dd7cad7)
  • update bid to match each layers MTP source (#23237) (1867a0c)
  • sycl : fix error when use -mg 1 error (#23140) (e98bcfe)
  • sycl: route small f32 matmuls to oneMKL, bypass oneDNN (#22150) (5511965)
  • sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156) (0caf2a1)
  • scripts : allow wc2wt with an existing branch (#23189) (c3f95c1)
  • ci : added kleidiai-server to server-self-hosted workflow (#22435) (053e01d)
  • add myself to conversion (#23261) (77e38d6)
  • llama: initialize pre-norm embedding mask flag (#23256) (49c21f9)
  • webui: fix Tailwind v4 utility classes missing when built via cmake (#23253) (232f466)
  • ui: Centralize monospace font styles in app.css (#23272) (a135ec0)
  • ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236) (1ff0fc1)
  • feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270) (b9a2170)
  • ui: Update KaTeX package and clean up logs from sass warnings (#23275) (3a9c1b8)
  • common : remove hf cache migration (#23266) (45b455e)
  • docker : add OCI image labels for version and build date (#21653) (5cbaa5e)
  • ggml-hexagon: add PAD op HVX kernel (#23078) (b734044)
  • hexagon: add support for TRI op (#22822) (9a532ae)
  • rpc : keep last_graph_uid in the device context (#23273) (c3e9ade)
  • sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (#22153) (439f1b1)
  • convert : filter lora tensor names (#23077) (f1c1c5c)
  • [SCYL] add chapter for performance reference in SYCL.md (#23315) (aabee04)
  • ggml-webgpu : extend GDN for K>1 (#23299) (c85a242)
  • llama-eval : add per-task summary stats (#23151) (d2e179a)
  • save-load-state : refactor tests and improve readability (#23196) (cd963fe)
  • server : print graphs reused in slot timings (#23279) (3c81c8d)
  • server-context: guarantee there is at least 1 token to decode (#23280) (ccee426)
  • ci : install server kleidiai runner dependencies (#23259) (00c461c)
  • ci : install libssl-dev (#23325) (4b262ab)
  • ui: Bump packages + address build warnings (#23300) (6db1304)
  • llama : MTP clean-up (#23269) (d14ce3d)
  • model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338) (baf3cc6)
  • hexagon: enable support for NORM op (#23319) (ac76808)
  • convert : update mtp related help (#23334) (b7393a4)
  • common: fix --fit verbosity with --verbosity 4 (#23282) (7256fce)
  • common: fix --help for --verbosity (#23278) (57cb35c)
  • github: mention --log-file in issue templates (#23277) (a807867)
  • refactor: Chat Screen UI rendering (#23333) (67ace02)
  • hexagon: add MROPE and IMROPE support in HTP rope op (#23317) (17d22a3)
  • opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303) (b28a2f3)
  • ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (#23349) (b39a7bf)

Web bridge review focus

Please pay extra attention to upstream changes touching:

  • WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
  • ggml backend APIs used by the bridge
  • model loading, tokenizer, chat template, context/state persistence, or cache semantics
  • CMake/build flags that can affect the generated JS/WASM artifacts

Validation

  • Emscripten build passed
  • Browser WebGPU/state-persistence smoke passed
  • Generated bridge artifacts include wasm32 and memory64 outputs
  • No stale hard-coded llama.cpp tag remains in CI/publish defaults
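
As an illustration of the last check, a stale-tag scan could look roughly like the sketch below. The regular expression, the function name, and the sample config text are assumptions; the actual workflow's check is not shown in this PR.

```python
import re

# Assumption: release tags look like "b" followed by at least four digits.
TAG_RE = re.compile(r"\bb\d{4,}\b")

def find_stale_tags(text, expected_tag):
    """Return every tag-like token in text that differs from expected_tag."""
    return sorted({t for t in TAG_RE.findall(text) if t != expected_tag})

if __name__ == "__main__":
    sample = "LLAMA_CPP_TAG: b9165\nfallback_ref: b9279\n"
    print(find_stale_tags(sample, "b9279"))
```

A pattern this loose can also match short commit hashes that happen to be a `b` followed only by digits, so a real check would likely restrict the scan to known config keys.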

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
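
The behavior described above amounts to a small decision procedure, sketched below in Python. `OpenPR`, `plan_bump`, and the returned action strings are hypothetical names for illustration, and the early return when the pinned tag already matches the latest release is an added assumption; only the branch name and the two rules come from the description.

```python
from dataclasses import dataclass

AUTOMATION_BRANCH = "automation/bump-llama-cpp"

@dataclass
class OpenPR:
    head_branch: str
    touches_version: bool  # True if the PR changes llama_cpp.version

def plan_bump(current_tag, latest_tag, open_prs):
    """Decide what the scheduled workflow should do on this run."""
    if latest_tag == current_tag:
        return "skip"  # assumption: nothing to bump
    # A human-authored PR that already edits the version takes precedence.
    for pr in open_prs:
        if pr.head_branch != AUTOMATION_BRANCH and pr.touches_version:
            return "skip"
    # Reuse the stable branch so a newer release updates the same PR.
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update_existing"
    return "open_new"
```

Keeping a single stable head branch is what lets repeated releases show up as force-pushes and title changes on one PR instead of a series of duplicates.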

github-actions bot force-pushed the automation/bump-llama-cpp branch from c374d7d to b0e1e3f (May 19, 2026 13:32)
github-actions bot force-pushed the automation/bump-llama-cpp branch from b0e1e3f to dcacf23 (May 20, 2026 12:39)
github-actions bot changed the title from "chore: bump llama.cpp to b9222" to "chore: bump llama.cpp to b9247" (May 20, 2026)
github-actions bot changed the title from "chore: bump llama.cpp to b9247" to "chore: bump llama.cpp to b9264" (May 21, 2026)
github-actions bot force-pushed the automation/bump-llama-cpp branch from dcacf23 to d82afc2 (May 21, 2026 13:32)
github-actions bot changed the title from "chore: bump llama.cpp to b9264" to "chore: bump llama.cpp to b9279" (May 22, 2026)
github-actions bot force-pushed the automation/bump-llama-cpp branch from d82afc2 to 74a6dbd (May 22, 2026 12:35)