chore: bump llama.cpp to b9279 #17
Open
github-actions[bot] wants to merge 1 commit into
c374d7d to b0e1e3f
b0e1e3f to dcacf23
dcacf23 to d82afc2
d82afc2 to 74a6dbd
llama.cpp update
b9165 to b9279
Upstream changelog
Release notes for b9279
vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855)
Add snake.comp shader with F32 / F16 / BF16 pipelines and
ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op
decomposition emitted by audio decoders (BigVGAN, Vocos) for snake
activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single
elementwise kernel.
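A minimal sketch of what the fusion replaces, written in plain Python rather than the actual Vulkan shader. The function names `snake_naive` and `snake_fused` are hypothetical, and the layout (one `a` / `inv_b` value per row of `x`) is an assumption based on the broadcast described in these notes.

```python
import math


def snake_naive(x, a, inv_b):
    """Five separate elementwise passes: mul, sin, sqr, mul, add."""
    t = [[a[i1] * v for v in row] for i1, row in enumerate(x)]       # mul
    t = [[math.sin(v) for v in row] for row in t]                    # sin
    t = [[v * v for v in row] for row in t]                          # sqr
    t = [[inv_b[i1] * v for v in row] for i1, row in enumerate(t)]   # mul
    return [[xv + tv for xv, tv in zip(xr, tr)]                      # add
            for xr, tr in zip(x, t)]


def snake_fused(x, a, inv_b):
    """Single pass computing y = x + sin(a*x)^2 * inv_b per element."""
    return [[v + math.sin(a[i1] * v) ** 2 * inv_b[i1] for v in row]
            for i1, row in enumerate(x)]
```

Both functions should agree up to floating-point rounding; the fused form simply avoids materializing the four intermediate tensors.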
test_snake_fuse from the CUDA PR now also compares CPU naive vs
Vulkan fused across F32 / F16 / BF16.
Rename T / C to ne0 / ne1 in the shader and push constants to match
the standard naming convention used across the Vulkan backend.
Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous
(the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be
tightly packed on the broadcast dim (the shader reads data_a[i1]).
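The contiguity requirement follows from the flat indexing. A rough Python illustration, assuming flat row-major buffers and a hypothetical helper name `snake_flat` (this is not the shader source):

```python
import math


def snake_flat(data_x, data_a, data_inv_b, ne0, ne1):
    """Walk a contiguous ne0 x ne1 buffer the way the notes describe."""
    out = [0.0] * (ne0 * ne1)
    for i1 in range(ne1):
        for i0 in range(ne0):
            # Only valid when x and dst are contiguous: no row padding or strides.
            idx = i0 + i1 * ne0
            xv = data_x[idx]
            # a and inv_b are read once per i1, so they must be tightly packed.
            s = math.sin(data_a[i1] * xv)
            out[idx] = xv + s * s * data_inv_b[i1]
    return out
```

If `x` had padded rows, `idx` would land on the wrong element, which is why the matcher rejects non-contiguous inputs instead of fusing them.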
vulkan: tighten snake fusion type checks for all operands (address jeffbolznv review)
vulkan: reject snake fusion when ne[2] or ne[3] > 1 (address jeffbolznv review)
vulkan: address 0cc4m review for fused snake activation
snake.comp is renamed to follow the ggml DATA_A_* / A_TYPE convention.
A_TYPE now applies to the activation tensor data_a instead of the
broadcast multiplier, and the bindings become data_a (A_TYPE), data_b
(float), data_c (float) and data_d (D_TYPE). A header at the top of
the shader maps each buffer to its role in y = x + sin(b * x)^2 * c.
On the C++ side, ggml_vk_can_fuse_snake reuses the existing snake_pattern
constant instead of duplicating the op list, sin_node is extracted as a
named local alongside the other chain nodes, and the broadcast operands
a and inv_b are now required to be GGML_TYPE_F32 to match the hardcoded
float bindings on data_b and data_c (the previous a->type == x->type
would silently reject any future BF16 or F16 chain once the supports_op
gate for SIN / SQR is lifted). ggml_vk_snake_dispatch_fused gets an
explicit GGML_TYPE_F32 case and GGML_ABORT on default in place of the
silent f32 fallback, and a stale comment about data_a[i1] / data_inv_b[i1]
is refreshed to match the new binding names.
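The eligibility rules listed in these notes can be summarized in a short sketch. This is a Python model, not the real `ggml_vk_can_fuse_snake`: the `Tensor` class is hypothetical, and a single `contiguous` flag stands in for the actual stride checks, including "tightly packed on the broadcast dim".

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    dtype: str                 # "F32", "F16" or "BF16" (illustrative strings)
    ne: tuple                  # (ne0, ne1, ne2, ne3)
    contiguous: bool = True    # simplification of the real stride checks


def can_fuse_snake(x, a, inv_b, dst):
    """Return True only if every rule stated in the notes holds."""
    if x.dtype not in ("F32", "F16", "BF16"):
        return False
    # Broadcast operands are bound as float in the shader.
    if a.dtype != "F32" or inv_b.dtype != "F32":
        return False
    # Fusion is rejected when ne[2] or ne[3] > 1.
    if x.ne[2] > 1 or x.ne[3] > 1:
        return False
    # x and dst must be contiguous; a / inv_b must be tightly packed.
    if not (x.contiguous and dst.contiguous):
        return False
    if not (a.contiguous and inv_b.contiguous):
        return False
    return True
```

When the check fails, the graph would presumably fall back to the five unfused ops, so a rejected pattern costs performance rather than correctness.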
Commit range
Commits from b9165 to b9279 (first 80)
(cc7200b)
(18d1717)
(8be1786)
(72e60f5)
usage object in server timings response (#23110) (6831fe4)
(cfabeb1)
(1348f67)
(49d1701)
tools/ui folder and ui/UI/llama-ui/LLAMA_UI naming (#23064) (59778f0)
(42928bc)
(1d9f99a)
(366c5e2)
(1428004)
(b81c2cd)
(2555826)
(18675b6)
tools/ui/README.md [no ci] (#23139) (25b1bc9)
(2eb3e6b)
(560445b)
(e6c37a1)
(3a92bc9)
(0253fb2)
(6049906)
(64b38b5)
(b64739e)
(4f13cb7)
(a16cce8)
(1a68ec9)
(3fbadb0)
(f4cc787)
(7ba22c6)
(fcae601)
(a6d6183)
(39cf5d6)
(3e12fbd)
(84c6782)
(e0de4c2)
(8758904)
(726704a)
(dd7cad7)
(1867a0c)
(e98bcfe)
(5511965)
(0caf2a1)
(c3f95c1)
(053e01d)
(77e38d6)
(49c21f9)
(232f466)
(a135ec0)
(1ff0fc1)
(b9a2170)
sass warnings (#23275) (3a9c1b8)
(45b455e)
(5cbaa5e)
(b734044)
(9a532ae)
(c3e9ade)
(439f1b1)
(f1c1c5c)
(aabee04)
(c85a242)
(d2e179a)
(cd963fe)
(3c81c8d)
(ccee426)
(00c461c)
(4b262ab)
(6db1304)
(d14ce3d)
(baf3cc6)
(ac76808)
(b7393a4)
(7256fce)
(57cb35c)
(a807867)
(67ace02)
(17d22a3)
(b28a2f3)
(b39a7bf)
Web bridge review focus
Please pay extra attention to upstream changes touching:
Validation
Automation behavior
This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
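The behavior described above can be read as a small decision function. This is a sketch only: the `PullRequest` class, `plan_bump`, and the returned action names are hypothetical and not taken from the actual workflow, and the "noop" case (pinned version already current) is an assumption.

```python
from dataclasses import dataclass

AUTOMATION_BRANCH = "automation/bump-llama-cpp"


@dataclass
class PullRequest:
    branch: str
    changes_version_field: bool  # True if the PR edits llama_cpp.version


def plan_bump(open_prs, pinned, latest):
    """Decide what a scheduled run should do for the given repository state."""
    if latest == pinned:
        return "noop"  # assumption: nothing to bump
    # A human-authored PR that already touches the version wins.
    if any(pr.changes_version_field and pr.branch != AUTOMATION_BRANCH
           for pr in open_prs):
        return "skip"
    # Reuse the stable branch instead of opening a duplicate PR.
    if any(pr.branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update"
    return "open"
```

For example, with this PR still open and a newer upstream release available, `plan_bump` returns "update", matching the "updates this same PR" behavior.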