ci: automate llama.cpp pin updates by leehack · Pull Request #13 · leehack/llama-web-bridge

leehack · 2026-05-15T15:46:06Z

Summary

Adds llama_cpp.version as the single source of truth for the default llama.cpp checkout.
Updates CI and publish workflows to read that pin instead of hard-coding b9116.
Adds scheduled and manual automation that opens (or updates) a single PR from the stable automation/bump-llama-cpp branch, including upstream changelog context.
Bumps the current pin from b9116 to b9165.
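The single-source-of-truth policy can be sketched as a small resolver. An explicit tag, such as a workflow input, wins. Otherwise the tag is read from llama_cpp.version. This is a hypothetical Python helper, not the repository's code, since the real workflows resolve the tag in YAML and shell. The `b<digits>` format check is an assumption based on the b9116/b9165 tags shown here, and passing overrides through unvalidated is also an assumption.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed pin format: "b" followed by a build number, e.g. "b9165".
PIN_PATTERN = re.compile(r"b\d+")


def resolve_llama_cpp_tag(override: Optional[str], pin_file: Path) -> str:
    """Hypothetical resolver: a non-empty override wins, else read the pin file."""
    if override is not None and override.strip():
        # Explicit input (e.g. a workflow_dispatch value) is passed through as-is.
        return override.strip()
    tag = pin_file.read_text(encoding="utf-8").strip()
    if not PIN_PATTERN.fullmatch(tag):
        raise ValueError(f"{pin_file} does not contain a b<number> tag: {tag!r}")
    return tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pin = Path(tmp) / "llama_cpp.version"
        pin.write_text("b9165\n", encoding="utf-8")
        print(resolve_llama_cpp_tag(None, pin))     # b9165
        print(resolve_llama_cpp_tag("", pin))       # b9165 (empty input falls back)
        print(resolve_llama_cpp_tag("b9116", pin))  # b9116 (override wins)
```

Treating an empty string like a missing input mirrors how an optional workflow input usually arrives as an empty value rather than being absent.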

llama.cpp update

Previous pin: b9116
New pin: b9165
Upstream release: https://github.com/ggml-org/llama.cpp/releases/tag/b9165
Compare: https://github.com/ggml-org/llama.cpp/compare/b9116...b9165
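The release and compare links for a bump follow directly from the two pins. The sketch below is a hypothetical helper. The release URL pattern matches the upstream release link given here, and the compare URL uses GitHub's standard `/compare/base...head` form. Refusing a bump to an equal or older build number is an assumed guard, not documented behavior of the automation.

```python
import re
from typing import Dict

REPO = "ggml-org/llama.cpp"


def build_number(tag: str) -> int:
    """Parse the numeric part of a tag such as 'b9165' (assumed b<digits> format)."""
    m = re.fullmatch(r"b(\d+)", tag)
    if m is None:
        raise ValueError(f"not a b<number> tag: {tag!r}")
    return int(m.group(1))


def upstream_links(previous: str, new: str) -> Dict[str, str]:
    """Build release and compare URLs for a pin bump; refuse non-increasing bumps."""
    if build_number(new) <= build_number(previous):
        raise ValueError(f"{new} is not newer than {previous}")
    base = f"https://github.com/{REPO}"
    return {
        "release": f"{base}/releases/tag/{new}",
        "compare": f"{base}/compare/{previous}...{new}",
    }


if __name__ == "__main__":
    print(upstream_links("b9116", "b9165")["compare"])
    # https://github.com/ggml-org/llama.cpp/compare/b9116...b9165
```

Comparing build numbers as integers, rather than comparing the tag strings, avoids surprises such as "b10000" sorting before "b9165".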

Upstream changelog

Release notes for b9165


ci : fix transform of top . entry in release archive (#23080)

fix transform of top . entry in release archive
simplify


Commit range

Commits from b9116 to b9165 (first 80)

ci : bump ty to 0.0.35 (#22961) (fa62042)
vulkan: Check shared memory size for mmq shaders (#22693) (706fbd8)
vulkan: Fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461) (ef93e98)
examples : add llama-eval (#21152) (fde69a3)
model-conversion : add causal-convert-mmproj target [no ci] (#22969) (89730c8)
ggml-webgpu: address precision issues for multimodal (#22808) (239a497)
ggml-webgpu: Enables running gpt-oss-20b (#22906) (927dada)
mtmd, server, common: expose modalities to /v1/models (#22952) (7bfe120)
webui: Fix Chat Screen Form box disappearing + autoscroll issues on WebKit (#22977) (dded58b)
convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981) (cce09f0)
opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755) (a9883db)
hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993) (856c3ad)
ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (#22681) (61af07c)
llama-eval : enable type check (#22988) (bcfe63f)
spec : update CLI arguments for better consistency (#22964) (634275f)
ci: validate model naming convention (#22680) (3796c94)
server, webui: support continue generation on reasoning models (#22727) (5d44db6)
download: do not exit() on error (#23008) (e75cd5e)
hexagon: add unary tanh op (#22999) (ad96bb8)
docs : Update OPENVINO.md (#22959) (7e16646)
webui: preserve system message on edit cancel (#22911) (46be24d)
webui: Deduplicate model aliases in data + handle single/multiple aliases in UI (#22979) (2dfeca3)
flush the gpu profile timestamp before the queryset is overflowed (#22995) (527045b)
opencl: fix crash when warming up MoE on Adreno (#22876) (1e4579f)
server, webui: accept continue_final_message flag for vLLM API compat (#23012) (95d469a)
opencl: add q5_0 and q5_1 MoE for Adreno (#22985) (ec562eb)
Fix for issue #22974. Cast intermediate results to float before adding and casting the result to the destination type. Avoids half+half operator ambiguity. (#22994) (7f3f843)
ggml-webgpu: only use subgroup-matrix path when head dims are divisible by sg_mat_k / sg_mat_n (#23020) (4c1c3ac)
SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations (#21597) (9ed6e19)
fix: Autoscroll detection (#23026) (320a6a4)
vulkan: fix matmul integer pipeline selection (#23005) (dbe7901)
unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110) (42532af)
docker : revert stable version of intel compute-runtime (#22968) (0f45f1a)
ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863) (81b0d88)
logs : reduce (#23021) (67b2b7f)
webui: Move static build output from repo code to HF Bucket (#22937) (253ba11)
contributing: new contributors should not submit trivial fixes (#23045) (97b658c)
fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051) (0c3e4fc)
ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040) (5ec717d)
ggml-webgpu: Enable NVIDIA self-hosted CI (#22976) (834a243)
CI : support IOT device (IQ9) (#22987) (d81e63d)
HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880) (3e037f3)
ggml-hexagon: cpy: add contiguous fast-path in reshape copy (#23076) (5c0e946)
readme : update bindings (#23063) (7155a49)
Support for Codex CLI by skipping unsupported Responses tools (#23041) (91e84fe)
webui: preserve partial response on streaming error (#23090) (d528444)
reasoning-budget: clone should do a deep-copy (#23095) (ac33f03)
llama-eval : add AIME 2026 dataset support (#23058) (d5dc2e0)
ci : fix transform of top . entry in release archive (#23080) (769cc93)
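A commit list in this "subject (short sha)" shape, capped as the "(first 80)" heading suggests, could be produced from plain git output. The sketch below assumes the input comes from `git log --reverse --format=%H%x09%s b9116..b9165` (full SHA, a tab, then the subject). It is a hypothetical helper. The automation may instead use the GitHub compare API.

```python
from itertools import islice
from typing import Iterable, List, Tuple

MAX_COMMITS = 80  # assumed cap, mirroring "(first 80)" in the heading


def parse_git_log(output: str) -> List[Tuple[str, str]]:
    """Parse `git log --format=%H%x09%s` output into (sha, subject) pairs."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sha, _, subject = line.partition("\t")
        commits.append((sha, subject))
    return commits


def format_commit_range(previous: str, new: str,
                        commits: Iterable[Tuple[str, str]],
                        limit: int = MAX_COMMITS) -> List[str]:
    """Render a heading plus one 'subject (short sha)' line per commit, capped at `limit`."""
    lines = [f"Commits from {previous} to {new} (first {limit})", ""]
    for sha, subject in islice(commits, limit):
        lines.append(f"{subject} ({sha[:7]})")
    return lines


if __name__ == "__main__":
    sample = "0" * 40 + "\texample subject (placeholder commit)\n"
    print("\n".join(format_commit_range("b9116", "b9165", parse_git_log(sample))))
```

Splitting on the first tab only keeps subjects intact even if they contain further tabs.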

Test Plan

python3 -m py_compile scripts/verify_state_persistence_api.py scripts/verify_ci_reliability.py scripts/state_persistence_browser_smoke.py
python3 scripts/verify_state_persistence_api.py
python3 scripts/verify_ci_reliability.py
git diff --check
YAML parse for .github/workflows/*.yml
bash -n for workflow run scripts
go run github.com/rhysd/actionlint/cmd/actionlint@latest -color=false .github/workflows/ci.yml .github/workflows/publish_assets.yml .github/workflows/auto_llama_cpp_update.yml
GitHub Actions Emscripten/WebGPU smoke on PR
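One of the checks listed above, making sure workflows no longer hard-code a default like b9116, can be approximated by scanning workflow files for tag literals. This is a hypothetical stand-in, not an excerpt of scripts/verify_ci_reliability.py. Treating any `b` followed by four or more digits as a tag is a heuristic assumption that could flag unrelated strings.

```python
import re
from pathlib import Path
from typing import Dict, List

# Heuristic (assumption): a literal like "b9116" in a workflow is a hard-coded llama.cpp tag.
TAG_LITERAL = re.compile(r"\bb\d{4,}\b")


def load_workflows(directory: Path) -> Dict[str, str]:
    """Read every *.yml file in the given directory, keyed by file name."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(directory.glob("*.yml"))}


def find_hardcoded_tags(workflows: Dict[str, str]) -> List[str]:
    """Report file:line locations of tag literals that should come from llama_cpp.version."""
    problems = []
    for name in sorted(workflows):
        for lineno, line in enumerate(workflows[name].splitlines(), start=1):
            for match in TAG_LITERAL.finditer(line):
                problems.append(f"{name}:{lineno}: hard-coded tag {match.group(0)}")
    return problems


if __name__ == "__main__":
    found = find_hardcoded_tags(load_workflows(Path(".github/workflows")))
    print("\n".join(found) if found else "no hard-coded llama.cpp tags found")
```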

Review Notes

Fixed post-review blockers: automation PRs now dispatch CI explicitly after bot-token branch updates, and publish release notes receive the resolved llama.cpp tag via a build-job output.
Independent pre-push review found no blockers.
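Explicit dispatch matters because GitHub does not start new workflow runs for events caused by the default GITHUB_TOKEN, except `workflow_dispatch` and `repository_dispatch`. Whether that is exactly the situation behind "bot-token branch updates" here is an assumption. The sketch below builds, but does not send, a request to the REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, which accepts a workflow file name such as ci.yml and a `ref`. It is hypothetical and not the workflow's actual step. `gh workflow run ci.yml --ref automation/bump-llama-cpp` is an equivalent CLI call.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_dispatch_request(repo: str, workflow_file: str, ref: str,
                           token: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for `workflow_file` on `ref`."""
    url = f"{API}/repos/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# To send it: urllib.request.urlopen(req); GitHub answers 204 No Content on success.
# The target workflow must declare a workflow_dispatch trigger.
```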

Copilot

Pull request overview

Introduces llama_cpp.version as the single source of truth for the pinned llama.cpp tag, refactors CI and publish workflows to read from it, adds a scheduled automation workflow that opens/updates a stable bump PR with upstream changelog, and bumps the pin from b9116 to b9165. The reliability contract and docs (AGENTS.md, README.md, CONTRIBUTING.md) are updated to enforce and describe the new policy.

Changes:

Add llama_cpp.version and have ci.yml/publish_assets.yml resolve the tag from it (publish carries it across jobs as a job output).
Add .github/workflows/auto_llama_cpp_update.yml to manage one stable automation/bump-llama-cpp PR with release notes/compare/commit-range, dispatch CI on the bot branch, and skip when a non-automation PR already touches the version file.
Extend scripts/verify_ci_reliability.py and the docs to require/describe the new pin file, automation workflow, and publish job-output behavior.
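The open-or-update and skip-on-conflict rules described above reduce to a small decision. The sketch below is hypothetical. The `OpenPR` shape and the three result strings are illustrative names, and gathering open PRs and their changed files (for example via the GitHub API) is left out.

```python
from dataclasses import dataclass, field
from typing import List

AUTOMATION_BRANCH = "automation/bump-llama-cpp"
PIN_FILE = "llama_cpp.version"


@dataclass
class OpenPR:
    """Minimal, hypothetical view of an open pull request."""
    number: int
    head_branch: str
    changed_files: List[str] = field(default_factory=list)


def plan_bump(open_prs: List[OpenPR]) -> str:
    """Decide what the scheduled run should do: 'skip', 'update', or 'create'."""
    # A non-automation PR that already edits the pin takes precedence.
    if any(pr.head_branch != AUTOMATION_BRANCH and PIN_FILE in pr.changed_files
           for pr in open_prs):
        return "skip"
    # Otherwise reuse the single stable automation PR if it is already open.
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update"
    return "create"
```

Checking for a competing human PR first means the automation never force-pushes a bump that conflicts with in-flight manual work.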

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Summary per file

File	Description
llama_cpp.version	New pin file containing `b9165`.
.github/workflows/ci.yml	Adds `workflow_dispatch`; resolves `LLAMA_CPP_TAG` from `llama_cpp.version` instead of hard-coded default.
.github/workflows/publish_assets.yml	Makes `llama_cpp_tag` input optional, resolves default from version file, exposes resolved tag as a job output for the release step.
.github/workflows/auto_llama_cpp_update.yml	New scheduled/manual workflow to open or update a stable bump PR and dispatch CI on the automation branch.
scripts/verify_ci_reliability.py	Asserts new automation workflow shape, version-file format, removed stale defaults, and updated docs requirements.
AGENTS.md	Documents the auto-update workflow, dispatch behavior, and skip-on-conflict policy.
README.md	Documents pin file, automation PR, publish job-output behavior, and override semantics.
CONTRIBUTING.md	Documents guardrails for the new workflow and override semantics for `llama_cpp_tag`.
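Passing the resolved tag across jobs relies on the standard GitHub Actions mechanism. A step appends `name=value` to the file named by `$GITHUB_OUTPUT`. The job maps it under `outputs:` (for example `llama_cpp_tag: ${{ steps.<id>.outputs.llama_cpp_tag }}`), and a later job reads `needs.<job>.outputs.llama_cpp_tag`. The Python step below is a hypothetical sketch, and the output name `llama_cpp_tag` is an assumption.

```python
import os
import tempfile
from pathlib import Path


def write_job_output(name: str, value: str) -> None:
    """Append `name=value` to the file that GitHub Actions exposes as $GITHUB_OUTPUT."""
    if "\n" in value:
        # Multiline values need the `name<<DELIMITER` form instead.
        raise ValueError("multiline output values are not supported by this sketch")
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


if __name__ == "__main__":
    # Local dry run: point GITHUB_OUTPUT at a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "github_output"
        os.environ["GITHUB_OUTPUT"] = str(out)
        write_job_output("llama_cpp_tag", "b9165")
        print(out.read_text(encoding="utf-8"), end="")  # llama_cpp_tag=b9165
```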


leehack added 2 commits May 15, 2026 09:52

ci: automate llama.cpp pin updates (d2c56d2)

ci: harden llama.cpp automation workflows (cc88a54)

Copilot AI review requested due to automatic review settings May 15, 2026 18:46

Copilot started reviewing on behalf of leehack May 15, 2026 18:47

Copilot AI reviewed May 15, 2026


leehack merged commit 606c0ec into main May 15, 2026
5 checks passed

leehack deleted the chore/llama-cpp-auto-update branch May 15, 2026 19:02
