
ci: automate llama.cpp pin updates #13

Merged
leehack merged 2 commits into main from chore/llama-cpp-auto-update on May 15, 2026

@leehack (Owner) commented May 15, 2026

Summary

  • Adds llama_cpp.version as the single source of truth for the default llama.cpp checkout.
  • Updates CI and publish workflows to read that pin instead of hard-coding b9116.
  • Adds scheduled/manual automation that opens or updates one stable automation/bump-llama-cpp PR with upstream changelog context.
  • Bumps the current pin from b9116 to b9165.
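
As a rough illustration of the "single source of truth" idea, the sketch below reads and validates the pin. It assumes llama_cpp.version holds exactly one tag of the form b<digits> (for example b9165); the helper names and the format check are hypothetical and are not taken from the repository.

```python
import re
from pathlib import Path

# Assumed format: a llama.cpp release tag is the letter "b" followed by digits.
TAG_PATTERN = re.compile(r"b\d+")


def parse_llama_cpp_pin(text: str) -> str:
    """Return the tag held in the pin file's text, or raise ValueError."""
    tag = text.strip()
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"unexpected llama.cpp pin: {tag!r}")
    return tag


def read_llama_cpp_pin(path: str = "llama_cpp.version") -> str:
    """Hypothetical helper: read and validate the pin from disk."""
    return parse_llama_cpp_pin(Path(path).read_text(encoding="utf-8"))
```

Rejecting anything that is not a bare tag keeps a stray branch name or an empty file from silently becoming the checkout target.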

llama.cpp update

Upstream changelog

Release notes for b9165

ci : fix transform of top . entry in release archive (#23080)

  • fix transform of top . entry in release archive

  • simplify

Commit range

Commits from b9116 to b9165 (first 80)
  • ci : bump ty to 0.0.35 (#22961) (fa62042)
  • vulkan: Check shared memory size for mmq shaders (#22693) (706fbd8)
  • vulkan: Fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461) (ef93e98)
  • examples : add llama-eval (#21152) (fde69a3)
  • model-conversion : add causal-convert-mmproj target [no ci] (#22969) (89730c8)
  • ggml-webgpu: address precision issues for multimodal (#22808) (239a497)
  • ggml-webgpu: Enables running gpt-oss-20b (#22906) (927dada)
  • mtmd, server, common: expose modalities to /v1/models (#22952) (7bfe120)
  • webui: Fix Chat Screen Form box disappearing + autoscroll issues on WebKit (#22977) (dded58b)
  • convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981) (cce09f0)
  • opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755) (a9883db)
  • hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993) (856c3ad)
  • ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (#22681) (61af07c)
  • llama-eval : enable type check (#22988) (bcfe63f)
  • spec : update CLI arguments for better consistency (#22964) (634275f)
  • ci: validate model naming convention (#22680) (3796c94)
  • server, webui: support continue generation on reasoning models (#22727) (5d44db6)
  • download: do not exit() on error (#23008) (e75cd5e)
  • hexagon: add unary tanh op (#22999) (ad96bb8)
  • docs : Update OPENVINO.md (#22959) (7e16646)
  • webui: preserve system message on edit cancel (#22911) (46be24d)
  • webui: Deduplicate model aliases in data + handle single/multiple aliases in UI (#22979) (2dfeca3)
  • flush the gpu profile timestamp before the queryset is overflowed (#22995) (527045b)
  • opencl: fix crash when warming up MoE on Adreno (#22876) (1e4579f)
  • server, webui: accept continue_final_message flag for vLLM API compat (#23012) (95d469a)
  • opencl: add q5_0 and q5_1 MoE for Adreno (#22985) (ec562eb)
  • Fix for issue #22974. Cast intermediate results to float before adding and casting the result to the destination type. Avoids half+half operator ambiguity. (#22994) (7f3f843)
  • ggml-webgpu: only use subgroup-matrix path when head dims are divisible by sg_mat_k / sg_mat_n (#23020) (4c1c3ac)
  • SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations (#21597) (9ed6e19)
  • fix: Autoscroll detection (#23026) (320a6a4)
  • vulkan: fix matmul integer pipeline selection (#23005) (dbe7901)
  • unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110) (42532af)
  • docker : revert stable version of intel compute-runtime (#22968) (0f45f1a)
  • ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863) (81b0d88)
  • logs : reduce (#23021) (67b2b7f)
  • webui: Move static build output from repo code to HF Bucket (#22937) (253ba11)
  • contributing: new contributors should not submit trivial fixes (#23045) (97b658c)
  • fix: Propagate version tag to WebUI asset download in self-hosted CI (#23051) (0c3e4fc)
  • ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040) (5ec717d)
  • ggml-webgpu: Enable NVIDIA self-hosted CI (#22976) (834a243)
  • CI : support IOT device (IQ9) (#22987) (d81e63d)
  • HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (#22880) (3e037f3)
  • ggml-hexagon: cpy: add contiguous fast-path in reshape copy (#23076) (5c0e946)
  • readme : update bindings (#23063) (7155a49)
  • Support for Codex CLI by skipping unsupported Responses tools (#23041) (91e84fe)
  • webui: preserve partial response on streaming error (#23090) (d528444)
  • reasoning-budget: clone should do a deep-copy (#23095) (ac33f03)
  • llama-eval : add AIME 2026 dataset support (#23058) (d5dc2e0)
  • ci : fix transform of top . entry in release archive (#23080) (769cc93)
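
The list above follows a "subject (short SHA)" shape capped at 80 entries. A minimal sketch of that formatting step, assuming the commits are already available as (subject, sha) pairs; the function name and input shape are illustrative only, and how the real workflow fetches the commits is not shown in this PR text.

```python
from typing import Iterable, List, Tuple


def format_commit_range(
    commits: Iterable[Tuple[str, str]], limit: int = 80
) -> List[str]:
    """Render (subject, sha) pairs as "subject (short-sha)", capped at `limit`."""
    lines: List[str] = []
    for index, (subject, sha) in enumerate(commits):
        if index >= limit:
            break  # keep the PR body bounded, as in "(first 80)"
        lines.append(f"{subject} ({sha[:7]})")
    return lines
```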

Test Plan

  • python3 -m py_compile scripts/verify_state_persistence_api.py scripts/verify_ci_reliability.py scripts/state_persistence_browser_smoke.py
  • python3 scripts/verify_state_persistence_api.py
  • python3 scripts/verify_ci_reliability.py
  • git diff --check
  • YAML parse for .github/workflows/*.yml
  • bash -n for workflow run scripts
  • go run github.com/rhysd/actionlint/cmd/actionlint@latest -color=false .github/workflows/ci.yml .github/workflows/publish_assets.yml .github/workflows/auto_llama_cpp_update.yml
  • GitHub Actions Emscripten/WebGPU smoke on PR

Review Notes

  • Fixed post-review blockers: automation PRs now dispatch CI explicitly after bot-token branch updates, and publish release notes receive the resolved llama.cpp tag via a build-job output.
  • Independent pre-push review found no blockers.
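
For context on the job-output fix: in GitHub Actions, a step publishes a value by appending `name=value` to the file named by the `GITHUB_OUTPUT` environment variable, and a job can then re-export it so a later job reads it through `needs.<job>.outputs.<name>`. The sketch below shows only the step side. The helper name and the output name `llama_cpp_tag` are assumptions, and the real workflow may well do this in shell rather than Python.

```python
import os
from typing import Optional


def write_step_output(
    name: str, value: str, output_path: Optional[str] = None
) -> None:
    """Append `name=value` to the file GitHub Actions exposes as $GITHUB_OUTPUT."""
    path = output_path or os.environ["GITHUB_OUTPUT"]
    if "\n" in value:
        # Multi-line values need the delimiter form, which this sketch omits.
        raise ValueError("multi-line values are not supported by name=value")
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}={value}\n")
```

Inside a workflow step this could be called as `write_step_output("llama_cpp_tag", "b9165")`, with the path coming from the runner's environment.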

Copilot AI review requested due to automatic review settings May 15, 2026 18:46

Copilot AI left a comment

Pull request overview

Introduces llama_cpp.version as the single source of truth for the pinned llama.cpp tag, refactors CI and publish workflows to read from it, adds a scheduled automation workflow that opens/updates a stable bump PR with upstream changelog, and bumps the pin from b9116 to b9165. The reliability contract and docs (AGENTS.md, README.md, CONTRIBUTING.md) are updated to enforce and describe the new policy.
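
One way the automation could decide whether a bump is needed is to compare build numbers, sketched here under the assumption that tags always look like b<digits>. The function names are hypothetical; the actual workflow logic is not shown in this PR text.

```python
import re


def build_number(tag: str) -> int:
    """Extract the numeric build from a tag such as "b9165" (assumed format)."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unexpected tag: {tag!r}")
    return int(match.group(1))


def needs_bump(pinned: str, upstream: str) -> bool:
    """True when the upstream tag is strictly newer than the pinned one."""
    return build_number(upstream) > build_number(pinned)
```

Comparing integers rather than strings matters once the digit count changes: as strings, "b10000" sorts before "b9999".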

Changes:

  • Add llama_cpp.version and have ci.yml/publish_assets.yml resolve the tag from it (publish carries it across jobs as a job output).
  • Add .github/workflows/auto_llama_cpp_update.yml to manage one stable automation/bump-llama-cpp PR with release notes/compare/commit-range, dispatch CI on the bot branch, and skip when a non-automation PR already touches the version file.
  • Extend scripts/verify_ci_reliability.py and the docs to require/describe the new pin file, automation workflow, and publish job-output behavior.
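
The skip rule in the second bullet can be expressed as a small predicate. This is a sketch under assumptions: `OpenPullRequest` is a hypothetical, simplified record, and the real workflow presumably gathers this information from the GitHub API or CLI in a way not shown here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

AUTOMATION_BRANCH = "automation/bump-llama-cpp"  # branch name from the PR summary
VERSION_FILE = "llama_cpp.version"


@dataclass
class OpenPullRequest:
    """Hypothetical, simplified view of an open PR."""

    head_branch: str
    changed_files: List[str] = field(default_factory=list)


def should_skip_bump(open_prs: Iterable[OpenPullRequest]) -> bool:
    """Skip when a PR other than the automation PR already edits the pin file."""
    return any(
        pr.head_branch != AUTOMATION_BRANCH and VERSION_FILE in pr.changed_files
        for pr in open_prs
    )
```

Excluding the automation branch itself from the check lets the workflow keep updating its own PR while still deferring to a human-authored pin change.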

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| llama_cpp.version | New pin file containing b9165. |
| .github/workflows/ci.yml | Adds workflow_dispatch; resolves LLAMA_CPP_TAG from llama_cpp.version instead of hard-coded default. |
| .github/workflows/publish_assets.yml | Makes llama_cpp_tag input optional, resolves default from version file, exposes resolved tag as a job output for the release step. |
| .github/workflows/auto_llama_cpp_update.yml | New scheduled/manual workflow to open or update a stable bump PR and dispatch CI on the automation branch. |
| scripts/verify_ci_reliability.py | Asserts new automation workflow shape, version-file format, removed stale defaults, and updated docs requirements. |
| AGENTS.md | Documents the auto-update workflow, dispatch behavior, and skip-on-conflict policy. |
| README.md | Documents pin file, automation PR, publish job-output behavior, and override semantics. |
| CONTRIBUTING.md | Documents guardrails for the new workflow and override semantics for llama_cpp_tag. |
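
The override semantics for the optional llama_cpp_tag input could look roughly like the following. This is a sketch that assumes an omitted input arrives as an empty string (which is how GitHub Actions presents an unset optional string input) and that an explicit value always wins; the function name is illustrative and the repository's exact rules may differ.

```python
def resolve_llama_cpp_tag(input_tag: str, pinned_tag: str) -> str:
    """Prefer a non-empty workflow input; otherwise fall back to the pinned tag."""
    override = input_tag.strip()
    return override if override else pinned_tag.strip()
```

With the pin at b9165, an empty input resolves to b9165, while an explicit input of b9116 resolves to b9116.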

@leehack merged commit 606c0ec into main on May 15, 2026 (5 checks passed)
@leehack deleted the chore/llama-cpp-auto-update branch on May 15, 2026 at 19:02