chore(engine): bump bundled llama.cpp sidecar to b9781 #253
Merged
Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>
Overview
Bumps the bundled llama.cpp `llama-server` sidecar from `b9590` to the latest release `b9781`, and consolidates the engine-packaging documentation into a single source of truth. This is the manual interim bump while #238 (automated bump) is still open.

What changed
- `scripts/ensure-llama-server.ts`: `LLAMA_CPP_TAG` `b9590` → `b9781`, and `ASSET_SHA256` set to the new macOS arm64 asset's hash (read from the GitHub release `digest`).
- `nightly-release`, `pr-backend-tests`, `pr-build-validation`, `release-please`: cache keys bumped to `…-b9781-50e822733750dbc3`. Cache keys are immutable, so embedding the pin in the key means the new asset is cached fresh instead of being restored stale from the old key.
- `docs/models-and-providers.md` → "How the engine binary is packaged" is expanded into the canonical explanation of the pin, the fetch script and its five steps, when it runs (no-op via the stamp file unless the pin changes), and the dev-vs-`.app` file layout. `docs/release-process.md` is trimmed to release-specific facts and cross-links to it, removing the duplicated conceptual prose.
- `src-tauri/src/openai.rs`: reworded the reasoning-kwargs note so it reads as a historical verification datapoint rather than a claim about the current pin.

How it works
The pin is a release tag plus the asset SHA-256. `engine:ensure` fetches that exact asset, verifies the hash, re-derives the dylib link closure, and ad-hoc re-signs. The closure is unchanged for b9781 (10 dylibs, still matches `bundle.macOS.frameworks`), so no `tauri.conf.json` change was needed.

Testing
Verified on Apple Silicon against the actual b9781 binary:
- `engine:ensure` fetches, hash-verifies, and installs cleanly; dylib closure matches the frameworks list.
- `--help` lists the expected flags (`-m --mmproj --ctx-size --host --port --no-webui --parallel`).
- `codesign -vv` is clean on the binary and all 10 dylibs.
- `/health`, a non-stream completion, and SSE streaming (with reasoning deltas) all work against gpt-oss-20b.
- `validate-build` passes; the bundle re-signs cleanly.

Note: the vision/`--mmproj` runtime path is unexercised (no vision model installed locally), though the flag is present in the new binary. Reasoning suppression for template-switch families (e.g. Qwen3.5) was verified on b9590 and not re-confirmed on b9781 for lack of a local Qwen model; the kwargs are accepted with no error on b9781.
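
For reviewers who want a concrete picture of the pin check described under "How it works", here is a minimal sketch of hash verification plus a stamp-based no-op check. It is not the code in `scripts/ensure-llama-server.ts`: the `EnginePin` type, the function names, and the `tag:sha256` stamp format are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for the pin: a release tag plus the asset's SHA-256.
interface EnginePin {
  tag: string;
  sha256: string; // hex digest of the release asset
}

// Hex-encoded SHA-256 of the downloaded bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Throws unless the downloaded asset matches the pinned digest.
function verifyAsset(data: Uint8Array, pin: EnginePin): void {
  const actual = sha256Hex(data);
  if (actual !== pin.sha256.toLowerCase()) {
    throw new Error(
      `SHA-256 mismatch for ${pin.tag}: expected ${pin.sha256}, got ${actual}`,
    );
  }
}

// Assumed stamp format (not taken from the real script): "tag:sha256".
function stampFor(pin: EnginePin): string {
  return `${pin.tag}:${pin.sha256.toLowerCase()}`;
}

// A fetch is needed when no stamp exists or the stamp names a different pin.
function needsFetch(existingStamp: string | null, pin: EnginePin): boolean {
  return existingStamp?.trim() !== stampFor(pin);
}
```

With this shape, bumping either the tag or the hash changes the stamp, so the next run fetches again. An unchanged pin compares equal and the run is a no-op.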