chore(engine): bump bundled llama.cpp sidecar to b9781 by quiet-node · Pull Request #253 · quiet-node/thuki

quiet-node · 2026-06-24T17:33:11Z

Overview

Bumps the bundled llama.cpp llama-server sidecar from b9590 to the latest release b9781, and consolidates the engine-packaging documentation into a single source of truth. This is the manual interim bump while #238 (automated bump) is still open.

What changed

Engine pin (scripts/ensure-llama-server.ts): LLAMA_CPP_TAG b9590 → b9781, and ASSET_SHA256 set to the new macOS arm64 asset's hash (read from the GitHub release digest).
CI cache keys (nightly-release, pr-backend-tests, pr-build-validation, release-please): bumped to …-b9781-50e822733750dbc3. Cache entries are immutable once written, so embedding the pin in the key forces a fresh cache for the new asset instead of restoring the stale one saved under the old key.
Docs: docs/models-and-providers.md → "How the engine binary is packaged" is expanded into the canonical explanation of the pin, the fetch script and its five steps, when it runs (no-op via the stamp file unless the pin changes), and the dev-vs-.app file layout. docs/release-process.md is trimmed to release-specific facts and cross-links to it, removing the duplicated conceptual prose.
Comment (src-tauri/src/openai.rs): reworded the reasoning-kwargs note so it reads as a historical verification datapoint rather than a claim about the current pin.
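
To illustrate why embedding the pin in the cache key avoids stale restores, here is a minimal sketch of how such a key could be composed. The helper name, the `prefix` argument, and the idea that the trailing hex segment is a 16-character prefix of the asset hash are all assumptions for illustration; the workflows' real key expressions are not shown in this PR.

```typescript
// Hypothetical helper: the key prefix and the use of a 16-character hash
// prefix are assumptions, not taken from the actual workflow files.
function engineCacheKey(prefix: string, tag: string, assetSha256: string): string {
  // Any change to the tag or the hash produces a different key, so a cache
  // entry saved under the old pin is never restored for the new one.
  return `${prefix}-${tag}-${assetSha256.slice(0, 16)}`;
}
```

Under this scheme, bumping either half of the pin changes the key, so the first CI run after the bump misses the cache and fetches the new asset.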

How it works

The pin is a release tag plus the asset SHA-256. engine:ensure fetches that exact asset, verifies the hash, re-derives the dylib link closure, and ad-hoc re-signs. The closure is unchanged for b9781 (10 dylibs, still matches bundle.macOS.frameworks), so no tauri.conf.json change was needed.
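
The hash-verification step can be sketched as follows. This is not the code in scripts/ensure-llama-server.ts; the `EnginePin` type and both function names are hypothetical, and only Node's built-in `node:crypto` module is used.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the pin. The field names echo the PR's constants
// (LLAMA_CPP_TAG, ASSET_SHA256), but this type is an assumption.
interface EnginePin {
  tag: string;         // llama.cpp release tag, e.g. "b9781"
  assetSha256: string; // hex SHA-256 of the release asset
}

// Compute the lowercase hex SHA-256 of the downloaded bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Throw when the downloaded bytes do not match the pinned hash, so a
// tampered or wrong asset is never installed.
function verifyAsset(pin: EnginePin, data: Uint8Array): void {
  const actual = sha256Hex(data);
  if (actual !== pin.assetSha256.toLowerCase()) {
    throw new Error(
      `SHA-256 mismatch for ${pin.tag}: expected ${pin.assetSha256}, got ${actual}`,
    );
  }
}
```

Failing closed on a mismatch means the later steps (dylib closure, re-signing) only ever run against the exact asset named by the pin.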

Testing

Verified on Apple Silicon against the actual b9781 binary:

engine:ensure fetches, hash-verifies, and installs cleanly; dylib closure matches the frameworks list.
All spawn-line flags survive in --help (-m --mmproj --ctx-size --host --port --no-webui --parallel).
codesign -vv clean on the binary and all 10 dylibs.
Real spawn + /health + a non-stream completion + SSE streaming (with reasoning deltas) against gpt-oss-20b.
validate-build passes; the bundle re-signs cleanly.
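
The flag-survival check above could be automated with something like the sketch below. The function name is hypothetical, and it assumes the help text separates flags with whitespace or commas; it takes the captured --help output as a string rather than spawning the binary itself.

```typescript
// Spawn-line flags as listed in the testing notes above.
const SPAWN_FLAGS = [
  "-m", "--mmproj", "--ctx-size", "--host", "--port", "--no-webui", "--parallel",
];

// Return the flags that do not appear as whole tokens in the help text.
// Matching whole tokens avoids false positives such as "--port" matching
// inside a longer flag name.
function missingFlags(helpText: string, flags: string[] = SPAWN_FLAGS): string[] {
  const tokens = new Set(helpText.split(/[\s,]+/).filter(Boolean));
  return flags.filter((flag) => !tokens.has(flag));
}
```

An empty result means every flag the app passes at spawn time is still advertised by the new binary; any returned flag would need investigation before shipping the bump.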

Note: the vision/--mmproj runtime path is unexercised (no vision model installed locally), though the flag is present in the new binary. Reasoning suppression for template-switch families (e.g. Qwen3.5) was verified on b9590 and not re-confirmed on b9781 for lack of a local Qwen model; the kwargs are accepted with no error on b9781.

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>

quiet-node added 2 commits June 24, 2026 12:32

chore(engine): bump bundled llama.cpp sidecar to b9781

2628470

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>

docs: consolidate engine packaging into models-and-providers

a524e56

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>

quiet-node merged commit 43014d3 into main Jun 24, 2026
3 checks passed

quiet-node deleted the claude/vigilant-johnson-e83929 branch June 24, 2026 17:38
