Releases: SharpAI/SwiftLM

SwiftLM b648

08 May 04:26
a04b81e

SwiftLM b648-a04b81e

Merge pull request #104 from roydsouza/fix/moe-memory-and-multimodal-tokens-rebased

Fix: Resolve multimodal BOA/EOA tokens dynamically from config.json

Changelog

  • Potential fix for pull request finding (5cfc277)
  • test(swiftlm): Add tests for multimodal token extraction (621a931)
  • Fix #3: Resolve multimodal BOA/EOA tokens from config.json instead of hardcoding (9d495d9)
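
The headline fix resolves the multimodal begin/end-of-audio (BOA/EOA) token IDs from the model's `config.json` rather than hardcoding them. A minimal Python sketch of that idea (the real fix is in SwiftLM's Swift code, and the exact field names vary by model family; `boa_token_id`/`eoa_token_id` here are illustrative guesses):

```python
import json

def extract_audio_tokens(config_path):
    """Read BOA/EOA token IDs from a model's config.json instead of
    hardcoding them, returning None for fields the config omits."""
    with open(config_path) as f:
        config = json.load(f)
    # Field names are illustrative; real configs differ per model family.
    return config.get("boa_token_id"), config.get("eoa_token_id")
```

Reading the IDs from the shipped config means new model revisions that renumber their special tokens keep working without a code change.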

Download

Quick Start

For GUI Users (SwiftBuddy):

  1. Download the attached DMG and open it.
  2. Drag SwiftBuddy.app into your Applications folder to install it, or run it directly from the DMG.
  3. After launching, click "Model Options" to select or download a local MLX model to chat with.

For CLI Users (SwiftLM):
Please refer to the Getting Started section in the README.

Note: mlx.metallib is bundled in the tar archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
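
Because a missing `mlx.metallib` only surfaces as a Metal compute failure at run time, a pre-flight check can catch the misplacement early. A small illustrative sketch (not part of SwiftLM itself):

```python
from pathlib import Path

def metallib_present(binary_path):
    """Check that mlx.metallib sits in the same directory as the
    SwiftLM binary, since Metal GPU compute fails without it."""
    return (Path(binary_path).parent / "mlx.metallib").exists()
```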

SwiftLM b644

04 May 22:54
f1dddb8

SwiftLM b644-f1dddb8

Merge pull request #101 from SharpAI/fix/qwen3-jinja-template-issue-97

fix: address post-merge PR 99 feedback and tests

Changelog

  • fix: address all 7 Copilot review comments on PR #101 (7870b2f)
  • fix(tests): fix MLXArray init in ContextWindowCalculationTests for Linux CI (a9abb2a)
  • fix: resolve KVCacheSimple cast warning and ContextWindowCalculationTests build error (42f4946)
  • Potential fix for pull request finding (677dd27)
  • fix(swiftbuddy): resolve actor isolation violation in ServerManager (482782e)
  • fix(swiftbuddy): update SettingsView streaming UI and link CLI builder (ccf0b41)
  • test: address Copilot review for Issue 97 by adding strict role mapping regression guards (d280319)
  • test: add missing Context Window, Config Persistence, and Server unit tests (a5bf26a)
  • chore: remove sandbox test scripts (bbedccb)
  • fix(swiftbuddy): fix SettingsView build error and onChange deprecation warning (81c5b95)
  • Fix persisted SSD streaming behavior (321fc21)
  • Add model loading progress for reloads (dcc0a3a)
  • fix: resolve SwiftUI view update crash in SettingsView Color Scheme picker (4ac0c23)
  • fix: address all critical + medium Copilot review comments on PR #99 (cb4c6e4)
  • feat: restore turboKV/streamExperts controls, fix context window label (4332e50)
  • fix(swiftbuddy): resolve buildCLICommand scope error in SettingsView (2cbb836)
  • test: coverage gaps — SwiftBuddy embedded server, CLI builder, removed fields guard (ce2bafd)
  • test: address all 4 Copilot review comments on PR #99 (4d2b858)
  • feat(swiftbuddy): CLI panel, applied toast, seed wiring, remove dead config fields (c360806)
  • feat(swiftbuddy): expose server endpoint URL + regression tests for settings/thinking/API (0304495)
  • feat(swiftbuddy): persist settings, fix thinking mode, fix context count, add /v1/chat/completions (c80cf91)
  • fix(review): address all 4 Copilot review comments on PR #99 (fbd9117)
  • fix(inference): resolve Qwen3 TemplateException on multi-turn chat (Issue #97) (9f9e073)
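
The Issue #97 fix concerns a chat template raising a `TemplateException` on multi-turn conversations, guarded by "strict role mapping" regression tests. A hedged Python sketch of the general technique of validating and normalizing roles before handing messages to a strict template (the actual fix lives in SwiftLM's Swift template path; the allowed role set here is an assumption):

```python
def normalize_roles(messages):
    """Map role strings onto the strict set a chat template expects,
    raising a clear error instead of letting the template engine
    fail mid-render on an unknown role."""
    allowed = {"system", "user", "assistant", "tool"}
    normalized = []
    for message in messages:
        role = message.get("role", "").lower()
        if role not in allowed:
            raise ValueError(f"unsupported role: {role!r}")
        normalized.append({"role": role, "content": message.get("content", "")})
    return normalized
```

Failing fast with a named role in the error is far easier to debug than a template exception thrown deep inside Jinja rendering.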


SwiftLM b618

28 Apr 20:49
0cd94eb

SwiftLM b618-0cd94eb

Merge pull request #96 from SharpAI/fix/swiftbuddy-model-loading-recovery-main

Harden SwiftBuddy model loading and align local server settings

Changelog

  • Restore MLXLM compatibility (77b258e)
  • Address Copilot review feedback (913ae3f)
  • Bump mlx-swift for quieter Metal compilation (205bbea)
  • Align SwiftBuddy settings with local server config (2cbd2bc)
  • Harden SwiftBuddy model loading recovery (08ceed8)


SwiftLM b612

27 Apr 20:06
dc1cff2

SwiftLM b612-dc1cff2

test: add ChatRequestParsingTests for tool_calls index mapping (#93)

  • test: add ChatRequestParsingTests covering tool_calls index mapping (PR #92)

  • test: address copilot review - fix stale line refs and malformed JSON schema


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Changelog

  • test: add ChatRequestParsingTests for tool_calls index mapping (#93) (dc1cff2)
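
The `index` field being tested here is how OpenAI-compatible streaming responses identify which tool call each fragment belongs to: arguments arrive in pieces across chunks and must be reassembled per index. A hedged sketch of that client-side reassembly (illustrative; SwiftLM's own tests are in Swift):

```python
def merge_tool_call_deltas(deltas):
    """Reassemble streamed tool_call fragments keyed by their 'index'
    field, as in OpenAI-style /v1/chat/completions streaming."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"],
                                {"id": None, "name": "", "arguments": ""})
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function", {})
        if fn.get("name"):
            call["name"] += fn["name"]
        if fn.get("arguments"):
            call["arguments"] += fn["arguments"]  # JSON arrives in pieces
    return [calls[i] for i in sorted(calls)]
```

Without a correct `index` on each fragment, a client merging two concurrent tool calls would interleave their argument JSON, which is the failure mode these tests guard against.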


SwiftLM b611

27 Apr 15:32
75b4f66

SwiftLM b611-75b4f66

Refactor tool calls mapping to include index (#92)

Changelog

  • Refactor tool calls mapping to include index (#92) (75b4f66)
  • build: sync submodules to latest main (#90) (0ceaf20)


SwiftLM b609

27 Apr 03:20
d58ef7f

SwiftLM b609-d58ef7f

Merge pull request #91 from SharpAI/fix/dflash-compiler-warnings

fix(dflash): suppress compiler warnings — remove unused var, var→let

Changelog

  • fix(dflash): suppress compiler warnings — remove unused var, var→let (407e466)
  • test: address Copilot review feedback on PromptCacheTests (c3c1ddb)
  • test: add PromptCache regression tests (PR #85 coverage) (a0147d2)
  • docs(README): remove degenerate DFlash perf row, add honest disclaimer (fea0e11)


SwiftLM b602

26 Apr 16:27
7df2170

SwiftLM b602-7df2170

fix(server): prompt-cache bleed fixes + Qwen3-A3B perf table (#85)

fix(server): prompt-cache bleed fixes — MambaCache gate + ndim guard + spec-decode ordering

Changelog

  • docs(README): add Qwen3-A3B full-RAM perf table on M1 Ultra 64 GB (5a5b82a)
  • Re-apply prompt-cache bleed fixes to synced main (d38fe8e)
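
"Prompt-cache bleed" happens when cached KV state from a previous request is reused past the point where the new prompt diverges, leaking the old conversation into the new one. A hedged sketch of the prefix-matching idea behind such fixes (illustrative; SwiftLM's actual gate also handles MambaCache and ndim specifics in Swift):

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Length of the shared token prefix between the cached prompt and
    the new one; reuse beyond this point would bleed old state."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

def reusable_cache_tokens(cached_tokens, new_tokens, cache_supports_trim=True):
    """Decide how much cache to reuse. Recurrent caches (e.g.
    Mamba-style state) cannot be trimmed to an arbitrary prefix,
    so any mismatch forces a full cache discard."""
    k = common_prefix_len(cached_tokens, new_tokens)
    if k == len(cached_tokens):
        return k  # cache is a clean prefix of the new prompt
    return k if cache_supports_trim else 0
```

This is consistent with the b598 changelog entry disabling the prompt cache for MambaCache hybrid models: when trimming is impossible, discarding is the only safe option.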


SwiftLM b598

24 Apr 22:41
29f3816

SwiftLM b598-29f3816

Merge pull request #78 from 0xClandestine/feat/add-dflash

feat: add DFlash speculative decoding

Changelog

  • fix: remove virtual allocation reference from DeepSeek key takeaways (#83) (05d0b6c)
  • fix: README table shows physical RAM, not misleading virtual allocation (#81) (0212b14)
  • feat: DeepSeek-V4 support via mlx-swift-lm b463 (9533e45)
  • fix: prevent Metal GPU Watchdog timeout on low-RAM CI runners (2707be9)
  • fix: cap Metal command buffer size during swap-assisted inference to prevent GPU timeouts (91e32af)
  • fix: strip language_model. prefix, remove stale expert keys, raise FD limit (b5037f6)
  • fix: correct weight key paths for DeepseekV3 and KimiLinear models (d6bcf66)
  • fix: resolve CI GPU timeouts on 7GB runners by fixing Memory limit spin-loops (0e79358)
  • feat: add DeepSeek V3 and Kimi Linear DFlash support (Option B) (313fa91)
  • Revert "fix(ci): skip omni test gracefully when RAM is insufficient" (b224692)
  • fix(ci): skip omni test gracefully when RAM is insufficient (9fc993c)
  • feat: add DFlashTargetModel conformance for Qwen3, Qwen3MoE, and Llama (069a75f)
  • fix: add required log lines to DFlash draft model load path (4c042a6)
  • fix: add 'Using speculative decoding' log line for CI test assertions (5581f38)
  • fix: remove stray banner echo outside SUITE_OPT guard (b7dcd53)
  • fix: suppress interactive menu in sub-process invocations (0dba57a)
  • fix: use SUITE_OPT env var to bypass menu in matrix sub-processes (2d537d6)
  • fix: disable prompt cache for MambaCache hybrid models (Qwen3Next) (5553bf5)
  • chore: move dflash benchmark scripts to profiling dir (fd84f80)
  • fix(benchmark): exit early on DFlash tests to avoid model prompt (7e7ccd1)
  • test(dflash): fix submodule pin and add E2E tests (f629f63)
  • fix: restore DFlashRollbackCache protocol and clean dead extension (60d88e4)
  • chore: bump mlx-swift-lm submodule to b447 (7dcdaf4)
  • docs: add DFlash parameters to README CLI options list (6f0c670)
  • fix(bench): increase server wait timeout to 3600s to allow large model downloads (602f940)
  • fix: address Copilot review on PR #78 (2ea4e96)
  • fix: resolve DFlash protocol conformance and build blockers (a52bd07)
  • refactor(Qwen3Next): move DFlashTargetModel conformance to SwiftLM extension (7d150f9)
  • test: reorganize DFlash test suite into tests/DFlash/ (108f0c2)
  • feat(bench): add JSON result export to bench_35b.sh; add bench_coder_next.sh (0d96a5e)
  • feat: add DFlashKernelBench micro-benchmark target (a2c8102)
  • feat(dflash): add MambaSnapshotCache + dflashUseTapeRollback protocol property (464b959)
  • refactor(dflash/kernels): branchless mask via metal::select + 2D kernel cache (f2ab918)
  • feat: add Qwen3Next SSD streaming + DFlash support (485a929)
  • feat: add bench_35b.sh benchmark script (d6fdef4)
  • feat: add timings (tok/s, token count, duration) to all API responses (9b91b4d)
  • feat: selective safetensors loader — skip expert weight data with SSD streaming (7820436)
  • fix(dflash): load hiddenNorm weight + streaming + prefetch + asyncEval (e1ea48f)
  • feat: add initial dflash implementation (1040e68)
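
DFlash is a speculative-decoding scheme: a small draft model proposes several tokens, the target model verifies them in one pass, and the longest accepted prefix is kept. A minimal greedy-verification sketch of one round (illustrative only; DFlash's actual rollback caches, tape rollback, and Metal kernels are far more involved):

```python
def speculative_step(draft_tokens, verify_fn):
    """One draft-then-verify round of speculative decoding.
    verify_fn(tokens) returns the target model's greedy token at each
    draft position; we keep matching drafts and stop at the first
    mismatch, substituting the target's correction."""
    target_tokens = verify_fn(draft_tokens)
    accepted = []
    for drafted, target in zip(draft_tokens, target_tokens):
        if drafted == target:
            accepted.append(drafted)
        else:
            accepted.append(target)  # target's correction ends the round
            break
    return accepted
```

The payoff: when the draft model agrees with the target, several tokens are emitted for a single target-model forward pass, which is where the speedup comes from.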


SwiftLM b554

23 Apr 20:31
b33801a

SwiftLM b554-b33801a

Merge pull request #77 from SharpAI/fix/issue-72-draft-model-ssd-ram

fix: memory auto-cap strategy for SSD MoE streaming + speculative decoding (Issue #72)

Changelog

  • fix: allow custom model selection in benchmark test 10 (8385350)
  • fix: address Copilot review feedback on PR #77 (7b0bfd4)
  • fix(ci): use bash variable for PID in ssd-draft-memory-guard (58249c2)
  • ci: trigger run after YAML fix (c8b236d)
  • fix(ci): repair YAML corruption in ci.yml (retention-days merged with comment) (be8353f)
  • docs: document --stream-experts + --draft-model auto-cap strategy (Issue #72) (bb29e36)
  • ci: add ssd-draft-memory-guard job + vm_stat readings for Issue #72 (3f6bad5)
  • test(benchmark): add Test 10 — Issue #72 SSD + draft model RAM regression (7a14a67)
  • fix(ssd-stream): auto-cap draft tokens to 1 when --stream-experts + --draft-model (#72) (dfd0935)
  • fix(ssd-stream): prevent inference-time swap explosion with --draft-model (#72 follow-up) (5390216)
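
The auto-cap strategy above limits draft tokens to 1 whenever SSD expert streaming and a draft model are combined, since each speculative token multiplies the set of MoE experts that must be resident at once. A hedged sketch of that decision (the real cap lives in SwiftLM's Swift CLI; the function and parameter names are hypothetical):

```python
def effective_draft_tokens(requested, stream_experts, draft_model):
    """Cap draft tokens per step to 1 when --stream-experts is combined
    with --draft-model, so speculative batches don't blow the SSD
    streaming memory budget (strategy described for Issue #72)."""
    if stream_experts and draft_model is not None:
        return min(requested, 1)
    return requested
```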


SwiftLM b543

23 Apr 07:33
336c8a8

SwiftLM b543-336c8a8

Merge pull request #76 from SharpAI/fix/issue-72-draft-model-ssd-ram

fix(ssd-stream): prevent RAM explosion when --draft-model + --stream-experts combined (#72)

Changelog

  • fix(ssd-stream): address Copilot review on PR #76 (9b0a31c)
  • test(ssd-stream): add regression suite for Issue #72 SSD budget with draft model (8a04b2b)
  • fix(ssd-stream): prevent RAM explosion when --draft-model + --stream-experts are combined (95303a5)
  • chore(agents): document /opt/homebrew/bin/gh path in review-github-pr workflow (975db48)
  • chore(agents): add review-github-pr workflow skill (1005d3e)
