
Fix: MoE Memory Budgeting and Dynamic Multimodal Tokens#98

Open
roydsouza wants to merge 2 commits into SharpAI:main from roydsouza:fix/moe-memory-and-multimodal-tokens

Conversation

@roydsouza

Description

This PR addresses two critical issues identified during an adversarial audit of SwiftLM on Apple Silicon (M5):

  1. MoE-Aware Memory Budgeting: Previously, MoE expert weights were not counted against the physical RAM budget when using SSD streaming, so paging in active experts pushed the process over the limit and triggered 'swap-storms'. This PR adds a 2GB safety buffer for MoE models and a high-swap-usage monitor (see the sketch after this list).
  2. Dynamic Multimodal Token Resolution: Replaced the hardcoded boaToken (255010) and eoaToken (255011) with values extracted dynamically from config.json, fixing expert routing for non-Qwen multimodal models.
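
As a rough illustration of the budgeting change, here is a minimal Swift sketch. The parameter names (`reservedForSystem`, `isMoE`) and the exact signature of `computeSSDMemoryBudget` are assumptions for this sketch, not the code in this PR:

```swift
import Foundation

/// Extra physical-RAM headroom reserved when streaming MoE expert weights
/// from SSD, so paging in active experts cannot push the process into swap.
let moeBuffer: UInt64 = 2 * 1_024 * 1_024 * 1_024  // 2GB safety buffer

/// Returns the physical RAM budget available for SSD-streamed weights.
/// `reservedForSystem` and `isMoE` are illustrative parameters; the real
/// signature in SwiftLM may differ.
func computeSSDMemoryBudget(reservedForSystem: UInt64, isMoE: Bool) -> UInt64 {
    let physicalRAM = ProcessInfo.processInfo.physicalMemory
    var budget = physicalRAM > reservedForSystem ? physicalRAM - reservedForSystem : 0
    if isMoE {
        // Hold back the safety buffer so active experts can be paged in
        // without exceeding physical RAM.
        budget = budget > moeBuffer ? budget - moeBuffer : 0
    }
    return budget
}
```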

Changes

  • Added moeBuffer to computeSSDMemoryBudget.
  • Integrated a sysctl vm.swapusage check in Server.swift (sketched below).
  • Implemented extractMultimodalTokens in OmniModelFactory to resolve BOA/EOA tokens from config (sketched below).
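
A minimal sketch of the swap-usage check, assuming a simple polling helper; the function name and the 50% threshold are illustrative, while `vm.swapusage` and the `xsw_usage` struct come from Darwin:

```swift
import Darwin
import Foundation

/// Reads vm.swapusage and reports whether swap consumption has crossed a
/// threshold; sustained high swap usage is the symptom the description
/// above calls a 'swap-storm'. The 50% default threshold is illustrative.
func isSwapUsageHigh(threshold: Double = 0.5) -> Bool {
    var usage = xsw_usage()
    var size = MemoryLayout<xsw_usage>.size
    guard sysctlbyname("vm.swapusage", &usage, &size, nil, 0) == 0,
          usage.xsu_total > 0 else {
        return false  // sysctl failed or no swap configured
    }
    return Double(usage.xsu_used) / Double(usage.xsu_total) > threshold
}
```

And a sketch of the token resolution; the config.json key names (`boa_token_id`, `eoa_token_id`) are assumptions, since the PR does not spell them out:

```swift
import Foundation

/// Illustrative container; field names mirror the PR description.
struct MultimodalTokens {
    let boaToken: Int
    let eoaToken: Int
}

enum ConfigError: Error {
    case missingTokenIDs
}

/// Resolves BOA/EOA token IDs from a model's config.json instead of the
/// hardcoded Qwen values (255010 / 255011).
func extractMultimodalTokens(configURL: URL) throws -> MultimodalTokens {
    let data = try Data(contentsOf: configURL)
    guard let config = try JSONSerialization.jsonObject(with: data) as? [String: Any],
          let boa = config["boa_token_id"] as? Int,
          let eoa = config["eoa_token_id"] as? Int else {
        throw ConfigError.missingTokenIDs
    }
    return MultimodalTokens(boaToken: boa, eoaToken: eoa)
}
```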

Verified on Apple Silicon M5 with Gemma 4 MoE.
