
Fix: MoE Memory Budgeting and Dynamic Multimodal Tokens#98

Open
roydsouza wants to merge 2 commits into SharpAI:main from roydsouza:fix/moe-memory-and-multimodal-tokens

Conversation

@roydsouza

Description

This PR addresses two critical issues identified during an adversarial audit of SwiftLM on Apple Silicon (M5):

  1. MoE-Aware Memory Budgeting: Previously, MoE expert weights were not counted against the physical RAM budget when using SSD streaming, so paging in active experts pushed the process over the limit and triggered 'swap-storms'. This PR adds a 2GB safety buffer for MoE models and a high-swap-usage monitor (see the sketch after this list).
  2. Dynamic Multimodal Token Resolution: Replaced the hardcoded boaToken (255010) and eoaToken (255011) with values extracted dynamically from config.json, fixing expert routing for non-Qwen multimodal models.
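
As a rough illustration of the budgeting change, here is a minimal Swift sketch. The parameter names (`reservedForSystem`, `isMoE`) and the exact signature of `computeSSDMemoryBudget` are assumptions for this sketch, not the code in this PR:

```swift
import Foundation

/// Extra physical-RAM headroom reserved when streaming MoE expert weights
/// from SSD, so paging in active experts cannot push the process into swap.
let moeBuffer: UInt64 = 2 * 1_024 * 1_024 * 1_024  // 2GB safety buffer

/// Returns the physical RAM budget available for SSD-streamed weights.
/// `reservedForSystem` and `isMoE` are illustrative parameters; the real
/// signature in SwiftLM may differ.
func computeSSDMemoryBudget(reservedForSystem: UInt64, isMoE: Bool) -> UInt64 {
    let physicalRAM = ProcessInfo.processInfo.physicalMemory
    var budget = physicalRAM > reservedForSystem ? physicalRAM - reservedForSystem : 0
    if isMoE {
        // Hold back the safety buffer so active experts can be paged in
        // without exceeding physical RAM.
        budget = budget > moeBuffer ? budget - moeBuffer : 0
    }
    return budget
}
```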

Changes

  • Added moeBuffer to computeSSDMemoryBudget.
  • Integrated a sysctl vm.swapusage check in Server.swift (sketched below).
  • Implemented extractMultimodalTokens in OmniModelFactory to resolve BOA/EOA tokens from config (sketched below).
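
A minimal sketch of the swap-usage check, assuming a simple polling helper; the function name and the 50% threshold are illustrative, while `vm.swapusage` and the `xsw_usage` struct come from Darwin:

```swift
import Darwin
import Foundation

/// Reads vm.swapusage and reports whether swap consumption has crossed a
/// threshold; sustained high swap usage is the symptom the description
/// above calls a 'swap-storm'. The 50% default threshold is illustrative.
func isSwapUsageHigh(threshold: Double = 0.5) -> Bool {
    var usage = xsw_usage()
    var size = MemoryLayout<xsw_usage>.size
    guard sysctlbyname("vm.swapusage", &usage, &size, nil, 0) == 0,
          usage.xsu_total > 0 else {
        return false  // sysctl failed or no swap configured
    }
    return Double(usage.xsu_used) / Double(usage.xsu_total) > threshold
}
```

And a sketch of the token resolution; the config.json key names (`boa_token_id`, `eoa_token_id`) are assumptions, since the PR does not spell them out:

```swift
import Foundation

/// Illustrative container; field names mirror the PR description.
struct MultimodalTokens {
    let boaToken: Int
    let eoaToken: Int
}

enum ConfigError: Error {
    case missingTokenIDs
}

/// Resolves BOA/EOA token IDs from a model's config.json instead of the
/// hardcoded Qwen values (255010 / 255011).
func extractMultimodalTokens(configURL: URL) throws -> MultimodalTokens {
    let data = try Data(contentsOf: configURL)
    guard let config = try JSONSerialization.jsonObject(with: data) as? [String: Any],
          let boa = config["boa_token_id"] as? Int,
          let eoa = config["eoa_token_id"] as? Int else {
        throw ConfigError.missingTokenIDs
    }
    return MultimodalTokens(boaToken: boa, eoaToken: eoa)
}
```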

Verified on Apple Silicon M5 with Gemma 4 MoE.
