Add a persistent LFM2.5 formatter helper for macOS integrations #19562
seyeong-han wants to merge 3 commits
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19562
Note: links to docs will display an error until the docs builds have completed.
As of commit d93bf75 with merge base 1c11601: 1 pending job and 2 unrelated failures (broken trunk: the failing jobs were already failing on the merge base; rebase onto the `viable/strict` branch to avoid them).
Add LFM2.5 350M registration, MLX export config, focused regression coverage, and a make target for building the shared Llama C++ runner with MLX. Made-with: Cursor
Point the LFM2 README at the uploaded Hugging Face artifacts so users can run the MLX examples without re-exporting locally. Made-with: Cursor
Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring the parakeet helper introduced in pytorch#18861. Wraps an `executorch::extension::llm::TextLLMRunner` with the same JSON-line stdin/stdout protocol (wire contract `kProtocolVersion=1`, detailed in the PR description) that the macOS ExecuWhisper app already uses for the parakeet ASR helper, so the formatter model can stay loaded and KV-warm across requests. Build via the existing make target:

    cd ~/executorch
    make lfm_2_5_formatter-mlx

which produces:

    cmake-out/examples/models/llama/lfm25_formatter_helper
    cmake-out/examples/models/llama/mlx.metallib
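The helper's JSON-line loop is easy to model outside C++. Below is a minimal Python sketch of the protocol shape only, not the actual C++ implementation (which wraps `TextLLMRunner`); `run_model` is a hypothetical stand-in for generation, and interim `status` messages are omitted for brevity:

```python
import json

PROTOCOL_VERSION = 1  # mirrors kProtocolVersion in the C++ helper


def serve(stdin, stdout, run_model):
    """Toy model of the helper's JSON-line request loop.

    run_model(prompt, max_new_tokens, temperature) stands in for the
    wrapped text runner; it returns (text, tokens_per_second).
    """
    # Announce readiness once the model would be loaded.
    stdout.write(json.dumps({"type": "ready", "version": PROTOCOL_VERSION}) + "\n")
    for line in stdin:
        req = json.loads(line)
        if req.get("type") == "shutdown":
            break  # exit cleanly on a shutdown request
        if req.get("type") != "format":
            stdout.write(json.dumps({
                "type": "error", "version": PROTOCOL_VERSION,
                "request_id": req.get("request_id"),
                "message": "unknown request type",
            }) + "\n")
            continue
        text, tps = run_model(req["prompt"],
                              req.get("max_new_tokens"),
                              req.get("temperature"))
        stdout.write(json.dumps({
            "type": "result", "version": PROTOCOL_VERSION,
            "request_id": req["request_id"],
            "text": text, "stdout": "", "stderr": "",
            "tokens_per_second": tps,
        }) + "\n")
```

Because the process stays resident between `ready` and `shutdown`, the model (and its KV cache) survives across requests, which is the point of running it as a companion process rather than a per-request CLI.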
Summary
Follow-up to #19195 (which adds LFM2.5 export + the shared `make lfm_2_5-mlx` runner target). This PR adds a persistent LFM2.5 formatter helper for macOS integrations.

Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring the parakeet helper introduced in #18861. Wraps an `executorch::extension::llm::TextLLMRunner` with the same JSON-line stdin/stdout protocol the macOS ExecuWhisper app already uses for the parakeet ASR helper, so the formatter model can stay loaded and KV-warm across requests.

Wire contract (`kProtocolVersion=1`):

Requests:

    {"type":"format", "version":1, "request_id":..., "prompt":..., "max_new_tokens":..., "temperature":...}
    {"type":"shutdown", "version":1}

Responses:

    {"type":"ready", "version":1}
    {"type":"status", "version":1, "request_id":..., "phase":..., "message":...}
    {"type":"result", "version":1, "request_id":..., "text":..., "stdout":..., "stderr":..., "tokens_per_second":<opt double>}
    {"type":"error", "version":1, "request_id":<opt>, "message":..., "details":<opt>}

The Swift counterpart lives at `ExecuWhisper/Services/FormatterHelperProtocol.swift` in `meta-llama/internal-llama-cookbook` (end-to-end-use-cases/ExecuWhisper).

Build

    cd ~/executorch
    make lfm_2_5_formatter-mlx

Produces:

    cmake-out/examples/models/llama/lfm25_formatter_helper
    cmake-out/examples/models/llama/mlx.metallib

The new `lfm_2_5_formatter-mlx` Make target depends on the existing `lfm_2_5-mlx` target; the `llama-mlx` CMake build preset's targets list now includes `lfm25_formatter_helper` alongside `llama_main`.

Test plan

- `make -n lfm_2_5_formatter-mlx`
- `make lfm_2_5_formatter-mlx`, then run the helper: it emits `{"type":"ready","version":1}`, accepts `format` requests, returns `result` payloads with `tokens_per_second`, and exits cleanly on `shutdown`.
- Manual end-to-end check in the macOS app (`internal-llama-cookbook/end-to-end-use-cases/ExecuWhisper`): formatter helper stays KV-warm across dictation chunks; no per-request reload latency.
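On the client side, a caller only needs to frame newline-delimited JSON and check the protocol version. Here is a hedged Python sketch of that framing (the real client is the Swift `FormatterHelperProtocol.swift`; `make_format_request` and `classify_response` are illustrative names, not part of this PR):

```python
import json

PROTOCOL_VERSION = 1  # mirrors kProtocolVersion in the C++ helper


def make_format_request(request_id, prompt, max_new_tokens=256, temperature=0.0):
    """Frame one `format` request as a single newline-terminated JSON line."""
    return json.dumps({
        "type": "format",
        "version": PROTOCOL_VERSION,
        "request_id": request_id,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }) + "\n"


def classify_response(line):
    """Parse one response line; return (type, payload).

    Raises ValueError on a protocol-version mismatch so the client can
    fail fast instead of misinterpreting a newer wire format.
    """
    msg = json.loads(line)
    if msg.get("version") != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {msg.get('version')}")
    return msg["type"], msg
```

A real client would spawn `cmake-out/examples/models/llama/lfm25_formatter_helper` once, wait for the `ready` line, and then write one request line per dictation chunk, keeping the process (and its warm KV cache) alive across requests.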