Add a persistent LFM2.5 formatter helper for macOS integrations #19562
seyeong-han wants to merge 3 commits
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19562
Note: links to docs will display an error until the docs builds have completed.
As of commit d93bf75 with merge base 1c11601: 1 pending job and 2 unrelated failures (broken trunk: the failing jobs were already failing on the merge base; rebase onto the `viable/strict` branch to avoid them).
Add LFM2.5 350M registration, MLX export config, focused regression coverage, and a make target for building the shared Llama C++ runner with MLX. Made-with: Cursor
Point the LFM2 README at the uploaded Hugging Face artifacts so users can run the MLX examples without re-exporting locally. Made-with: Cursor
Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring the parakeet helper introduced in pytorch#18861. Wraps an `executorch::extension::llm::TextLLMRunner` with the same JSON-line stdin/stdout protocol (wire contract `kProtocolVersion=1`, detailed in the PR description) that the macOS ExecuWhisper app already uses for the parakeet ASR helper, so the formatter model can stay loaded and KV-warm across requests. Build via the existing make target:

    cd ~/executorch
    make lfm_2_5_formatter-mlx

which produces:

    cmake-out/examples/models/llama/lfm25_formatter_helper
    cmake-out/examples/models/llama/mlx.metallib
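The helper's JSON-line loop is easy to model outside C++. Below is a minimal Python sketch of the protocol shape only, not the actual C++ implementation (which wraps `TextLLMRunner`); `run_model` is a hypothetical stand-in for generation, and interim `status` messages are omitted for brevity:

```python
import json

PROTOCOL_VERSION = 1  # mirrors kProtocolVersion in the C++ helper


def serve(stdin, stdout, run_model):
    """Toy model of the helper's JSON-line request loop.

    run_model(prompt, max_new_tokens, temperature) stands in for the
    wrapped text runner; it returns (text, tokens_per_second).
    """
    # Announce readiness once the model would be loaded.
    stdout.write(json.dumps({"type": "ready", "version": PROTOCOL_VERSION}) + "\n")
    for line in stdin:
        req = json.loads(line)
        if req.get("type") == "shutdown":
            break  # exit cleanly on a shutdown request
        if req.get("type") != "format":
            stdout.write(json.dumps({
                "type": "error", "version": PROTOCOL_VERSION,
                "request_id": req.get("request_id"),
                "message": "unknown request type",
            }) + "\n")
            continue
        text, tps = run_model(req["prompt"],
                              req.get("max_new_tokens"),
                              req.get("temperature"))
        stdout.write(json.dumps({
            "type": "result", "version": PROTOCOL_VERSION,
            "request_id": req["request_id"],
            "text": text, "stdout": "", "stderr": "",
            "tokens_per_second": tps,
        }) + "\n")
```

Because the process stays resident between `ready` and `shutdown`, the model (and its KV cache) survives across requests, which is the point of running it as a companion process rather than a per-request CLI.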
Summary
Follow-up to #19195 (which adds LFM2.5 export + the shared `make lfm_2_5-mlx` runner target). This PR adds a persistent LFM2.5 formatter helper for macOS integrations.

Long-lived companion process for the LFM2.5-350M MLX formatter, mirroring the parakeet helper introduced in #18861. Wraps an `executorch::extension::llm::TextLLMRunner` with the same JSON-line stdin/stdout protocol the macOS ExecuWhisper app already uses for the parakeet ASR helper, so the formatter model can stay loaded and KV-warm across requests.

Wire contract (`kProtocolVersion=1`):

Requests:

    {"type":"format", "version":1, "request_id":..., "prompt":..., "max_new_tokens":..., "temperature":...}
    {"type":"shutdown", "version":1}

Responses:

    {"type":"ready", "version":1}
    {"type":"status", "version":1, "request_id":..., "phase":..., "message":...}
    {"type":"result", "version":1, "request_id":..., "text":..., "stdout":..., "stderr":..., "tokens_per_second":<opt double>}
    {"type":"error", "version":1, "request_id":<opt>, "message":..., "details":<opt>}

The Swift counterpart lives at `ExecuWhisper/Services/FormatterHelperProtocol.swift` in `meta-llama/internal-llama-cookbook` (end-to-end-use-cases/ExecuWhisper).

Build

    cd ~/executorch
    make lfm_2_5_formatter-mlx

Produces:

    cmake-out/examples/models/llama/lfm25_formatter_helper
    cmake-out/examples/models/llama/mlx.metallib

The new `lfm_2_5_formatter-mlx` Make target depends on the existing `lfm_2_5-mlx` target; the `llama-mlx` CMake build preset's targets list now includes `lfm25_formatter_helper` alongside `llama_main`.

Test plan

- `make -n lfm_2_5_formatter-mlx`
- `make lfm_2_5_formatter-mlx`, then run the helper: it emits `{"type":"ready","version":1}`, accepts `format` requests, returns `result` payloads with `tokens_per_second`, and exits cleanly on `shutdown`.
- Manual end-to-end check in the macOS app (`internal-llama-cookbook/end-to-end-use-cases/ExecuWhisper`): formatter helper stays KV-warm across dictation chunks; no per-request reload latency.
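On the client side, a caller only needs to frame newline-delimited JSON and check the protocol version. Here is a hedged Python sketch of that framing (the real client is the Swift `FormatterHelperProtocol.swift`; `make_format_request` and `classify_response` are illustrative names, not part of this PR):

```python
import json

PROTOCOL_VERSION = 1  # mirrors kProtocolVersion in the C++ helper


def make_format_request(request_id, prompt, max_new_tokens=256, temperature=0.0):
    """Frame one `format` request as a single newline-terminated JSON line."""
    return json.dumps({
        "type": "format",
        "version": PROTOCOL_VERSION,
        "request_id": request_id,
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }) + "\n"


def classify_response(line):
    """Parse one response line; return (type, payload).

    Raises ValueError on a protocol-version mismatch so the client can
    fail fast instead of misinterpreting a newer wire format.
    """
    msg = json.loads(line)
    if msg.get("version") != PROTOCOL_VERSION:
        raise ValueError(f"unsupported protocol version: {msg.get('version')}")
    return msg["type"], msg
```

A real client would spawn `cmake-out/examples/models/llama/lfm25_formatter_helper` once, wait for the `ready` line, and then write one request line per dictation chunk, keeping the process (and its warm KV cache) alive across requests.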