🚀 The feature, motivation and pitch
### Problem

Deploying an ExecuTorch LLM requires managing multiple separate files:

- `.pte`: exported program
- `.ptd`: optional external weights
- Tokenizer file (`tokenizer.json`, `.model`, `.bin`)
- Prompt template / chat template: not stored anywhere; the app must hard-code it per model family
- Generation config (`num_bos`, `num_eos`, etc.): the caller must set these correctly

Some metadata lives in `.pte` via `constant_methods` (`get_max_seq_len`, `get_eos_ids`, etc.), but the tokenizer and chat template are external. Users can easily mismatch tokenizer and model, or use the wrong prompt format.
### Proposal

A zip archive (`.etm`: ExecuTorch Model) bundling everything:

```
model.etm
├── model.pte
├── tokenizer.json
├── metadata.json     # chat_template, num_bos, default_temperature, etc.
└── weights/          # optional
    └── foundation.ptd
```
`metadata.json` carries what's not in `constant_methods` today, most importantly `chat_template`, so the runner can format prompts without app-side logic.
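As a rough sketch of what packaging could look like, the snippet below builds a `.etm` with Python's standard `zipfile` module. The `pack_etm` helper and the `metadata.json` keys (`chat_template`, `num_bos`, `default_temperature`) are hypothetical, following the layout above rather than any settled schema.

```python
import json
import zipfile

# Hypothetical metadata payload; key names mirror the proposal above
# but are not a finalized schema.
metadata = {
    "chat_template": "<|user|>\n{prompt}\n<|assistant|>\n",
    "num_bos": 1,
    "num_eos": 1,
    "default_temperature": 0.8,
}

def pack_etm(out_path, pte_path, tokenizer_path, metadata, ptd_path=None):
    """Bundle program, tokenizer, and metadata into one .etm zip archive."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pte_path, arcname="model.pte")
        zf.write(tokenizer_path, arcname="tokenizer.json")
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        if ptd_path is not None:
            zf.write(ptd_path, arcname="weights/foundation.ptd")
```

Because the archive is plain zip, existing tooling (`unzip -l`, `zipfile.ZipFile`) can inspect a bundle without any ExecuTorch-specific reader.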
### Prior Art

- MediaPipe Tasks (`.task`): TFLite model + tokenizer + metadata in a single zip-like bundle
- GGUF (llama.cpp): single file embedding tokenizer vocab and metadata alongside weights
- Hugging Face: `config.json` + `tokenizer.json` + `chat_template` in a model repo
### Open Questions

- File extension: `.etm`? `.etb`? Just `.zip`?
- LLM-only initially, or general-purpose for all ExecuTorch models?
- Chat template format: Jinja2 (HF-compatible) or simpler substitution?
- Runtime loading: unzip to a temp dir vs. read from the zip in memory?
- Keep supporting the multi-file approach alongside?
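To make two of these questions concrete, here is a minimal sketch of the in-memory loading option and the "simpler substitution" template option, using only the standard library. The function names and the single `{prompt}` placeholder are assumptions for illustration, not a proposed API.

```python
import io
import json
import zipfile

def load_metadata(etm_bytes):
    """Read metadata.json straight out of an in-memory .etm archive,
    without extracting anything to a temp dir."""
    with zipfile.ZipFile(io.BytesIO(etm_bytes)) as zf:
        return json.loads(zf.read("metadata.json"))

def render_prompt(chat_template, user_message):
    """The 'simpler substitution' option: replace a single {prompt}
    placeholder rather than evaluating a full Jinja2 template."""
    return chat_template.replace("{prompt}", user_message)
```

Reading from the zip in memory avoids temp-dir cleanup and permissions issues on mobile, at the cost of holding the archive bytes; the substitution template is trivial to implement in C++ runners but cannot express loops or conditionals the way HF Jinja2 templates can.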
### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_