Skip to content

[RFC] Single-archive model distribution format for LLMΒ #17640

@kirklandsign

Description

@kirklandsign

πŸš€ The feature, motivation and pitch

Problem

Deploying an ExecuTorch LLM requires managing multiple separate files:

  • .pte β€” exported program
  • .ptd β€” optional external weights
  • Tokenizer file (tokenizer.json, .model, .bin)
  • Prompt template / chat template β€” not stored anywhere, app must hard-code per model family
  • Generation config (num_bos, num_eos, etc.) β€” caller must set correctly

Some metadata lives in .pte via constant_methods (get_max_seq_len, get_eos_ids, etc.), but tokenizer and chat template are external. Users can easily mismatch tokenizer and model, or use the wrong prompt format.

Proposal

A zip archive (.etm β€” ExecuTorch Model) bundling everything:

model.etm
β”œβ”€β”€ model.pte
β”œβ”€β”€ tokenizer.json
β”œβ”€β”€ metadata.json          # chat_template, num_bos, default_temperature, etc.
└── weights/               # optional
    └── foundation.ptd

metadata.json carries what's not in constant_methods today β€” most importantly chat_template, so the runner can format prompts without app-side logic.

Prior Art

  • MediaPipe Tasks (.task): TFLite model + tokenizer + metadata in a single zip-like bundle
  • GGUF (llama.cpp): single file embedding tokenizer vocab and metadata alongside weights
  • Hugging Face: config.json + tokenizer.json + chat_template in a model repo

Open Questions

  1. File extension: .etm? .etb? Just .zip?
  2. LLM-only initially, or general-purpose for all ExecuTorch models?
  3. Chat template format: Jinja2 (HF-compatible) or simpler substitution?
  4. Runtime loading: unzip to temp dir vs. read from zip in-memory?
  5. Keep supporting multi-file approach alongside?

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

Status

Ready

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions