🚀 The feature, motivation and pitch
### Problem

Deploying an ExecuTorch LLM requires managing multiple separate files:

- `.pte`: exported program
- `.ptd`: optional external weights
- Tokenizer file (`tokenizer.json`, `.model`, `.bin`)
- Prompt template / chat template: not stored anywhere; the app must hard-code it per model family
- Generation config (`num_bos`, `num_eos`, etc.): the caller must set these correctly

Some metadata lives in `.pte` via `constant_methods` (`get_max_seq_len`, `get_eos_ids`, etc.), but the tokenizer and chat template are external. Users can easily mismatch tokenizer and model, or use the wrong prompt format.
### Proposal

A zip archive (`.etm`: ExecuTorch Model) bundling everything:

```
model.etm
├── model.pte
├── tokenizer.json
├── metadata.json     # chat_template, num_bos, default_temperature, etc.
└── weights/          # optional
    └── foundation.ptd
```
`metadata.json` carries what's not in `constant_methods` today, most importantly `chat_template`, so the runner can format prompts without app-side logic.
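As a rough sketch of what packaging could look like, the snippet below builds a `.etm` with Python's standard `zipfile` module. The `pack_etm` helper and the `metadata.json` keys (`chat_template`, `num_bos`, `default_temperature`) are hypothetical, following the layout above rather than any settled schema.

```python
import json
import zipfile

# Hypothetical metadata payload; key names mirror the proposal above
# but are not a finalized schema.
metadata = {
    "chat_template": "<|user|>\n{prompt}\n<|assistant|>\n",
    "num_bos": 1,
    "num_eos": 1,
    "default_temperature": 0.8,
}

def pack_etm(out_path, pte_path, tokenizer_path, metadata, ptd_path=None):
    """Bundle program, tokenizer, and metadata into one .etm zip archive."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pte_path, arcname="model.pte")
        zf.write(tokenizer_path, arcname="tokenizer.json")
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        if ptd_path is not None:
            zf.write(ptd_path, arcname="weights/foundation.ptd")
```

Because the archive is plain zip, existing tooling (`unzip -l`, `zipfile.ZipFile`) can inspect a bundle without any ExecuTorch-specific reader.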
### Prior Art

- MediaPipe Tasks (`.task`): TFLite model + tokenizer + metadata in a single zip-like bundle
- GGUF (llama.cpp): single file embedding tokenizer vocab and metadata alongside weights
- Hugging Face: `config.json` + `tokenizer.json` + `chat_template` in a model repo
### Open Questions

- File extension: `.etm`? `.etb`? Just `.zip`?
- LLM-only initially, or general-purpose for all ExecuTorch models?
- Chat template format: Jinja2 (HF-compatible) or simpler substitution?
- Runtime loading: unzip to a temp dir vs. read from the zip in memory?
- Keep supporting the multi-file approach alongside?
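To make two of these questions concrete, here is a minimal sketch of the in-memory loading option and the "simpler substitution" template option, using only the standard library. The function names and the single `{prompt}` placeholder are assumptions for illustration, not a proposed API.

```python
import io
import json
import zipfile

def load_metadata(etm_bytes):
    """Read metadata.json straight out of an in-memory .etm archive,
    without extracting anything to a temp dir."""
    with zipfile.ZipFile(io.BytesIO(etm_bytes)) as zf:
        return json.loads(zf.read("metadata.json"))

def render_prompt(chat_template, user_message):
    """The 'simpler substitution' option: replace a single {prompt}
    placeholder rather than evaluating a full Jinja2 template."""
    return chat_template.replace("{prompt}", user_message)
```

Reading from the zip in memory avoids temp-dir cleanup and permissions issues on mobile, at the cost of holding the archive bytes; the substitution template is trivial to implement in C++ runners but cannot express loops or conditionals the way HF Jinja2 templates can.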
### Alternatives

_No response_

### Additional context

_No response_

### RFC (Optional)

_No response_