
[llm][1/4] Add Jinja2Cpp-based chat template formatter library#19533

Open
seyeong-han wants to merge 1 commit into pytorch:main from seyeong-han:chat-jinja-library

Conversation

@seyeong-han
Contributor

Summary

Foundation PR for the chat-template support stack, split out of #16987 per reviewer request from @kirklandsign. This PR adds the Jinja2Cpp-based JinjaChatFormatter, chat types, embedded Llama3 / Llama3.2 / Gemma3 templates, build glue (CMake/Buck), and a focused C++ unit-test suite.

This PR is reviewable in isolation — it has no behavior change for any existing runner; downstream PRs (B/C/D) plug it in.

Stack overview

PR Subject
1/4 (this PR) Library + tests
2/4 TextLLMRunner echo-gated special-token filter + EOS merge
3/4 Python bindings + Python LlamaRunner integration
4/4 llama_main CLI flags + chat_formatter wrapper + universal Jinja docs

What this PR adds

  • extension/llm/chat_template/{chat_templates.h, BUCK, CMakeLists.txt, targets.bzl} — embedded Llama3 / Llama3.2 / Gemma3 templates and the ChatTemplateType enum + ModelTokens. The CMake file fetches Jinja2Cpp 1.3.2 via FetchContent, with SUPPORT_REGEX_LOOKAHEAD set before FetchContent_MakeAvailable so it propagates correctly, plus header staging for nonstd headers that some Jinja2Cpp installations omit. It installs chat_templates.h so SDK consumers can include it.
  • extension/llm/runner/{chat_types.h, jinja_chat_formatter.{h,cpp}} — the Universal Jinja chat formatter that supports any HuggingFace / vLLM chat template, not just the embedded ones. Loadable via fromTemplate (built-in), fromString (any string), or fromFile (any .jinja file). formatConversation injects vLLM/HuggingFace-standard params (tools=[], tool_choice=None, date_string, chat_template_kwargs) so any template that references those variables renders correctly.
  • normalizeTemplate handles vLLM/HF template quirks for Jinja2Cpp: notably, 'not tools is none' maps to 'tools' (truthy check), preserving the intent of 'tools is not none' for empty-list defaults.
  • extension/llm/runner/{CMakeLists.txt, targets.bzl} — link extension_llm_runner against jinja2cpp (PRIVATE) and define EXECUTORCH_USE_JINJA2CPP.
  • extension/llm/runner/test/test_jinja_chat_formatter.cpp + test build files — unit tests covering Llama3 / Llama3.2 / Gemma3 embedded templates, parseChatTemplateType (case-insensitive), and three universal-Jinja regression tests:
    • generic HuggingFace-style template (proves it's not Llama-specific)
    • tools-aware template (validates the tools=[] default)
    • not tools is none normalization regression test
  • CMakeLists.txt — adds add_subdirectory(extension/llm/chat_template) guarded by EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER.
  • shim_et/xplat/executorch/build/build_variables.bzl — adds jinja_chat_formatter.cpp to the runner sources.

Universal Jinja support

Any HuggingFace / vLLM-style Jinja template works:

// From a template file (HuggingFace tokenizer_config.json, vLLM examples)
auto formatter = JinjaChatFormatter::fromFile("path/to/template.jinja");

// From a string
auto formatter = JinjaChatFormatter::fromString(template_str);

// Built-in shortcuts
auto formatter = JinjaChatFormatter::fromTemplate(ChatTemplateType::Llama3);

Notes

  • No behavior change for existing TextLLMRunner / MultimodalRunner users: the formatter is opt-in, only invoked when downstream code calls it.
  • Sample vLLM templates are NOT checked in (per reviewer feedback @metascroy). Documentation in the follow-up CLI PR (4/4) points users to vLLM's examples/ directory and HuggingFace tokenizer_config.json files.

Test Plan

  • Build with cmake --workflow llm-release
  • Build with make llama-cpu
  • Run unit tests: extension/llm/runner/test/test_jinja_chat_formatter
  • Verify embedded templates render correctly for Llama3 / Llama3.2 / Gemma3
  • Verify universal Jinja support with HuggingFace tokenizer_config.json templates

Original PR

Splitting #16987 into 4 reviewable PRs.

cc @kirklandsign @larryliu0820 @metascroy @lucylq @mergennachin

@pytorch-bot

pytorch-bot Bot commented May 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19533

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 103 New Failures, 1 Cancelled Job, 1 Unrelated Failure, 6 Unclassified Failures

As of commit 0898aa3 with merge base 2ea50ac:

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 13, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@seyeong-han seyeong-han force-pushed the chat-jinja-library branch from e5c5a56 to 0898aa3 Compare May 13, 2026 05:04