Model fails to load when chat template uses HuggingFace generation tags #2225

@tobocop2

Prerequisites

  • I am running the latest code.
  • I carefully followed the README.md.
  • I searched issues and discussions; no existing issue covers this (PR #2082, "Implement GenerationTagIgnore Jinja2 extension", attempts a fix but is stale and has a missing import).
  • I reviewed the Discussions.

Expected Behavior

Llama(model_path=...) should successfully load a GGUF whose embedded tokenizer.chat_template contains HuggingFace's {% generation %} / {% endgeneration %} Jinja tags (e.g. SmolLM3, and any future HF-shipped model adopting the same template extension), even when the caller intends to pass an explicit chat_format override.

Current Behavior

Llama.__init__ raises jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'generation' before the model is usable. The error fires during Jinja2ChatFormatter.__init__, which eagerly compiles every chat template found in GGUF metadata regardless of whether the caller will use it.

The {% generation %} tag is a HuggingFace transformers chat-template extension that marks training-time generation spans for loss masking. It has no inference-time meaning, but jinja2's default environment doesn't recognize it.
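Because the tag carries no inference-time meaning, one way to tolerate it is a small Jinja2 extension that parses the block and emits its body unchanged. The following is a minimal sketch, not the code from PR #2082 or from the pending fix. The class name is borrowed from that PR's title, the sample template is made up for illustration, and the choice of ImmutableSandboxedEnvironment is an assumption about which environment the formatter should use.

```python
from jinja2.ext import Extension
from jinja2.sandbox import ImmutableSandboxedEnvironment


class GenerationTagIgnore(Extension):
    """Treat {% generation %}...{% endgeneration %} as a transparent wrapper."""

    tags = {"generation"}

    def parse(self, parser):
        # Consume the 'generation' name token that triggered this extension.
        next(parser.stream)
        # Parse everything up to {% endgeneration %} and drop that closing tag.
        body = parser.parse_statements(("name:endgeneration",), drop_needle=True)
        # Returning the body nodes directly renders the content as if the
        # tags were not there (no new scope is introduced).
        return body


# Illustrative template only; real chat templates are much longer.
env = ImmutableSandboxedEnvironment(extensions=[GenerationTagIgnore])
template = env.from_string(
    "{% if show %}A{% generation %}{{ text }}{% endgeneration %}B{% endif %}"
)
rendered = template.render(show=True, text="hi")
print(rendered)
```

Returning the parsed body as a plain node list, rather than wrapping it in a scope node, keeps any variable assignments inside the block visible to the rest of the template.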

Environment and Context

  • Hardware: Apple M1 Pro, 16 GB
  • OS: macOS 14.6 (Darwin 23.6.0)
  • Python: 3.12.9
  • llama-cpp-python: main (commit current as of 2025-05-21)
  • jinja2: 3.x

Failure Information

jinja2.exceptions.TemplateSyntaxError: Encountered unknown tag 'generation'.
Jinja was looking for the following tags: 'elif' or 'else' or 'endif'.
The innermost block that needs to be closed is 'if'.
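The same error can be reproduced with jinja2 alone, without downloading a model. This is a sketch using a made-up template fragment (not the SmolLM3 template) and a plain jinja2.Environment, which is an assumption for illustration rather than the exact environment llama-cpp-python constructs.

```python
import jinja2

# Hypothetical fragment that nests the HF generation tag inside an 'if' block.
template_src = (
    "{% for m in messages %}"
    "{% if m['role'] == 'assistant' %}"
    "{% generation %}{{ m['content'] }}{% endgeneration %}"
    "{% endif %}"
    "{% endfor %}"
)

env = jinja2.Environment()
error_message = None
try:
    # Compilation alone is enough to fail; nothing is rendered.
    env.from_string(template_src)
except jinja2.TemplateSyntaxError as exc:
    error_message = exc.message

print(error_message)
```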

Steps to Reproduce

  1. pip install llama-cpp-python
  2. huggingface-cli download bartowski/HuggingFaceTB_SmolLM3-3B-GGUF HuggingFaceTB_SmolLM3-3B-Q4_K_M.gguf
  3. python -c "from llama_cpp import Llama; Llama(model_path='./HuggingFaceTB_SmolLM3-3B-Q4_K_M.gguf', chat_format='chatml')"

The chat_format='chatml' override is intentionally provided to show the failure occurs even when the embedded template would be bypassed: the template is compiled at init regardless.
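The difference between compiling at init and compiling on first use can be illustrated with a small wrapper. This is a hypothetical sketch: LazyTemplate is not part of llama-cpp-python, and it is not necessarily how the pending fix works. It only shows that deferring compilation lets construction succeed when the template is never used.

```python
from functools import cached_property

import jinja2


class LazyTemplate:
    """Hypothetical wrapper: store the source, compile only when needed."""

    def __init__(self, source: str):
        self.source = source  # stored as-is; nothing is compiled here

    @cached_property
    def compiled(self) -> jinja2.Template:
        # Compilation (and any syntax error) happens on first access.
        return jinja2.Environment().from_string(self.source)


# Construction succeeds even though the template uses an unknown tag.
lazy = LazyTemplate("{% generation %}x{% endgeneration %}")

try:
    lazy.compiled
    first_use_failed = False
except jinja2.TemplateSyntaxError:
    first_use_failed = True

print(first_use_failed)
```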

Related

A complete fix is available; will open a PR shortly.
