Support offline SpeechLM2 HF exports by pzelasko · Pull Request #15736 · NVIDIA-NeMo/NeMo

pzelasko · 2026-05-29T15:49:13Z

What changed

This is an alternative implementation for the offline-loading goal in #15695.

It keeps the root checkpoint config.json as the NeMo/SpeechLM wrapper config and saves the original LLM HF config under llm_backbone/config.json to avoid a config.json filename conflict. The tokenizer remains saved once at the checkpoint root for vLLM compatibility.

At load time, HFHubMixin detects the local export layout and redirects:

root tokenizer files -> cfg["tokenizer_path"]
llm_backbone/config.json -> cfg["pretrained_llm"] or cfg["pretrained_lm_name"]

This keeps old checkpoints backward compatible because configs without these local files are left unchanged.
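
The redirect described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name `redirect_to_local_export` and the tokenizer file names in `TOKENIZER_FILES` are assumptions made for the example. Only the config keys and the `llm_backbone/config.json` path come from the description.

```python
from pathlib import Path

# Assumed tokenizer file names, for illustration only.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json")
# Config keys named in the PR description.
LLM_KEYS = ("pretrained_llm", "pretrained_lm_name")


def redirect_to_local_export(cfg: dict, ckpt_dir: str) -> dict:
    """Return a copy of cfg that points at files found inside ckpt_dir."""
    root = Path(ckpt_dir)
    out = dict(cfg)
    # Root tokenizer files -> cfg["tokenizer_path"]
    if any((root / name).is_file() for name in TOKENIZER_FILES):
        out["tokenizer_path"] = str(root)
    # llm_backbone/config.json -> whichever LLM key the config already uses
    backbone = root / "llm_backbone"
    if (backbone / "config.json").is_file():
        for key in LLM_KEYS:
            if key in out:
                out[key] = str(backbone)
    # If neither local file exists, the copy is identical to the input,
    # which is the backward-compatible path for old checkpoints.
    return out
```

For a checkpoint directory that lacks both the tokenizer files and `llm_backbone/config.json`, the function returns the config unchanged, so older exports would still resolve the LLM and tokenizer by their original names.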

Compatibility coverage

SALM
SALMAutomodel
SALMWithAsrDecoder
Duplex S2S
Duplex S2S speech decoder
Duplex STT
Duplex EAR-TTS

Testing

python -m py_compile examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_hf_hub.py tests/collections/speechlm2/test_to_hf.py
python -m black --check examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
python -m isort --check-only examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
pytest tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py -q

copy-pr-bot · 2026-05-29T15:49:17Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

pzelasko · 2026-05-29T15:55:34Z

/ok to test f33caf8

pzelasko · 2026-05-29T16:37:39Z

/ok to test a8d8276

github-actions · 2026-05-29T17:36:11Z

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

AudranBert · 2026-06-01T12:39:20Z

Hi @pzelasko,
Thanks for the PR!
I don't know if it is because the checkpoint is from an earlier version, but I got:

ValueError: Unable to instantiate HuggingFace AUTOTOKENIZER for /home/abert/salm/training_luciole_1b_v3_buckets/hf_checkpoints/ckpt. Exception: module 'torch' has no attribute 'bf16'

when trying to run inference with the exported checkpoint.

It happens because my exp_config.yaml contains:

model:
  dtype: bf16

This dtype is carried over into the config.json of the exported HF checkpoint:

{
  "architectures": [
    "NeMoSpeechLMForConditionalGeneration"
  ],
  "audio_locator_tag": "<|audio|>",
  "dtype": "bf16",
  ...
  "torch_dtype": "bfloat16"
}

Even though it prints "torch_dtype is deprecated! Use dtype instead!" when exporting and inferring, it reads dtype and not torch_dtype.
If I set dtype: bfloat16, it works, and I can use the model fully offline after exporting it to HF format.

pzelasko · 2026-06-01T15:35:03Z

@AudranBert dtype: bf16 must fail because the code uses something like getattr(torch, model_cfg.dtype), and there is no torch.bf16. bfloat16 makes sense to me.

Can you confirm the current PR works as you expect it to, or do we need to change anything?

AudranBert · 2026-06-01T16:08:18Z

> @AudranBert dtype: bf16 must fail because the code uses sth like getattr(torch, model_cfg.dtype), there is no torch.bf16. bfloat16 makes sense to me.
>
> Can you confirm the current PR works as you expect it to, or do we need to change anything?

Hi @pzelasko,
Except for that error, it works as I expected. Coming back to the error: shouldn't to_hf.py write "dtype": "bfloat16" instead of "dtype": "bf16" to avoid the error later?

pzelasko · 2026-06-01T16:35:34Z

Thanks, now I understand the issue better. I pushed a compatibility fix. Can you check now?
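
One way such a compatibility fix could look is to normalize shorthand precision names before they are written to config.json or passed to getattr(torch, ...). The sketch below is an assumption about the general approach, not the code from commit 6c040ec. The alias table `DTYPE_ALIASES` and the helper `normalize_dtype_name` are hypothetical names.

```python
# Hypothetical alias table; the actual fix in the PR may differ.
DTYPE_ALIASES = {
    "bf16": "bfloat16",
    "fp16": "float16",
    "fp32": "float32",
}


def normalize_dtype_name(name: str) -> str:
    """Map a shorthand precision string to the matching torch attribute name."""
    key = str(name).strip().lower()
    # Accept values written as "torch.bfloat16" as well.
    if key.startswith("torch."):
        key = key[len("torch."):]
    # Unknown names pass through unchanged so valid ones keep working.
    return DTYPE_ALIASES.get(key, key)
```

With this in place, a config carrying dtype: bf16 would resolve to "bfloat16", which is a real attribute of the torch module, while a config that already says bfloat16 is left as it is.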

pzelasko · 2026-06-01T16:35:51Z

/ok to test 6c040ec

AudranBert · 2026-06-02T08:04:34Z

> Thanks, now I understand the issue better. I pushed a compatibility fix. Can you check now?

Hi, thanks for the quick fix,
It does work correctly now!

pzelasko · 2026-06-02T12:53:35Z

/ok to test 325b315

github-actions · 2026-06-02T15:29:25Z

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Edresson

LGTM.

swamlife · 2026-06-02T22:44:47Z

This is great news. Very very happy about this.

swamlife · 2026-06-02T22:45:17Z

😜

pzelasko requested review from DongjiGao and Edresson May 29, 2026 15:54

pzelasko mentioned this pull request May 29, 2026

[SpeechLM2] Load tokenizer from checkpoint #15695

Closed

8 tasks

pzelasko force-pushed the codex/speechlm2-offline-backbone branch from f33caf8 to 8be1335 Compare May 29, 2026 16:16

pzelasko added the Run CICD label May 29, 2026

Support offline SpeechLM2 HF exports

a8d8276

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

pzelasko force-pushed the codex/speechlm2-offline-backbone branch from 8be1335 to a8d8276 Compare May 29, 2026 16:19

Normalize SpeechLM2 HF export dtype

6c040ec

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

pzelasko removed the Run CICD label Jun 1, 2026

Merge branch 'main' into codex/speechlm2-offline-backbone

325b315

Edresson approved these changes Jun 2, 2026

pzelasko merged commit 05e8c54 into main Jun 2, 2026
166 checks passed

pzelasko deleted the codex/speechlm2-offline-backbone branch June 2, 2026 20:57
