Support offline SpeechLM2 HF exports #15736

Merged
pzelasko merged 3 commits into main from codex/speechlm2-offline-backbone on Jun 2, 2026

@pzelasko
Collaborator

@pzelasko pzelasko commented May 29, 2026

What changed

This is an alternative implementation for the offline-loading goal in #15695.

It keeps the root checkpoint config.json as the NeMo/SpeechLM wrapper config and saves the original LLM HF config under llm_backbone/config.json to avoid a config.json filename conflict. The tokenizer remains saved once at the checkpoint root for vLLM compatibility.
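The on-disk layout described above can be sketched as follows. This is a minimal illustration, not the code in `to_hf.py`: the helper name `write_export_layout` and its arguments are hypothetical, and only the two `config.json` locations come from the PR description.

```python
import json
from pathlib import Path

# Subdirectory name taken from the PR description.
BACKBONE_DIR = "llm_backbone"


def write_export_layout(out_dir, wrapper_cfg, llm_cfg):
    """Hypothetical helper: write both configs without a filename clash.

    The wrapper (NeMo/SpeechLM) config stays at <out_dir>/config.json,
    while the original LLM config goes to <out_dir>/llm_backbone/config.json.
    """
    out_dir = Path(out_dir)
    backbone = out_dir / BACKBONE_DIR
    backbone.mkdir(parents=True, exist_ok=True)
    (out_dir / "config.json").write_text(json.dumps(wrapper_cfg, indent=2))
    (backbone / "config.json").write_text(json.dumps(llm_cfg, indent=2))
    return out_dir
```

Tokenizer files are not handled in this sketch; per the description they are saved once at the checkpoint root.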

At load time, HFHubMixin detects the local export layout and redirects:

  • root tokenizer files -> cfg["tokenizer_path"]
  • llm_backbone/config.json -> cfg["pretrained_llm"] or cfg["pretrained_lm_name"]

This keeps old checkpoints backward compatible because configs without these local files are left unchanged.
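A rough sketch of the load-time redirect described above, under stated assumptions: the function name `redirect_to_local_export`, the list of tokenizer filenames used for detection, and the choice to override only keys already present in the config are all hypothetical. The actual logic lives in `HFHubMixin` and may differ.

```python
from pathlib import Path

# Assumption: the presence of any of these at the root marks a locally saved tokenizer.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "tokenizer.model")
# Config keys named in the PR description.
LLM_KEYS = ("pretrained_llm", "pretrained_lm_name")


def redirect_to_local_export(cfg, ckpt_dir):
    """Return a copy of cfg pointing at local files when the export layout exists."""
    cfg = dict(cfg)  # leave the caller's config untouched
    root = Path(ckpt_dir)

    # Root tokenizer files -> cfg["tokenizer_path"]
    if any((root / name).is_file() for name in TOKENIZER_FILES):
        cfg["tokenizer_path"] = str(root)

    # llm_backbone/config.json -> whichever LLM key the config already uses
    backbone = root / "llm_backbone"
    if (backbone / "config.json").is_file():
        for key in LLM_KEYS:
            if key in cfg:
                cfg[key] = str(backbone)

    return cfg
```

For a checkpoint directory that contains neither the tokenizer files nor `llm_backbone/config.json`, the returned config equals the input, which is the backward-compatibility property the description relies on.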

Compatibility coverage

  • SALM
  • SALMAutomodel
  • SALMWithAsrDecoder
  • Duplex S2S
  • Duplex S2S speech decoder
  • Duplex STT
  • Duplex EAR-TTS

Testing

  • python -m py_compile examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_hf_hub.py tests/collections/speechlm2/test_to_hf.py
  • python -m black --check examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
  • python -m isort --check-only examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
  • pytest tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py -q

@copy-pr-bot

copy-pr-bot Bot commented May 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pzelasko pzelasko requested review from DongjiGao and Edresson May 29, 2026 15:54
@pzelasko
Collaborator Author

/ok to test f33caf8

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
@pzelasko pzelasko force-pushed the codex/speechlm2-offline-backbone branch from 8be1335 to a8d8276 Compare May 29, 2026 16:19
@pzelasko
Collaborator Author

/ok to test a8d8276

@github-actions
Contributor

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

@AudranBert
Contributor

AudranBert commented Jun 1, 2026

Hi @pzelasko,
Thanks for the PR!
I don't know if it is because the checkpoint is from an earlier version, but I got:

ValueError: Unable to instantiate HuggingFace AUTOTOKENIZER for /home/abert/salm/training_luciole_1b_v3_buckets/hf_checkpoints/ckpt. Exception: module 'torch' has no attribute 'bf16'

when trying to run inference with the exported checkpoint.

It happens because my exp_config.yaml contains:

model:
  dtype: bf16

This dtype is carried over into the HF-exported checkpoint's config.json:

{
  "architectures": [
    "NeMoSpeechLMForConditionalGeneration"
  ],
  "audio_locator_tag": "<|audio|>",
  "dtype": "bf16",
  ...
  "torch_dtype": "bfloat16"
}

Even though it prints "torch_dtype is deprecated! Use dtype instead!" when exporting and inferring, the loader reads dtype and not torch_dtype.
If I set dtype: bfloat16 instead, it works and I can use the model fully offline after exporting it to HF format.

@pzelasko
Collaborator Author

pzelasko commented Jun 1, 2026

@AudranBert dtype: bf16 must fail because the code uses something like getattr(torch, model_cfg.dtype), and there is no torch.bf16. bfloat16 makes sense to me.

Can you confirm the current PR works as you expect it to, or do we need to change anything?

@AudranBert
Contributor

> @AudranBert dtype: bf16 must fail because the code uses something like getattr(torch, model_cfg.dtype), and there is no torch.bf16. bfloat16 makes sense to me.
>
> Can you confirm the current PR works as you expect it to, or do we need to change anything?

Hi @pzelasko,
Except for that error, it works as I expected. Coming back to the error: shouldn't to_hf.py write "dtype": "bfloat16" instead of "dtype": "bf16" to avoid the error later?
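The thread does not show the compatibility fix itself. One plausible shape for it, sketched here purely as an illustration, is to map shorthand dtype strings to the attribute names that `torch` actually exposes before the value is written to `config.json` or passed to `getattr`. The function name `normalize_dtype_name` and the alias table are hypothetical.

```python
# Hypothetical alias table: training-config shorthand -> torch attribute name.
DTYPE_ALIASES = {
    "bf16": "bfloat16",
    "fp16": "float16",
    "fp32": "float32",
    "fp64": "float64",
}


def normalize_dtype_name(name):
    """Map shorthand such as "bf16" to a name that exists on the torch module.

    Names that are not known aliases are returned unchanged apart from
    trimming whitespace, lowercasing, and dropping a leading "torch." prefix.
    """
    key = str(name).strip().lower()
    if key.startswith("torch."):
        key = key[len("torch."):]
    return DTYPE_ALIASES.get(key, key)
```

With this in place, `getattr(torch, normalize_dtype_name("bf16"))` would look up `torch.bfloat16`, which exists, rather than `torch.bf16`, which does not.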

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
@pzelasko
Collaborator Author

pzelasko commented Jun 1, 2026

Thanks, now I understand the issue better. I pushed a compatibility fix. Can you check now?

@pzelasko pzelasko removed the Run CICD label Jun 1, 2026
@pzelasko
Collaborator Author

pzelasko commented Jun 1, 2026

/ok to test 6c040ec

@AudranBert
Contributor

> Thanks, now I understand the issue better. I pushed a compatibility fix. Can you check now?

Hi, thanks for the quick fix,
It does work correctly now!

@pzelasko
Collaborator Author

pzelasko commented Jun 2, 2026

/ok to test 325b315

@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

[🤖]: Hi @pzelasko 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Member

@Edresson Edresson left a comment

LGTM.

@pzelasko pzelasko merged commit 05e8c54 into main Jun 2, 2026
166 checks passed
@pzelasko pzelasko deleted the codex/speechlm2-offline-backbone branch June 2, 2026 20:57
@swamlife

swamlife commented Jun 2, 2026 via email

4 participants