Support offline SpeechLM2 HF exports #15736
/ok to test f33caf8
Updated from f33caf8 to 8be1335.
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Updated from 8be1335 to a8d8276.
/ok to test a8d8276
[🤖]: Hi @pzelasko 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
Hi @pzelasko, when trying to run inference with the exported checkpoint, … It happens because I have the following in exp_config.yaml: … And this dtype is carried over into the HF-exported checkpoint's config.json: … Even though it prints …
@AudranBert Can you confirm the current PR works as you expect it to, or do we need to change anything? |
Hi @pzelasko, |
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Thanks, now I understand the issue better. I pushed a compatibility fix. Can you check now? |
/ok to test 6c040ec
Hi, thanks for the quick fix,
/ok to test 325b315 |
[🤖]: Hi @pzelasko 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals.
This is great news. Very, very happy about this.
On Fri, May 29, 2026 at 10:50 AM Piotr Żelasko <***@***.***> wrote:
What changed
This is an alternative implementation for the offline-loading goal in
#15695.
It keeps the root checkpoint config.json as the NeMo/SpeechLM wrapper
config and saves the original LLM HF config under llm_backbone/config.json
to avoid a config.json filename conflict. The tokenizer remains saved
once at the checkpoint root for vLLM compatibility.
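A minimal sketch of the export layout described above, written with only the Python standard library so it runs without NeMo or transformers. The helper `write_offline_export_layout`, its arguments, and the `tokenizer_config.json` file name in the demo are hypothetical; the actual examples/speechlm2/to_hf.py presumably writes these files through the model's and tokenizer's own save methods rather than raw JSON.

```python
import json
import tempfile
from pathlib import Path


def write_offline_export_layout(
    out_dir: str,
    wrapper_cfg: dict,
    llm_hf_cfg: dict,
    tokenizer_files: dict[str, str],
) -> Path:
    """Hypothetical helper illustrating the on-disk layout (not NeMo's to_hf.py).

    <out_dir>/config.json              -> NeMo/SpeechLM wrapper config
    <out_dir>/llm_backbone/config.json -> original LLM HF config
    <out_dir>/<tokenizer files>        -> tokenizer, saved once at the root
    """
    root = Path(out_dir)
    root.mkdir(parents=True, exist_ok=True)

    # The root config.json holds the wrapper config, so the LLM config cannot live here too.
    (root / "config.json").write_text(json.dumps(wrapper_cfg, indent=2))

    # The LLM's own HF config goes into a subdirectory to avoid the filename conflict.
    backbone_dir = root / "llm_backbone"
    backbone_dir.mkdir(exist_ok=True)
    (backbone_dir / "config.json").write_text(json.dumps(llm_hf_cfg, indent=2))

    # Tokenizer files are written a single time, at the checkpoint root.
    for name, content in tokenizer_files.items():
        (root / name).write_text(content)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = write_offline_export_layout(
            tmp,
            wrapper_cfg={"target": "SALM"},  # hypothetical wrapper config contents
            llm_hf_cfg={"model_type": "llama"},  # hypothetical LLM HF config contents
            tokenizer_files={"tokenizer_config.json": "{}"},
        )
        print(sorted(p.relative_to(root).as_posix() for p in root.rglob("*")))
        # prints ['config.json', 'llm_backbone', 'llm_backbone/config.json', 'tokenizer_config.json']
```

Keeping the LLM config in a subdirectory lets both config.json files coexist, while tools that expect the tokenizer at the checkpoint root (vLLM, per the description) still find it there.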
At load time, HFHubMixin detects the local export layout and redirects:
- root tokenizer files -> cfg["tokenizer_path"]
- llm_backbone/config.json -> cfg["pretrained_llm"] or cfg["pretrained_lm_name"]
This keeps old checkpoints backward compatible because configs without
these local files are left unchanged.
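The load-time redirect can be sketched as a small function over a plain dict config. This is an illustrative sketch, not the actual HFHubMixin code in nemo/collections/speechlm2/parts/hf_hub.py: the name `redirect_to_local_export`, the tokenizer file names it checks for, and the choice to point the LLM key at the llm_backbone directory (rather than at the file) are all assumptions. Pointing at the directory would suit Hugging Face `from_pretrained`-style loaders, which accept a local directory containing config.json.

```python
import tempfile
from pathlib import Path

# Assumed file names that indicate a tokenizer saved at the checkpoint root.
TOKENIZER_MARKERS = ("tokenizer_config.json", "tokenizer.json", "tokenizer.model")
# Per the PR description, models store the LLM config source under one of these keys.
LLM_CONFIG_KEYS = ("pretrained_llm", "pretrained_lm_name")


def redirect_to_local_export(model_dir: str, cfg: dict) -> dict:
    """Hypothetical sketch of the load-time redirect (not the real HFHubMixin code)."""
    root = Path(model_dir)
    cfg = dict(cfg)  # shallow copy so the caller's config is left untouched

    # Root tokenizer files -> cfg["tokenizer_path"]
    if any((root / name).is_file() for name in TOKENIZER_MARKERS):
        cfg["tokenizer_path"] = str(root)

    # llm_backbone/config.json -> whichever LLM key this model's config uses
    backbone_dir = root / "llm_backbone"
    if (backbone_dir / "config.json").is_file():
        for key in LLM_CONFIG_KEYS:
            if key in cfg:
                cfg[key] = str(backbone_dir)
                break

    # Checkpoints without these local files fall through unchanged.
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = {"pretrained_llm": "org/some-llm", "tokenizer_path": "org/some-llm"}  # hypothetical IDs
        print(redirect_to_local_export(tmp, cfg) == cfg)  # True: no local files yet
        (Path(tmp) / "tokenizer_config.json").write_text("{}")
        (Path(tmp) / "llm_backbone").mkdir()
        (Path(tmp) / "llm_backbone" / "config.json").write_text("{}")
        print(redirect_to_local_export(tmp, cfg))  # both keys now point inside tmp
```

When neither local file exists, the config comes back unchanged, which is the property that keeps older checkpoints loading as before.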
Compatibility coverage
- SALM
- SALMAutomodel
- SALMWithAsrDecoder
- Duplex S2S
- Duplex S2S speech decoder
- Duplex STT
- Duplex EAR-TTS
Testing
- python -m py_compile examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_hf_hub.py tests/collections/speechlm2/test_to_hf.py
- python -m black --check examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
- python -m isort --check-only examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
- pytest tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py -q
ruff was not available in the local environment used to prepare this PR.
------------------------------
You can view, comment on, or merge this pull request online at:
#15736
Commit Summary
- f33caf8 Support offline SpeechLM2 HF exports
File Changes (11 files <https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files>)
- *M* examples/speechlm2/to_hf.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-94882378465e78570027610a1a986341f514310bd04ec13cf25534d12cc7e175>
(20)
- *M* nemo/collections/speechlm2/models/duplex_ear_tts.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-f703395f9b626306138b2efd16cc6cec29f1f09c7edccbaf23564298abfbd660>
(3)
- *M* nemo/collections/speechlm2/models/duplex_s2s_model.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-7d78ee075191366776b50438aefac0d86643eaf3bc2216c99ca72d3712615fbc>
(3)
- *M* nemo/collections/speechlm2/models/duplex_s2s_speech_decoder_model.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-d624b0a0ccbcb88b2cf24c872d4bc8ec8bbffb2f6136777dee94f76c6ec54052>
(3)
- *M* nemo/collections/speechlm2/models/duplex_stt_model.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-f394ae5c2f64ac904aec9bec0c4614a468765ddb41780ecd8c0071a0c3c2b301>
(3)
- *M* nemo/collections/speechlm2/models/salm.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-b7d1adcccd77921de8e8754147f388c8172675e0c353d2d08c87eed1a2108ad6>
(3)
- *M* nemo/collections/speechlm2/models/salm_asr_decoder.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-60dbdac2b06d0a3ecaf69e08be0e8139fd01104b53302cf8de55cab27c4d8b97>
(3)
- *M* nemo/collections/speechlm2/models/salm_automodel.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-4fbe3df86f69a99c6c778f8869e53cd23211d48bc30bbb8b23fe9ff3dbb539a8>
(3)
- *M* nemo/collections/speechlm2/parts/hf_hub.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-eef3597a51882b3556f35490caee3c1d5a4cc0b8711d81a8abdc5f488bb18edd>
(27)
- *A* tests/collections/speechlm2/test_hf_hub.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-fc5b99966d7b9773f13221478dc29ec3427c279fe696b1c6ebc352035ab521f4>
(75)
- *M* tests/collections/speechlm2/test_to_hf.py
<https://github.com/NVIDIA-NeMo/NeMo/pull/15736/files#diff-e9c36b11e45a25c7819a121fb127424f33a7c2639f9d086304d2bcde861b8c0a>
(47)
Patch Links:
- https://github.com/NVIDIA-NeMo/NeMo/pull/15736.patch
- https://github.com/NVIDIA-NeMo/NeMo/pull/15736.diff
😜
On Tue, Jun 2, 2026 at 5:44 PM JAM I AM <***@***.***> wrote:
This is great news. Very very happy about this.
What changed
This is an alternative implementation for the offline-loading goal in #15695.
It keeps the root checkpoint config.json as the NeMo/SpeechLM wrapper config and saves the original LLM HF config under llm_backbone/config.json to avoid a config.json filename conflict. The tokenizer remains saved once at the checkpoint root for vLLM compatibility.
At load time, HFHubMixin detects the local export layout and redirects:
- root tokenizer files -> cfg["tokenizer_path"]
- llm_backbone/config.json -> cfg["pretrained_llm"] or cfg["pretrained_lm_name"]
This keeps old checkpoints backward compatible because configs without these local files are left unchanged.
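Compatibility coverage
- SALM
- SALMAutomodel
- SALMWithAsrDecoder
- Duplex S2S
- Duplex S2S speech decoder
- Duplex STT
- Duplex EAR-TTS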
Testing
- python -m py_compile examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_hf_hub.py tests/collections/speechlm2/test_to_hf.py
- python -m black --check examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
- python -m isort --check-only examples/speechlm2/to_hf.py nemo/collections/speechlm2/parts/hf_hub.py tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py
- pytest tests/collections/speechlm2/test_to_hf.py tests/collections/speechlm2/test_hf_hub.py -q