14 commits
328944e
nemo/collections/tts/models/easy_magpietts_inference.py: remove dupli…
vklimkov-nvidia Jun 2, 2026
78404cb
examples/tts/easymagpie_vllm_omni: initial commit for vllm_omni defin…
vklimkov-nvidia Jun 2, 2026
87d742c
examples/tts/easymagpie_vllm_omni: switch to actual configuration
vklimkov-nvidia Jun 2, 2026
bb8b427
examples/tts/easymagpie_vllm_omni: make sure model runs with cuda graphs
vklimkov-nvidia Jun 2, 2026
9992569
examples/tts/easymagpie_vllm_omni: extend preprocess to take speaker …
vklimkov-nvidia Jun 2, 2026
3a8d50b
examples/tts/easymagpie_vllm_omni: introduce script to convert the ch…
vklimkov-nvidia Jun 2, 2026
85be128
examples/tts/easymagpie_vllm_omni: clean up, add readme
vklimkov-nvidia Jun 2, 2026
f984ee1
examples/tts/easymagpie_vllm_omni: implement delay and proper phoneme…
vklimkov-nvidia Jun 2, 2026
9ab0038
examples/tts/easymagpie_vllm_omni: take text as input instead of tokens
vklimkov-nvidia Jun 3, 2026
36ce9a5
examples/tts/easymagpie_vllm_omni: add script to benchmark the acoust…
vklimkov-nvidia Jun 3, 2026
f5c06a5
examples/tts/easymagpie_vllm_omni/easy_magpietts_convert_to_vllm.py: …
vklimkov-nvidia Jun 3, 2026
8721e54
examples/tts/easymagpie_vllm_omni/tests: add tests to check equivalen…
vklimkov-nvidia Jun 3, 2026
4eda162
examples/tts/easymagpie_vllm_omni: hotfix for nemotron_h in fp16, nee…
vklimkov-nvidia Jun 3, 2026
4c7388f
examples/tts/easymagpie_vllm_omni: introduce EOS forwarding from LT s…
vklimkov-nvidia Jun 4, 2026
examples/tts/easymagpie_vllm_omni/README.md: 27 additions, 0 deletions
@@ -0,0 +1,27 @@
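Work-in-progress model definition of EasyMagpieTTS (EasyMP) for vLLM-Omni. It follows the approach of the qwen3tts model definition:
the backbone and the LT (local transformer) are compiled into a single CUDA graph during uniform-batch decoding,
and into piecewise CUDA graphs during mixed and prefill batches.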
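The dispatch rule described above can be sketched in plain Python. This is a hypothetical illustration, not code from vLLM or vLLM-Omni: `Request`, `choose_graph_mode`, and `CAPTURE_SIZES` are made-up names, and the sketch assumes a "uniform decode" batch means every request contributes exactly one new token (no speculative decoding). It mirrors the general idea behind vLLM's `FULL_AND_PIECEWISE` CUDA graph mode: decode-only batches are padded to a captured batch size and replay one full graph, while anything else falls back to piecewise graphs.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical batch sizes for which full CUDA graphs were captured.
CAPTURE_SIZES = [1, 2, 4, 8, 16, 32]


@dataclass
class Request:
    num_new_tokens: int  # tokens this request feeds in the current step
    is_prefill: bool     # True while the prompt is still being consumed


def choose_graph_mode(batch: List[Request]) -> Tuple[str, int]:
    """Return (mode, size) for one engine step (illustrative sketch only).

    "full":      uniform decode batch; backbone + LT replayed as one CUDA graph,
                 size is the padded batch size.
    "piecewise": prefill or mixed batch; size is the total token count.
    "eager":     decode batch larger than any captured graph.
    """
    if not batch:
        raise ValueError("empty batch")
    uniform_decode = all(
        not r.is_prefill and r.num_new_tokens == 1 for r in batch
    )
    if not uniform_decode:
        # Prefill or mixed batch: fall back to piecewise graphs.
        return "piecewise", sum(r.num_new_tokens for r in batch)
    # Pad the batch up to the nearest captured size so a graph can be replayed.
    idx = bisect_left(CAPTURE_SIZES, len(batch))
    if idx == len(CAPTURE_SIZES):
        return "eager", len(batch)
    return "full", CAPTURE_SIZES[idx]
```

For example, three decoding requests are padded to the captured size 4 and run as a full graph, while a batch mixing one decode and one prefill request goes the piecewise route. Capturing the backbone and LT as one graph avoids per-kernel launch overhead between them, which matters most for small decode batches.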

Install:
```
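# from the NeMo repository root: install NeMo with all optional dependencies
pip install -e ".[all]"
# Mamba kernels; --no-build-isolation builds them against the already installed torch
pip install ninja mamba_ssm causal_conv1d --no-build-isolation
# install vLLM and vLLM-Omni
pip install vllm==0.21.0 vllm_omni==0.21.0rc1
# register the EasyMagpieTTS model definitions with vLLM
pip install -e examples/tts/easymagpie_vllm_omni/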
```

Convert the checkpoint downloaded from
https://huggingface.co/nvidia/easymagpietts_NEXT/tree/main/2605_NemotronTTS_V0.2/v2
(replace `<ckpt>` with the local directory holding the downloaded files):
```
python examples/tts/easymagpie_vllm_omni/easy_magpietts_convert_to_vllm.py \
--nemo_file <ckpt>/2605_EMTTS_SmallMamba_Step150k_posttrained_epoch12.nemo \
--codec_model_path <ckpt>/25fps_spectral_codec_with_bandwidth_extension.nemo \
--outdir examples/tts/easymagpie_vllm_omni/easymp_vllm_model \
--context_audio english_sample.wav --speaker_name eng \
--phoneme_tokenizer_path <ckpt>/bpe_ipa_tokenizer_2048_en_de_es_fr_hi_it_vi_zh_ko-KR_pt-BR_ar.json
```
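Because the conversion loads two `.nemo` files, it can help to check that every input exists before launching it. The wrapper below is a hypothetical sketch, not part of the repository: `build_convert_cmd`, `missing_inputs`, and `run` are illustrative names, it assumes it is run from the repository root, and it only uses the flags and file names shown in the command above.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List

# Script and file names taken from the conversion command above.
SCRIPT = "examples/tts/easymagpie_vllm_omni/easy_magpietts_convert_to_vllm.py"
NEMO_FILE = "2605_EMTTS_SmallMamba_Step150k_posttrained_epoch12.nemo"
CODEC_FILE = "25fps_spectral_codec_with_bandwidth_extension.nemo"
PHONEME_TOKENIZER = "bpe_ipa_tokenizer_2048_en_de_es_fr_hi_it_vi_zh_ko-KR_pt-BR_ar.json"
INPUT_FLAGS = {"--nemo_file", "--codec_model_path", "--context_audio", "--phoneme_tokenizer_path"}


def build_convert_cmd(ckpt: Path, outdir: Path, context_audio: Path, speaker_name: str) -> List[str]:
    """Assemble the argument list for the conversion script (does not run it)."""
    return [
        sys.executable, SCRIPT,
        "--nemo_file", str(ckpt / NEMO_FILE),
        "--codec_model_path", str(ckpt / CODEC_FILE),
        "--outdir", str(outdir),
        "--context_audio", str(context_audio),
        "--speaker_name", speaker_name,
        "--phoneme_tokenizer_path", str(ckpt / PHONEME_TOKENIZER),
    ]


def missing_inputs(cmd: List[str]) -> List[str]:
    """Return input paths in `cmd` that do not exist; --outdir is not checked."""
    return [
        cmd[i + 1]
        for i, arg in enumerate(cmd)
        if arg in INPUT_FLAGS and not Path(cmd[i + 1]).exists()
    ]


def run(ckpt: Path, context_audio: Path, speaker_name: str = "eng") -> None:
    """Validate inputs, echo the command, then run the conversion."""
    outdir = Path("examples/tts/easymagpie_vllm_omni/easymp_vllm_model")
    cmd = build_convert_cmd(ckpt, outdir, context_audio, speaker_name)
    missing = missing_inputs(cmd)
    if missing:
        raise FileNotFoundError(f"missing inputs: {missing}")
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(Path("/data/ckpt"), Path("english_sample.wav"))` fails fast with a list of absent files instead of failing partway through the conversion.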

Finally, run the notebook `examples/tts/easymagpie_vllm_omni/easymagpie_inference_demo.ipynb`
to predict acoustic tokens.