From 8651ccd564a61e8859a40af5273a9eb7a1419c0e Mon Sep 17 00:00:00 2001
From: Ssofja
Date: Tue, 2 Jun 2026 20:47:49 +0400
Subject: [PATCH 1/2] adding community featured models

Signed-off-by: Ssofja
---
 docs/source/asr/asr_checkpoints.rst                |   2 +
 .../asr/featured_community_checkpoints.rst         | 119 ++++++++++++++++++
 docs/source/asr/intro.rst                          |   1 +
 docs/source/starthere/choosing_a_model.rst         |   1 +
 4 files changed, 123 insertions(+)
 create mode 100644 docs/source/asr/featured_community_checkpoints.rst

diff --git a/docs/source/asr/asr_checkpoints.rst b/docs/source/asr/asr_checkpoints.rst
index 0d6ce1d0478f..bd7a6cd4b2ba 100644
--- a/docs/source/asr/asr_checkpoints.rst
+++ b/docs/source/asr/asr_checkpoints.rst
@@ -7,6 +7,8 @@ ASR Model Checkpoints
 This page lists all supported ASR model checkpoints released by NVIDIA NeMo.
 Benchmark scores for each model can be found on its `HuggingFace model card `__.
 
+For community fine-tunes built on these checkpoints, see :doc:`Featured Community Checkpoints <./featured_community_checkpoints>`.
+
 Glossary
 --------
 
diff --git a/docs/source/asr/featured_community_checkpoints.rst b/docs/source/asr/featured_community_checkpoints.rst
new file mode 100644
index 000000000000..881e0dae57c9
--- /dev/null
+++ b/docs/source/asr/featured_community_checkpoints.rst
@@ -0,0 +1,119 @@
+.. _featured-community-checkpoints:
+
+Featured Community Checkpoints
+==============================
+
+Community fine-tunes built on NVIDIA NeMo ASR checkpoints and published on Hugging Face.
+Depending on the repo, a checkpoint loads with **NeMo** (``.nemo``), **MLX** (Apple Silicon), or **GGUF** (C++ via `CrispASR `__).
+
+For NVIDIA-published checkpoints, see :doc:`./asr_checkpoints` and the `NVIDIA Hugging Face organization `__.
+
+.. note::
+
+   Community checkpoints are maintained by their authors, not by the NeMo team.
+
+
+NeMo
+----
+
+Load checkpoints that ship a ``.nemo`` file on Hugging Face with ``ASRModel.from_pretrained()``:
+
+.. code-block:: python
+
+    import nemo.collections.asr as nemo_asr
+
+    model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
+    print(model.transcribe(["audio.wav"])[0].text)
+
+.. list-table::
+   :header-rows: 1
+   :widths: 26 20 14 40
+
+   * - Model
+     - Base checkpoint
+     - License
+     - Highlights
+   * - `akera/parakeet-tdt-salt `__
+     - `parakeet-tdt-0.6b-v3 `__
+     - See model card
+     - SALT multilingual ASR for 10 East African languages. Hybrid TDT+CTC FastConformer, 600M params.
+   * - `johannhartmann/parakeet_de_med `__
+     - `parakeet-tdt-0.6b-v3 `__
+     - CC-BY-4.0
+     - German medical documentation ASR. PEFT fine-tune; WER 11.73% → 3.28% on a 122-sample medical eval set.
+   * - `qenneth/parakeet-tdt-0.6b-v3-finetuned-for-ATC `__
+     - `parakeet-tdt-0.6b-v3 `__
+     - See model card
+     - ATC English ASR on `jacktol/ATC-ASR-Dataset `__. Test WER 5.99%.
+   * - `KasuleTrevor/parakeet-0.6b-cv-sw-5hr_v9 `__
+     - Parakeet 0.6B (see model card)
+     - CC-BY-4.0
+     - Swahili ASR fine-tune on ~5 hours of Common Voice data.
+
+
+.. _mlx-inference:
+
+MLX Inference
+-------------
+
+For Apple Silicon checkpoints, use ``parakeet-mlx`` or ``mlx-audio``:
+
+.. code-block:: bash
+
+    pip install parakeet-mlx
+    parakeet-mlx audio.wav --model NeurologyAI/neuro-parakeet-mlx
+
+.. list-table::
+   :header-rows: 1
+   :widths: 26 20 14 40
+
+   * - Model
+     - Base checkpoint
+     - License
+     - Highlights
+   * - `NeurologyAI/neuro-parakeet-mlx `__
+     - `parakeet-tdt-0.6b-v3 `__
+     - CC-BY-4.0
+     - German medical/neurology ASR for Apple Silicon. WER 1.04% on the author's medical validation set.
+
+
+.. _gguf-inference:
+
+GGUF Inference
+--------------
+
+GGUF exports run with the `CrispASR `_ C++ CLIs — no NeMo install required:
+
+.. code-block:: bash
+
+    git clone -b parakeet https://github.com/CrispStrobe/CrispASR
+    cd CrispASR && cmake -B build -DCMAKE_BUILD_TYPE=Release
+    cmake --build build -j$(nproc) --target parakeet-main canary-main
+
+    huggingface-cli download cstr/parakeet-tdt-0.6b-v3-GGUF parakeet-tdt-0.6b-v3-q4_k.gguf --local-dir .
+    ./build/bin/parakeet-main -m parakeet-tdt-0.6b-v3-q4_k.gguf -f audio.wav -t 8
+
+.. list-table::
+   :header-rows: 1
+   :widths: 26 20 14 40
+
+   * - Model
+     - Base checkpoint
+     - License
+     - Highlights
+   * - `cstr/parakeet-tdt-0.6b-v3-GGUF `__
+     - `parakeet-tdt-0.6b-v3 `__
+     - CC-BY-4.0
+     - Quantised Parakeet TDT (Q4_K ~467 MB). 25 EU languages, word-level timestamps. Run with ``parakeet-main``.
+   * - `cstr/canary-1b-v2-GGUF `__
+     - `canary-1b-v2 `__
+     - CC-BY-4.0
+     - Quantised Canary 1B (Q4_K ~673 MB). Multilingual ASR and speech translation. Run with ``canary-main``.
+
+
+.. _submit-a-community-checkpoint:
+
+Submit a Community Checkpoint
+-----------------------------
+
+To suggest a checkpoint for this page, open a `GitHub issue `__ with the Hugging Face model link, NeMo base checkpoint, task, languages, and evaluation results.
diff --git a/docs/source/asr/intro.rst b/docs/source/asr/intro.rst
index c43fee6da7c6..0f5140662a8d 100644
--- a/docs/source/asr/intro.rst
+++ b/docs/source/asr/intro.rst
@@ -72,3 +72,4 @@ Further Reading
    asr_language_modeling_and_customization
    configs
    api
+   featured_community_checkpoints
diff --git a/docs/source/starthere/choosing_a_model.rst b/docs/source/starthere/choosing_a_model.rst
index 1abcc74d6fb4..c1c1a0c2fb05 100644
--- a/docs/source/starthere/choosing_a_model.rst
+++ b/docs/source/starthere/choosing_a_model.rst
@@ -132,6 +132,7 @@ All pretrained NeMo models are available on:
 
 - `HuggingFace Hub (nvidia) `_ — search for "nemo" or specific model names
 - `NGC Model Catalog `_ — NVIDIA's model registry
+- :doc:`Featured Community Checkpoints ` — fine-tunes from external users
 
 See :doc:`../checkpoints/intro` for instructions on loading pretrained models.
 
From d8dc48648d51ca5ed528326c0f62f8c060d6b0d7 Mon Sep 17 00:00:00 2001
From: Ssofja
Date: Thu, 4 Jun 2026 14:04:11 +0400
Subject: [PATCH 2/2] Change the structure of the page based on comment

Signed-off-by: Ssofja
---
 .../asr/featured_community_checkpoints.rst         | 104 +++---------------
 1 file changed, 17 insertions(+), 87 deletions(-)

diff --git a/docs/source/asr/featured_community_checkpoints.rst b/docs/source/asr/featured_community_checkpoints.rst
index 881e0dae57c9..7c0eb5ad1737 100644
--- a/docs/source/asr/featured_community_checkpoints.rst
+++ b/docs/source/asr/featured_community_checkpoints.rst
@@ -4,111 +4,41 @@ Featured Community Checkpoints
 ==============================
 
 Community fine-tunes built on NVIDIA NeMo ASR checkpoints and published on Hugging Face.
-Depending on the repo, a checkpoint loads with **NeMo** (``.nemo``), **MLX** (Apple Silicon), or **GGUF** (C++ via `CrispASR `__).
-
 For NVIDIA-published checkpoints, see :doc:`./asr_checkpoints` and the `NVIDIA Hugging Face organization `__.
 
 .. note::
 
    Community checkpoints are maintained by their authors, not by the NeMo team.
-
-
-NeMo
-----
-
-Load checkpoints that ship a ``.nemo`` file on Hugging Face with ``ASRModel.from_pretrained()``:
-
-.. code-block:: python
-
-    import nemo.collections.asr as nemo_asr
-
-    model = nemo_asr.models.ASRModel.from_pretrained("johannhartmann/parakeet_de_med")
-    print(model.transcribe(["audio.wav"])[0].text)
+   Use each model's Hugging Face model card and the framework project linked below for up-to-date setup and inference instructions.
 
 .. list-table::
    :header-rows: 1
-   :widths: 26 20 14 40
+   :widths: 28 52 20
 
-   * - Model
-     - Base checkpoint
-     - License
-     - Highlights
+   * - Checkpoint
+     - What's special
+     - Framework
    * - `akera/parakeet-tdt-salt `__
-     - `parakeet-tdt-0.6b-v3 `__
-     - See model card
-     - SALT multilingual ASR for 10 East African languages. Hybrid TDT+CTC FastConformer, 600M params.
+     - SALT multilingual ASR for 10 East African languages. Hybrid TDT+CTC FastConformer (600M), fine-tuned from `parakeet-tdt-0.6b-v3 `__.
+     - NeMo
    * - `johannhartmann/parakeet_de_med `__
-     - `parakeet-tdt-0.6b-v3 `__
-     - CC-BY-4.0
-     - German medical documentation ASR. PEFT fine-tune; WER 11.73% → 3.28% on a 122-sample medical eval set.
+     - German medical documentation ASR (PEFT). WER 11.73% → 3.28% on a 122-sample medical eval set.
+     - NeMo
    * - `qenneth/parakeet-tdt-0.6b-v3-finetuned-for-ATC `__
-     - `parakeet-tdt-0.6b-v3 `__
-     - See model card
      - ATC English ASR on `jacktol/ATC-ASR-Dataset `__. Test WER 5.99%.
+     - NeMo
    * - `KasuleTrevor/parakeet-0.6b-cv-sw-5hr_v9 `__
-     - Parakeet 0.6B (see model card)
-     - CC-BY-4.0
      - Swahili ASR fine-tune on ~5 hours of Common Voice data.
-
-
-.. _mlx-inference:
-
-MLX Inference
--------------
-
-For Apple Silicon checkpoints, use ``parakeet-mlx`` or ``mlx-audio``:
-
-.. code-block:: bash
-
-    pip install parakeet-mlx
-    parakeet-mlx audio.wav --model NeurologyAI/neuro-parakeet-mlx
-
-.. list-table::
-   :header-rows: 1
-   :widths: 26 20 14 40
-
-   * - Model
-     - Base checkpoint
-     - License
-     - Highlights
+     - NeMo
    * - `NeurologyAI/neuro-parakeet-mlx `__
-     - `parakeet-tdt-0.6b-v3 `__
-     - CC-BY-4.0
      - German medical/neurology ASR for Apple Silicon. WER 1.04% on the author's medical validation set.
-
-
-.. _gguf-inference:
-
-GGUF Inference
---------------
-
-GGUF exports run with the `CrispASR `_ C++ CLIs — no NeMo install required:
-
-.. code-block:: bash
-
-    git clone -b parakeet https://github.com/CrispStrobe/CrispASR
-    cd CrispASR && cmake -B build -DCMAKE_BUILD_TYPE=Release
-    cmake --build build -j$(nproc) --target parakeet-main canary-main
-
-    huggingface-cli download cstr/parakeet-tdt-0.6b-v3-GGUF parakeet-tdt-0.6b-v3-q4_k.gguf --local-dir .
-    ./build/bin/parakeet-main -m parakeet-tdt-0.6b-v3-q4_k.gguf -f audio.wav -t 8
-
-.. list-table::
-   :header-rows: 1
-   :widths: 26 20 14 40
-
-   * - Model
-     - Base checkpoint
-     - License
-     - Highlights
+     - MLX
    * - `cstr/parakeet-tdt-0.6b-v3-GGUF `__
-     - `parakeet-tdt-0.6b-v3 `__
-     - CC-BY-4.0
-     - Quantised Parakeet TDT (Q4_K ~467 MB). 25 EU languages, word-level timestamps. Run with ``parakeet-main``.
+     - Quantised Parakeet TDT (Q4_K ~467 MB). 25 EU languages, word-level timestamps.
+     - GGUF (`CrispASR `__)
    * - `cstr/canary-1b-v2-GGUF `__
-     - `canary-1b-v2 `__
-     - CC-BY-4.0
-     - Quantised Canary 1B (Q4_K ~673 MB). Multilingual ASR and speech translation. Run with ``canary-main``.
+     - Quantised Canary 1B (Q4_K ~673 MB). Multilingual ASR and speech translation.
+     - GGUF (`CrispASR `__)
 
 
 .. _submit-a-community-checkpoint:
@@ -116,4 +46,4 @@ GGUF exports run with the `CrispASR `_
 Submit a Community Checkpoint
 -----------------------------
 
-To suggest a checkpoint for this page, open a `GitHub issue `__ with the Hugging Face model link, NeMo base checkpoint, task, languages, and evaluation results.
+To suggest a checkpoint for this page, open a `GitHub issue `__ with the Hugging Face model link, NeMo base checkpoint, task, languages, evaluation results, and inference framework.