Sopro TTS custom nodes for ComfyUI - Lightweight CPU-based text-to-speech with zero-shot voice cloning.
- ⚡ Runs efficiently on CPU (real-time factor of 0.25: 30s of audio in 7.5s)
- 🎤 Zero-shot voice cloning with 3-12s reference audio
- 🔧 Compatible with all ComfyUI audio workflows
- 💾 169M parameters - lightweight and fast
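The real-time factor (RTF) quoted above is processing time divided by audio duration, so an RTF of 0.25 means synthesis takes a quarter of the clip length. A minimal sketch of that arithmetic follows; `synthesis_time` is a hypothetical helper for illustration, not part of these nodes, and actual speed depends on your CPU.

```python
def synthesis_time(audio_seconds: float, rtf: float = 0.25) -> float:
    """Estimate wall-clock synthesis time from a real-time factor (RTF).

    RTF = processing time / audio duration, so lower is faster.
    The 0.25 default is the figure quoted in this README.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


print(synthesis_time(30.0))  # 7.5
```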
- Navigate to your ComfyUI custom_nodes directory:
    cd ComfyUI/custom_nodes/
- Clone this repository:
    git clone https://github.com/ai-joe-git/ComfyUI-Sopro.git
    cd ComfyUI-Sopro
- Install dependencies:
    pip install -r requirements.txt
- Restart ComfyUI
Sopro TTS Generator: main text-to-speech generation node with optional voice cloning.
Inputs:
- text (required): Text to synthesize
- reference_audio (optional): Reference audio for voice cloning
- speed (optional): Speech speed (0.5-2.0, default 1.0)
- temperature (optional): Generation temperature (0.1-1.5, default 0.7)
- seed (optional): Random seed for reproducibility
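If you drive the generator from a script, it can help to check values against the documented ranges before queuing a job. The sketch below is only an illustration under that assumption: `TTSParams` and `normalize_params` are hypothetical names, not part of the node's API, and the node itself may handle out-of-range values differently.

```python
from dataclasses import dataclass
from typing import Optional

# Ranges and defaults as documented in the Inputs list above.
SPEED_RANGE = (0.5, 2.0)
TEMPERATURE_RANGE = (0.1, 1.5)


@dataclass
class TTSParams:  # hypothetical container, not part of the node's API
    text: str
    speed: float = 1.0
    temperature: float = 0.7
    seed: Optional[int] = None


def _clamp(value: float, bounds: tuple) -> float:
    """Limit value to the inclusive (low, high) range."""
    low, high = bounds
    return max(low, min(high, value))


def normalize_params(params: TTSParams) -> TTSParams:
    """Reject empty text and clamp speed/temperature into range."""
    if not params.text.strip():
        raise ValueError("text is required and must not be empty")
    return TTSParams(
        text=params.text.strip(),
        speed=_clamp(params.speed, SPEED_RANGE),
        temperature=_clamp(params.temperature, TEMPERATURE_RANGE),
        seed=params.seed,
    )
```

For example, `normalize_params(TTSParams("Hello", speed=3.0))` returns a copy with `speed` clamped to 2.0 and the other defaults unchanged.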
Outputs:
audio: Generated audio in ComfyUI format
Sopro Load Reference Audio: load audio files for voice cloning.
Inputs:
audio_file: Audio file from input directory
Outputs:
reference_audio: Audio in ComfyUI format
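Since voice cloning works best with 3-12 seconds of reference audio, you may want to check a clip's length before placing it in the input directory. The following is a minimal sketch using only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file; `wav_duration_seconds` and `is_good_reference` are hypothetical helpers, not functions provided by these nodes.

```python
import wave

# Recommended reference length from this README, in seconds.
MIN_SECONDS, MAX_SECONDS = 3.0, 12.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as mp3 or flac would need a different reader; this check is only meant for quick screening of WAV clips.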
Sopro Save Audio: save generated audio to the output directory.
Inputs:
- audio: Audio to save
- filename_prefix: Output filename prefix
- format: Output format (wav/mp3/flac)
- Add "Sopro TTS Generator" node
- Enter your text
- (Optional) Add "Sopro Load Reference Audio" and connect for voice cloning
- Connect output to "Sopro Save Audio" or other audio processing nodes
- Generate!
- Spell out symbols and abbreviations as spoken words (e.g., "1 plus 2" not "1 + 2")
- Reference audio should be 3-12 seconds for best voice cloning
- Lower temperature for more consistent output
- Works great with other ComfyUI audio nodes!
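The first tip above (writing "1 plus 2" rather than "1 + 2") can be automated with a small preprocessing step before the text reaches the generator. This is a rough sketch only: `spell_out_symbols` and its symbol table are hypothetical and deliberately tiny, so extend the mapping to suit your own text.

```python
import re

# Illustrative mapping only; extend it for your own text.
SYMBOL_WORDS = {
    "+": "plus",
    "=": "equals",
    "%": "percent",
    "&": "and",
}


def spell_out_symbols(text: str) -> str:
    """Replace each mapped symbol with its spoken word."""
    pattern = re.compile("|".join(re.escape(s) for s in SYMBOL_WORDS))
    # Pad each replacement with spaces so "2+2" becomes "2 plus 2".
    replaced = pattern.sub(lambda m: f" {SYMBOL_WORDS[m.group(0)]} ", text)
    # Collapse the extra whitespace introduced by the padding.
    return re.sub(r"\s+", " ", replaced).strip()


print(spell_out_symbols("1 + 2"))  # 1 plus 2
```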
- Sopro TTS by samuel-vitorino