Chatterbox

Text-to-speech (TTS) service for generating audio from text.

Features

Voice Cloning: Clone a voice from a short reference audio clip with no fine-tuning
Expressive Synthesis: Emotion- and style-controllable output that goes beyond monotone TTS
High-Quality Audio: Neural model producing natural-sounding speech
Self-Hosted: Runs locally on CPU or GPU, with no external API calls
Open Source: MIT-licensed model from Resemble AI
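
As a rough illustration of the voice cloning and expressive synthesis features, the sketch below calls the model directly from Python. It assumes this service wraps the upstream `chatterbox-tts` package from Resemble AI and that `torchaudio` is installed; the service's own interface may differ. The helper names `pick_device` and `synthesize` are hypothetical and exist only for this example.

```python
from pathlib import Path
from typing import Optional


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU to query, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def synthesize(
    text: str,
    out_path: str,
    reference_clip: Optional[str] = None,
    exaggeration: float = 0.5,
) -> Path:
    """Generate speech for `text` and write it to `out_path` as a WAV file.

    Hypothetical helper: `reference_clip` is a short audio file whose voice
    should be cloned, and `exaggeration` controls how expressive the output is.
    """
    # Imported lazily so this module can be loaded without the model installed.
    # Assumption: the upstream chatterbox-tts package is the backend in use.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=pick_device())

    kwargs = {"exaggeration": exaggeration}
    if reference_clip is not None:
        # Voice cloning: condition generation on the reference clip.
        kwargs["audio_prompt_path"] = reference_clip

    wav = model.generate(text, **kwargs)
    torchaudio.save(out_path, wav, model.sr)
    return Path(out_path)
```

Under these assumptions, `synthesize("Hello from Chatterbox.", "hello.wav", reference_clip="speaker.wav")` would write a clip in the reference speaker's voice, where `speaker.wav` is a placeholder file name. Omitting `reference_clip` falls back to the model's default voice.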