Chatterbox

Text-to-speech (TTS) service for generating audio from text.

Features

  • Voice Cloning: Clone a voice from a short reference audio clip with no fine-tuning
  • Expressive Synthesis: Controllable emotion and style, rather than flat, monotone output
  • High-Quality Audio: Neural model producing natural-sounding speech
  • Self-Hosted: Runs locally on CPU or GPU, with no external API calls
  • Open Source: MIT-licensed model from Resemble AI
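
As a rough illustration of the voice cloning and expressive synthesis features, the sketch below assumes the service is built on the upstream `chatterbox-tts` Python package from Resemble AI and calls that package directly. The service's own interface is not documented in this section. The helper names `build_generate_kwargs` and `synthesize` are hypothetical, and the file names and the 0.5 `exaggeration` value are placeholders.

```python
from typing import Optional


def build_generate_kwargs(reference_clip: Optional[str] = None,
                          exaggeration: float = 0.5) -> dict:
    """Collect optional keyword arguments for the model's generate() call.

    Hypothetical helper: it only passes a reference clip when one is given,
    so the model falls back to its built-in voice otherwise.
    """
    if exaggeration < 0:
        raise ValueError("exaggeration must be non-negative")
    kwargs = {"exaggeration": exaggeration}
    if reference_clip is not None:
        # Path to a short audio clip of the voice to clone.
        kwargs["audio_prompt_path"] = str(reference_clip)
    return kwargs


def synthesize(text: str, out_path: str,
               reference_clip: Optional[str] = None,
               exaggeration: float = 0.5,
               device: str = "cpu") -> None:
    """Generate speech for `text` and write it to `out_path` as audio."""
    # Imported lazily so the helper above works without the heavy dependencies.
    # Assumes `pip install chatterbox-tts`, which also pulls in torchaudio.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    # device may be "cpu" or "cuda"; the model runs locally either way.
    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **build_generate_kwargs(reference_clip, exaggeration))
    torchaudio.save(out_path, wav, model.sr)
```

A call such as `synthesize("Hello from Chatterbox.", "out.wav", reference_clip="speaker.wav")` would then clone the voice in `speaker.wav`, assuming that file exists. Omitting `reference_clip` uses the model's default voice.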