Whisper
- Source: https://github.com/openai/whisper
- License: MIT
- Alternatives: Parakeet, Vosk, faster-whisper, Wav2Vec2
Open-source speech-to-text (automatic speech recognition) model from OpenAI, usable as a Python library or command-line tool.
Features
- High-Accuracy Transcription: Handles accents, background noise, and technical vocabulary well
- 99+ Languages: Automatic language detection with multilingual transcription
- Timestamps: Word- and segment-level timestamps for alignment and captioning
- Multiple Model Sizes: Tiny to large variants trade off speed vs. accuracy
- REST API: Whisper itself has no built-in server; third-party wrappers built on faster-whisper or whisper.cpp expose HTTP endpoints, some of them OpenAI-compatible
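As a rough illustration of the timestamp and model-size features, the sketch below converts segment-level timestamps into SRT captions. It assumes the `openai-whisper` Python package and ffmpeg are installed. The helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are made up for this example and are not part of Whisper; only `whisper.load_model` and `model.transcribe` come from the library.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Turn segment dicts with "start", "end", and "text" keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive caption blocks.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "tiny") -> str:
    """Hypothetical helper: transcribe a file with openai-whisper, return SRT."""
    # Imported lazily so the formatting helpers work without Whisper installed.
    import whisper

    # Smaller models run faster; larger ones are generally more accurate.
    model = whisper.load_model(model_size)
    # The language is detected automatically when none is specified.
    result = model.transcribe(audio_path)
    print("Detected language:", result["language"])
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("meeting.mp3", model_size="base")` (the file name is a placeholder) would return caption text that can be saved as an `.srt` file. Passing `word_timestamps=True` to `model.transcribe` additionally attaches per-word timings to each segment, which this sketch does not use.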
References
- Whisper Diarization — speaker identification (who said what) — TBD
- NVIDIA NeMo — alternative speech AI framework