Features
- OpenAI API Compatible: Drop-in replacement for OpenAI's REST API (chat, completions, embeddings, transcription)
- Fully Local & Private: No data leaves your machine
- Multi-Model Support: LLaMA, Mistral, RWKV, and more with GGUF and GPTQ models
- Hardware Flexible: CPU and GPU support across Linux, macOS, and Windows
- Embeddings & RAG Support: Generate and serve embeddings for retrieval-augmented generation
- Multimodal: Speech-to-text and text-to-speech integration; early vision model support