Mistral launches Voxtral TTS: open-weight voice AI that clones voices from three seconds
Mistral AI has released its first text-to-speech model, Voxtral TTS, as open weights. Launched on March 26, 2026, it positions itself as a direct alternative to ElevenLabs and OpenAI TTS.
Voxtral TTS is a 4B-parameter model built on Ministral 3B. It supports nine languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic) and can clone a voice from just three seconds of reference audio. The stated latency is 70 milliseconds for a ten-second audio sample.
The model is available via API and on Hugging Face under a non-commercial license. Mistral highlights that it is light enough to run locally on a laptop, smartphone, or edge device.
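For enterprises building voice assistants, customer service tools, or sales systems, this is significant: open weights mean full control, no third-party dependency, and the ability to run locally without API costs. Because the Hugging Face release carries a non-commercial license, commercial deployments should check the license terms before relying on the downloaded weights. Competition in the TTS market is intensifying sharply.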