Mistral released open-weight Voxtral TTS with low-latency streaming, voice cloning, and cross-lingual adaptation, and vLLM Omni shipped day-0 support. Voice-agent teams should compare quality, latency, and serving cost against closed APIs.

Mistral describes Voxtral TTS as a “frontier open-weight model” aimed at production voice workflows, not just demos. In its announcement, the company emphasizes realistic, emotionally expressive speech, 9-language coverage, low time-to-first-audio, and easier adaptation to new voices. It also frames the model as the output layer for larger speech stacks, saying it works with Voxtral Transcribe for end-to-end speech-to-speech or with “any STT + LLM stack.”
The packaging matters as much as the model card. According to the launch thread, teams can use Voxtral TTS in Le Chat and Mistral Studio or download it locally from Hugging Face via the weights page; the same thread calls out “cross-lingual voice adaptation” and says the system can preserve accent cues, such as French-accented English. A pre-launch playground capture from TestingCatalog also shows a built-in voice-cloning flow with an upload-or-record modal and an explicit consent checkbox, which suggests Mistral is exposing cloning directly in product rather than only through raw weights.
The clearest deployment signal is that the vLLM team shipped day-0 support in vLLM Omni. Their install snippet points to vllm==0.18.0, vllm-omni, and a one-line serve command for mistralai/Voxtral-4B-TTS-2603, which lowers friction for teams already standardizing on vLLM for multimodal or agent backends.
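Putting the pieces from those posts together, the quick start likely looks roughly like the following. This is a sketch: the package names, version pin, and --omni flag are taken from the launch and integration threads, and exact flags may differ, so verify against the vLLM Omni docs before relying on them.

```shell
# Sketch of the day-0 quick start, assuming the package names, version pin,
# and --omni flag quoted in the launch and integration posts.
pip install "vllm==0.18.0" vllm-omni

# One-line serve command for the released checkpoint:
vllm serve mistralai/Voxtral-4B-TTS-2603 --omni
```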
The same integration post adds concrete output details that matter in production: streaming, 24 kHz audio, and export paths for WAV, MP3, FLAC, AAC, and Opus. Separately, the benchmark summary reports about 90 ms time-to-first-audio and roughly 3 GB RAM, which, if reproducible in real workloads, would put Voxtral in the range where self-hosting becomes plausible for latency-sensitive voice agents instead of forcing every stack through a closed API.
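To see what those numbers imply for a streaming pipeline, here is a back-of-the-envelope sketch. It assumes 16-bit mono PCM at the stated 24 kHz; the 90 ms figure is the reported time-to-first-audio, not a guarantee, and the actual wire format depends on the chosen codec.

```python
# Back-of-the-envelope sizing for a 24 kHz streaming TTS pipeline.
# Assumes 16-bit (2-byte) mono PCM; compressed codecs (MP3, AAC, Opus)
# will use far fewer bytes on the wire.

SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2          # 16-bit mono PCM
TTFA_S = 0.090                # reported ~90 ms time-to-first-audio

def pcm_bytes(seconds: float) -> int:
    """Raw PCM bytes produced in a given duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

# Samples that fit in the time-to-first-audio window: 2,160.
first_chunk_samples = int(TTFA_S * SAMPLE_RATE_HZ)

# A 20 ms playout frame (a common real-time chunk size) is 960 bytes of
# raw PCM, so modest network buffers keep the stream ahead of playback.
frame_bytes = pcm_bytes(0.020)

print(first_chunk_samples, frame_bytes, pcm_bytes(1.0))
```

The point of the arithmetic: at 24 kHz, a sub-100 ms first chunk is only a couple of thousand samples, which is why a reported ~90 ms TTFA matters more for perceived responsiveness than raw throughput does.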
Mistral’s headline quality claim is comparative, not absolute. In the reported results, human listeners preferred Voxtral over ElevenLabs Flash v2.5 about 62.8% of the time on “flagship voices” and 69.9% on “voice customization.” Another shared chart shows similar, though not identical, win rates, which suggests those numbers come from multiple slices or updated visuals rather than a single immutable benchmark.
What stands out for engineers is the task framing. The strongest deltas are on customization and zero-shot cloning, not just stock preset voices. That lines up with the model description, which says the model can clone a voice from a short sample and transfer it across languages while preserving speaking style and accent. The tradeoff is that these are Mistral-run evaluations, so the competitive claim is useful as a starting point but still needs side-by-side testing on your own prompts, latency budget, and serving cost envelope.
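When running that side-by-side testing, a simple confidence interval keeps small listener panels from over-claiming. The sketch below uses the Wilson score interval; the listener count is hypothetical and illustrative, not taken from Mistral's report.

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial win rate."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - margin, center + margin

# Hypothetical panel: 628 wins out of 1,000 pairwise judgments (~62.8%).
lo, hi = wilson_interval(628, 1000)
print(f"win rate 62.8%, 95% CI ~ [{lo:.3f}, {hi:.3f}]")
```

With 1,000 judgments the interval stays comfortably above 50%, but with only a few dozen in-house listeners the same point estimate can easily straddle a coin flip, which is the practical argument for sizing your own eval before switching providers.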
breaking: Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.
release: OpenAI rolled out Codex plugins across the app, CLI, and IDE extensions, with app auth, reusable skills, and optional MCP servers. Teams should test plugin-backed workflows and permission models before broad rollout.
release: Cline launched Kanban, a local multi-agent board that runs Claude, Codex, and Cline CLI tasks in isolated worktrees with dependency chains and diffs. Teams can use it as a visual control layer for parallel coding agents on repo chores that split cleanly.
release: Google launched Gemini 3.1 Flash Live in AI Studio, the API, and Gemini Live with stronger audio tool use, lower latency, and 128K context. Voice-agent teams should benchmark quality, latency, and thinking settings before switching.
🔊Introducing Voxtral TTS: our new frontier open-weight model for natural, expressive, and ultra-fast text-to-speech 🎭Realistic, emotionally expressive speech. 🌍Supports 9 languages and accurately captures diverse dialects. ⚡Very low latency for time-to-first-audio. 🔄Easily …
🎉 Congrats to @MistralAI on launching Voxtral 4B TTS — enterprise-grade TTS built for production voice agents. Day-0 support in vLLM Omni. 🌍 9 languages with natural prosody and emotional range 🎙️ 20 preset voices with easy adaptation to new ones ⚡ Ultra-low latency …
Mistral AI released Voxtral TTS, a 3-billion-parameter text-to-speech model with open weights that the company says outperformed ElevenLabs Flash v2.5 in human preference tests roughly 63% of the time on standard voices and nearly 70% on voice customization. The model runs on …
Wait so Mistral has just released one of the best voice AI models... and made it 100% open weights?! Voxtral TTS has really good capabilities: → Only 4B parameters → Realistic speech in 9 languages → Clone any voice from a few seconds of audio → Capture personality, pauses, …