KittenTTS released nano, micro, and mini ONNX TTS models sized for CPU-first deployment rather than GPU-heavy stacks. Voice-agent builders should benchmark both dependency weight and real-time latency before treating a tiny checkpoint as sufficient.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX, with models ranging from 15M to 80M parameters (25-80 MB) that enable high-quality CPU-based voice synthesis without a GPU. The latest v0.8 release publishes nano (15M, 25-56 MB), micro (40M, 41 MB), and mini (80M, 80 MB) models on Hugging Face. It ships with text preprocessing, a basic Python API (pip-installable from the GitHub release), a demo on HF Spaces, and commercial support. Apache 2.0 licensed, currently a developer preview.
KittenTTS is shipping as an Apache 2.0 open-source library built on ONNX, with three published model sizes in the current v0.8 release: nano at 15M parameters, micro at 40M, and mini at 80M (per the repo summary). The project description says those models land in a roughly 25 MB-to-80 MB footprint range and are meant for "CPU-based voice synthesis without GPU," which puts them closer to embedded or local-agent deployments than to conventional GPU-backed speech stacks (GitHub repo).
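Those parameter counts and file sizes imply a rough on-disk bytes-per-parameter figure, which hints at the export precision (about 4 bytes/param suggests fp32, 2 suggests fp16, 1 suggests int8 quantization). A quick back-of-envelope check using the sizes quoted above:

```python
def bytes_per_param(size_mb: float, params_millions: float) -> float:
    """On-disk bytes per parameter: (size_mb * 1e6 bytes) / (params_millions * 1e6)
    reduces to size_mb / params_millions."""
    return size_mb / params_millions

# Sizes as listed in the release notes above (nano uses its 25 MB low end):
for name, mb, params in [("nano", 25, 15), ("micro", 41, 40), ("mini", 80, 80)]:
    print(f"{name}: ~{bytes_per_param(mb, params):.2f} bytes/param")
```

The micro and mini figures land near 1 byte per parameter, consistent with int8-style quantization; nano's quoted 25-56 MB spread suggests multiple precision variants of the same 15M-parameter model.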
The release also includes text preprocessing, a basic Python API, Hugging Face-hosted models, and a demo surface, but the repo labels the package a developer preview rather than a finished production runtime (per the repo summary). That matters because the main novelty here is not just another TTS checkpoint; it is a small-footprint ONNX packaging choice aimed at teams that need voice output where GPU access is expensive, unavailable, or operationally awkward.
The main engineering angle is deployability: tiny ONNX-based TTS models that can run CPU-only on edge hardware, but with real-world concerns around Python dependency size, Torch/CUDA leakage, latency, streaming, and API ergonomics. The thread is useful if you build voice agents or offline inference stacks.
The Hacker News thread immediately focused on the real bottleneck for voice agents: deployment ergonomics rather than raw model weights. In the HN summary, the core concerns were Python dependency size, Torch/CUDA leakage, latency, streaming support, and API shape — the parts that decide whether a small model actually stays small inside a shipping application.
Thread discussion highlights:
- dawdler-purge on dependency bloat and CPU-only installs: "the dependency chain issue is a real barrier for edge deployment... anything that pulls torch + cuda makes the whole thing a non-starter."
- baibai008989 on edge deployment: "the dependency chain issue is a real barrier for edge deployment... 25MB is genuinely exciting for that use case."
- bobokaytop on latency and performance: "Running on an intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though."
The most concrete practitioner quote in the discussion recap says "anything that pulls torch + cuda makes the whole thing a non-starter," while another commenter said 25MB is "genuinely exciting" for edge use. That split captures the practical test for this release: a tiny ONNX checkpoint only changes deployment economics if the surrounding install and runtime stay equally lean.
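That leanness test is checkable before installing anything: recent pip versions can emit a dependency-resolution report without touching the environment (`pip install --dry-run --report out.json <pkg>`), and the report can be scanned for torch or CUDA wheels. A minimal sketch of that scan, assuming pip's documented installation-report JSON shape; the marker set is illustrative, not exhaustive:

```python
# Names that signal a GPU-heavy install: torch itself, its triton
# kernel dependency, and the nvidia-* CUDA runtime wheels.
GPU_HEAVY = {"torch", "triton"}

def gpu_heavy_deps(report: dict) -> list[str]:
    """Return GPU-heavy package names found in a pip --report JSON document."""
    names = [item["metadata"]["name"].lower() for item in report.get("install", [])]
    return sorted(n for n in names if n in GPU_HEAVY or n.startswith("nvidia-"))

# Example against a hand-written report fragment:
report = {"install": [
    {"metadata": {"name": "onnxruntime"}},
    {"metadata": {"name": "Torch"}},
    {"metadata": {"name": "nvidia-cudnn-cu12"}},
]}
print(gpu_heavy_deps(report))  # ['nvidia-cudnn-cu12', 'torch']
```

An empty result from a real report is the "stays equally lean" signal the commenters are asking for; anything in the list means the 25 MB checkpoint is riding on gigabytes of runtime.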
Performance data is still anecdotal. The same discussion recap cites one report of the 80M model running at about 1.5x realtime on an Intel 9700 CPU and "wasn't any faster" on a 3080 GPU. For engineers building offline assistants or embedded voice agents, that makes KittenTTS interesting less as a benchmark winner than as a CPU-first packaging experiment with enough early signal to justify local latency and dependency testing.
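The real-time factor behind that "1.5x realtime" figure is easy to reproduce locally: time synthesis, then divide seconds of audio produced by seconds of wall-clock compute; anything above 1.0 keeps up with playback. A minimal harness, with a stand-in `synthesize` placeholder to swap for the actual KittenTTS call under test:

```python
import time

def real_time_factor(audio_samples: int, sample_rate: int, wall_seconds: float) -> float:
    """RTF = seconds of audio produced per second of wall-clock compute."""
    return (audio_samples / sample_rate) / wall_seconds

def synthesize(text: str, sample_rate: int = 24_000) -> list[float]:
    """Placeholder: stands in for the model call being benchmarked."""
    return [0.0] * (sample_rate * 3)  # pretend we produced 3 s of audio

t0 = time.perf_counter()
audio = synthesize("The quick brown fox jumps over the lazy dog.")
elapsed = time.perf_counter() - t0
print(f"RTF: {real_time_factor(len(audio), 24_000, elapsed):.1f}x realtime")
```

For a fair comparison with the thread's numbers, run on longer passages and average several runs; short utterances are dominated by per-call overhead rather than synthesis throughput.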
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
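The n-gram technique Cursor describes follows the classic trigram-index approach: index every three-character substring of each file, derive required trigrams from the query's literal parts, intersect posting lists to get a small candidate set, and only then run the real regex over those files. A toy sketch of that candidate-pruning step, simplified here to literal queries (Cursor's actual index layout and Bloom-filter details are not public in this summary):

```python
import re
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file paths
        self.files = {}

    def add(self, path: str, text: str):
        self.files[path] = text
        for gram in trigrams(text):
            self.postings[gram].add(path)

    def candidates(self, literal: str) -> set[str]:
        """Files containing every trigram of the literal: a superset of
        true matches, usually far smaller than the whole repo."""
        grams = trigrams(literal)
        if not grams:                # query too short to prune anything
            return set(self.files)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def grep(self, literal: str) -> list[str]:
        pattern = re.compile(re.escape(literal))
        return sorted(p for p in self.candidates(literal)
                      if pattern.search(self.files[p]))

idx = TrigramIndex()
idx.add("a.py", "def load_model(path): ...")
idx.add("b.py", "print('hello world')")
print(idx.grep("load_model"))  # ['a.py']
```

Swapping the exact posting sets for per-file Bloom filters trades memory for occasional false-positive candidates, which the final regex pass filters out anyway; that is the standard way such an index stays small enough to keep entirely local.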
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.