KittenML's latest open-source TTS release spans models from 15M to 80M parameters, with the smallest coming in around 25MB and the largest reportedly running faster than realtime on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX. The latest v0.8 release (Feb 2026) offers models from 15M parameters (25MB at int8) to 80M parameters (80MB), running high-quality synthesis on CPU without a GPU. It ships text preprocessing, a pip-installable Python API, Hugging Face models (e.g., kitten-tts-nano-0.8), and a browser demo on HF Spaces. The project is Apache-2.0 licensed and labeled a developer preview, with commercial support available; the roadmap lists multilingual TTS and KittenASR.
For creative tooling, the practical package is the Python API, the downloadable Hugging Face checkpoints, and the browser demo linked from the same GitHub page.
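If you want to kick the tires locally, here is a minimal synthesis sketch assuming the KittenTTS class and generate() call match the project README; the exact model id (extrapolated from the kitten-tts-nano-0.8 checkpoint named above), the voice name, and the 24 kHz output rate are assumptions to verify against the current docs:

```python
# Minimal local synthesis sketch -- names below are assumptions, check the README.
# pip install kittentts soundfile
from kittentts import KittenTTS
import soundfile as sf

# Model id extrapolated from the kitten-tts-nano-0.8 checkpoint mentioned above.
tts = KittenTTS("KittenML/kitten-tts-nano-0.8")

# Voice name is an assumption; the docs list the available voices.
audio = tts.generate(
    "Numbers like 1,234.56 are a reported weak spot, so test them early.",
    voice="expr-voice-2-f",
)
sf.write("output.wav", audio, 24000)  # assumed 24 kHz mono float waveform
```

Feeding it number-heavy text, as in the sketch, doubles as a quick check on the pronunciation complaint raised in the thread below.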
Thread discussion highlights:
- deathanatos on dependency bloat / torch CUDA: "It pulls in NVIDIA libs... I literally run out of disk trying to install this on Linux."
- baibai008989 on edge deployment: "the dependency chain issue is a real barrier for edge deployment... 25MB is genuinely exciting for that use case."
- bobokaytop on latency / realtime performance: "running on an intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though."
The strongest creative angle is local voice generation, where size and runtime matter more than studio-grade polish. In the discussion roundup above, one user reports about 1.5x realtime on an Intel 9700 CPU with the 80M model, while another calls the 25MB model genuinely exciting for edge deployment because dependency chains often block shipping on small devices.
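Realtime factor is easy to measure on your own hardware rather than trusting a thread anecdote: divide the seconds of audio produced by the wall-clock seconds spent producing it. A minimal sketch, reusing the assumed API from the earlier example (class name, generate() signature, and 24 kHz rate are all assumptions):

```python
# Realtime-factor (RTF) sketch: seconds of audio produced per second of compute.
# Reuses the assumed KittenTTS API from the earlier example.
import time
from kittentts import KittenTTS

tts = KittenTTS("KittenML/kitten-tts-nano-0.8")  # assumed model id
text = "A medium-length sentence gives a steadier timing sample than a short one."

start = time.perf_counter()
audio = tts.generate(text)  # assumed to return a 1-D float waveform array
elapsed = time.perf_counter() - start

sample_rate = 24000  # assumed output rate; confirm in the docs
audio_seconds = len(audio) / sample_rate
print(f"RTF: {audio_seconds / elapsed:.2f}x realtime "
      f"({audio_seconds:.2f}s of audio in {elapsed:.2f}s)")
```

An RTF above 1.0 means synthesis outpaces playback, which is the bar the 1.5x report clears on CPU.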
The same thread also shows why audio teams should test before committing. One commenter says a Linux install pulled in enough NVIDIA libraries to become a disk problem, and another reports number pronunciation degrading into noise. That makes v0.8 more compelling as an experimental local voice layer than as a drop-in production narrator.
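The disk complaint is checkable in a few lines: since the library runs on ONNX, you can confirm whether you got a CPU-only onnxruntime or a CUDA build that drags in NVIDIA libraries. The provider query below is onnxruntime's real API; whether Kitten TTS pins onnxruntime or onnxruntime-gpu in its own dependencies is an assumption to verify in its setup metadata:

```python
# Check which ONNX Runtime build got installed -- a CUDA build pulls in large
# NVIDIA libraries, which is the disk complaint from the thread.
import onnxruntime as ort

providers = ort.get_available_providers()
print("ONNX Runtime providers:", providers)

if "CUDAExecutionProvider" in providers:
    print("CUDA build detected; for CPU-only use, installing plain onnxruntime "
          "(not onnxruntime-gpu) avoids the NVIDIA dependency chain.")
else:
    print("CPU-only build: no NVIDIA libraries required.")
```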
Relevant for creatives working with voice and audio production: the thread is about expressive text-to-speech, voice quality, prosody, pronunciation, and whether very small models can still produce usable spoken output for apps and media workflows.
Tongyi Lab open-sourced Fun-CineForge with multi-speaker dubbing, a temporal modality for off-screen or occluded faces, and a full dataset-building pipeline. It matters for dialogue and localization workflows that break on hard cuts, overlapping speech, or missing lip cues.
release: Topview added Seedance 2.0 to Agent V2, pairing multi-scene generation with a storyboard timeline and Business Annual access billed as 365 days of unlimited generations. That moves long-form video workflows toward editable sequences instead of stitched clips.
workflow: Creators are moving from V8 calibration complaints to darker film-still scenes, fashion shots, and worldbuilding tests, with ECLIPTIC remakes showing stronger depth and lighting. Retest saved SREF recipes if you rely on V8 for cinematic ideation.
workflow: A shared workflow converts GTA-style stills into photoreal images with Nano Banana 2, then animates them in LTX-2.3 Pro 4K using detailed material, skin, vehicle, and camera prompts. Try it for trailer-style previsualization if you want more control at lower cost.
workflow: Shared Nano Banana 2 workflows now cover turnaround sheets, distinctive facial traits, and photoreal rerenders that keep the framing of a reference image. Use one prompt grammar for concept art, editorial portraits, and animation prep.