KittenTTS now offers nano, micro, and mini text-to-speech models, with the smallest int8 build under 25 MB, designed for ONNX CPU inference. Creators can run local voice tools without a cloud round trip.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX, with models from 15M to 80M parameters (25-80 MB). It supports CPU inference without a GPU and offers 8 built-in voices, adjustable speed, text preprocessing, and 24 kHz output. The latest release, v0.8.1 (Feb 2026), includes nano (15M/25MB int8), micro (40M), and mini (80M) models. Installable via pip, with a basic Python API for generation. 13k+ stars, Apache 2.0 license.
KittenTTS v0.8.1 packages three model sizes: nano at 15M parameters, micro at 40M, and mini at 80M, with the nano model quantized to roughly 25 MB in int8 form, per the project page. The library is open source, built on ONNX, installable via pip, and positioned for CPU-first use rather than a cloud API round trip.
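As a rough sketch of that pip workflow, assuming the `generate`-style API shown in the project README (the model id and voice name below are illustrative, and the snippet falls back to a second of silence so it runs even where the package is not installed):

```python
import numpy as np

SAMPLE_RATE = 24_000  # the library's documented output rate

try:
    # pip install kittentts -- per the project page
    from kittentts import KittenTTS

    model = KittenTTS("KittenML/kitten-tts-nano-0.1")  # model id is illustrative
    audio = model.generate("Local TTS without a GPU.", voice="expr-voice-2-f")
except ImportError:
    # Fallback so this sketch stays runnable without the package:
    # one second of silence at 24 kHz.
    audio = np.zeros(SAMPLE_RATE, dtype=np.float32)

print(f"{len(audio) / SAMPLE_RATE:.2f} s of audio")
```

The result is a plain sample array, so it can be written out with any WAV writer (e.g. `soundfile.write("out.wav", audio, 24000)`).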
For creative workflows, the concrete features are simple but useful: eight built-in voices, adjustable speed, text preprocessing, and 24 kHz output, per the launch thread. That makes it more relevant for local narration, character placeholders, interactive installs, and quick voice mockups than for fully directed studio voice performance.
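"Text preprocessing" in a TTS front end typically means expanding abbreviations and verbalizing digits before the acoustic model sees the text. A minimal, self-contained sketch of that idea (the rules here are illustrative, not KittenTTS's actual pipeline):

```python
import re

# Illustrative normalization tables; a real front end would be far larger.
ABBREVIATIONS = {"kHz": "kilohertz", "Dr.": "Doctor", "TTS": "text to speech"}
DIGIT_WORDS = "zero one two three four five six seven eight nine".split()


def normalize(text: str) -> str:
    """Expand abbreviations, then read digits aloud one at a time."""
    for abbrev, expansion in ABBREVIATIONS.items():
        text = text.replace(abbrev, expansion)
    # Naive digit-by-digit reading; a real front end would verbalize
    # whole numbers ("24" -> "twenty four").
    text = re.sub(r"\d", lambda m: f" {DIGIT_WORDS[int(m.group())]} ", text)
    return re.sub(r"\s+", " ", text).strip()


print(normalize("8 voices at 24 kHz"))
# -> "eight voices at two four kilohertz"
```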
Posted by rohan_joshi
Thread discussion highlights:
- tredre3 on dependency bloat: The package pulls in a chain of dependencies, including spacy and, via uv, torch/CUDA packages totaling several GB, which the commenter says undermines the appeal of a tiny edge model.
- baibai008989 on edge deployment and latency: A Raspberry Pi/home-automation use case is cited as exactly where a sub-25 MB model matters, but the commenter asks about first-chunk latency and whether the system supports streaming output for interactive use.
- bobokaytop on quality vs. latency on low-power hardware: The commenter says the real bottleneck for edge deployments is often inference latency and audio-streaming architecture, not just model size, and asks how it performs on a Raspberry Pi 4 in real time.
The early discussion is less about whether 25MB is impressive and more about what happens after install. In the thread summary, commenters say dependency chains can pull in far larger packages than the headline model size suggests, which undercuts the appeal for edge setups.
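The dependency-chain complaint is checkable: Python's standard `importlib.metadata` can sum the on-disk size of any installed distribution, so you can compare the headline model size against what pip actually put on disk. A sketch (the package names in the loop are just examples):

```python
from importlib import metadata
from pathlib import Path


def installed_size_bytes(dist_name: str) -> int:
    """Approximate on-disk size of an installed distribution's files."""
    dist = metadata.distribution(dist_name)  # raises PackageNotFoundError if absent
    total = 0
    for f in dist.files or []:
        p = Path(dist.locate_file(f))
        if p.is_file():
            total += p.stat().st_size
    return total


# Report a few of the heavyweight dependencies named in the thread.
for name in ("torch", "spacy", "numpy"):
    try:
        print(f"{name}: {installed_size_bytes(name) / 1_000_000:.0f} MB")
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
```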
The other open questions are real-time behavior and control. Commenters ask about first-chunk latency, streaming output, Raspberry Pi performance, and whether creators get finer expressive controls such as pitch, volume, or explicit style tags.
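First-chunk latency is a measurable quantity: the delay before a streaming synthesizer yields its first audio buffer. A self-contained sketch with a stand-in generator (whether KittenTTS streams at all is exactly what the commenters are asking, so the synthesizer below is a dummy):

```python
import time
from typing import Iterator

SAMPLE_RATE = 24_000  # the library's documented output rate


def stream_tts(text: str, chunk_ms: int = 200) -> Iterator[bytes]:
    """Stand-in streaming synthesizer: yields fixed-size 16-bit PCM chunks.
    A real engine would yield audio as inference produces it."""
    n_chunks = max(1, len(text) // 20)
    for _ in range(n_chunks):
        time.sleep(0.005)  # simulated per-chunk inference cost
        yield b"\x00" * (SAMPLE_RATE * 2 * chunk_ms // 1000)


def first_chunk_latency(text: str) -> float:
    """Seconds until the first audio chunk arrives (time-to-first-chunk)."""
    start = time.perf_counter()
    next(iter(stream_tts(text)))
    return time.perf_counter() - start


print(f"{first_chunk_latency('Hello from the edge.') * 1000:.1f} ms")
```

For interactive use on a Pi, this number, not total synthesis time, is what determines whether a voice assistant feels responsive.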
Posted by rohan_joshi
Relevant for creator-facing voice workflows because it’s about compact speech synthesis with multiple voices, expressive quality, and whether a small local model can be good enough for production audio generation.
KittenML's latest open-source TTS release spans 15M to 80M models, with the smallest coming in under 25 MB and the larger models reportedly running faster than real time on CPU. Audio creators should test pronunciation and install overhead before betting on it for edge or local voice tools.
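"Faster than real time" has a standard yardstick, the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean the engine keeps up. A minimal, engine-agnostic helper for benchmarking any synthesizer on your own hardware (the dummy engine below is illustrative):

```python
import time


def real_time_factor(synth_fn, text: str, sample_rate: int = 24_000) -> float:
    """RTF = synthesis wall-clock time / duration of audio produced.
    RTF < 1.0 means faster than real time. `synth_fn` is any callable
    returning a 1-D sequence of audio samples."""
    start = time.perf_counter()
    samples = synth_fn(text)
    elapsed = time.perf_counter() - start
    return elapsed / (len(samples) / sample_rate)


# Dummy engine: "synthesizes" 2 s of silence near-instantly.
rtf = real_time_factor(lambda t: [0.0] * 48_000, "hello")
print(f"RTF = {rtf:.4f}")
```

Running the same helper against a real model on a Raspberry Pi 4 would answer the thread's performance question directly.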