KittenTTS released nano, micro, and mini ONNX TTS models sized for CPU-first deployment rather than GPU-heavy stacks. Voice-agent builders should benchmark both dependency weight and real-time latency before treating a tiny checkpoint as sufficient.

Posted by rohan_joshi
Kitten TTS is an open-source, lightweight text-to-speech library built on ONNX, with models ranging from 15M to 80M parameters (25-80 MB) that enable high-quality CPU-based voice synthesis without a GPU. The latest v0.8 release publishes nano (15M, 25-56 MB), micro (40M, 41 MB), and mini (80M, 80 MB) models on Hugging Face. It ships with text preprocessing, a basic Python API (pip-installable from the GitHub release), a demo on HF Spaces, and commercial support. Apache 2.0 licensed, currently a developer preview.
KittenTTS is shipping as an Apache 2.0 open-source library built on ONNX, with three published model sizes in the current v0.8 release: nano at 15M parameters, micro at 40M, and mini at 80M (per the repo summary). The project description says those models land in a roughly 25 MB-to-80 MB footprint range and are meant for "CPU-based voice synthesis without GPU," which puts them closer to embedded or local-agent deployments than to conventional GPU-backed speech stacks (GitHub repo).
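Those parameter counts and file sizes imply a rough on-disk bytes-per-parameter figure, which hints at the export precision (about 4 bytes/param suggests fp32, 2 suggests fp16, 1 suggests int8 quantization). A quick back-of-envelope check using the sizes quoted above:

```python
def bytes_per_param(size_mb: float, params_millions: float) -> float:
    """On-disk bytes per parameter: (size_mb * 1e6 bytes) / (params_millions * 1e6)
    reduces to size_mb / params_millions."""
    return size_mb / params_millions

# Sizes as listed in the release notes above (nano uses its 25 MB low end):
for name, mb, params in [("nano", 25, 15), ("micro", 41, 40), ("mini", 80, 80)]:
    print(f"{name}: ~{bytes_per_param(mb, params):.2f} bytes/param")
```

The micro and mini figures land near 1 byte per parameter, consistent with int8-style quantization; nano's quoted 25-56 MB spread suggests multiple precision variants of the same 15M-parameter model.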
The release also includes text preprocessing, a basic Python API, Hugging Face-hosted models, and a demo surface, but the repo labels the package a developer preview rather than a finished production runtime (per the repo summary). That matters because the main novelty here is not just another TTS checkpoint; it is a small-footprint ONNX packaging choice aimed at teams that need voice output where GPU access is expensive, unavailable, or operationally awkward.
The main engineering angle is deployability: tiny ONNX-based TTS models that can run CPU-only on edge hardware, but with real-world concerns around Python dependency size, Torch/CUDA leakage, latency, streaming, and API ergonomics. The thread is useful if you build voice agents or offline inference stacks.
The Hacker News thread immediately focused on the real bottleneck for voice agents: deployment ergonomics rather than raw model weights. In the HN summary, the core concerns were Python dependency size, Torch/CUDA leakage, latency, streaming support, and API shape — the parts that decide whether a small model actually stays small inside a shipping application.
Thread discussion highlights:
- dawdler-purge on dependency bloat and CPU-only installs: "the dependency chain issue is a real barrier for edge deployment... anything that pulls torch + cuda makes the whole thing a non-starter."
- baibai008989 on edge deployment: "the dependency chain issue is a real barrier for edge deployment... 25MB is genuinely exciting for that use case."
- bobokaytop on latency and performance: "Running on an intel 9700 CPU, it's about 1.5x realtime using the 80M model. It wasn't any faster running on a 3080 GPU though."
The most concrete practitioner quote in the discussion recap says "anything that pulls torch + cuda makes the whole thing a non-starter," while another commenter said 25MB is "genuinely exciting" for edge use. That split captures the practical test for this release: a tiny ONNX checkpoint only changes deployment economics if the surrounding install and runtime stay equally lean.
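That leanness test is checkable before installing anything: recent pip versions can emit a dependency-resolution report without touching the environment (`pip install --dry-run --report out.json <pkg>`), and the report can be scanned for torch or CUDA wheels. A minimal sketch of that scan, assuming pip's documented installation-report JSON shape; the marker set is illustrative, not exhaustive:

```python
# Names that signal a GPU-heavy install: torch itself, its triton
# kernel dependency, and the nvidia-* CUDA runtime wheels.
GPU_HEAVY = {"torch", "triton"}

def gpu_heavy_deps(report: dict) -> list[str]:
    """Return GPU-heavy package names found in a pip --report JSON document."""
    names = [item["metadata"]["name"].lower() for item in report.get("install", [])]
    return sorted(n for n in names if n in GPU_HEAVY or n.startswith("nvidia-"))

# Example against a hand-written report fragment:
report = {"install": [
    {"metadata": {"name": "onnxruntime"}},
    {"metadata": {"name": "Torch"}},
    {"metadata": {"name": "nvidia-cudnn-cu12"}},
]}
print(gpu_heavy_deps(report))  # ['nvidia-cudnn-cu12', 'torch']
```

An empty result from a real report is the "stays equally lean" signal the commenters are asking for; anything in the list means the 25 MB checkpoint is riding on gigabytes of runtime.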
Performance data is still anecdotal. The same discussion recap cites one report of the 80M model running at about 1.5x realtime on an Intel 9700 CPU and "wasn't any faster" on a 3080 GPU. For engineers building offline assistants or embedded voice agents, that makes KittenTTS interesting less as a benchmark winner than as a CPU-first packaging experiment with enough early signal to justify local latency and dependency testing.
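The real-time factor behind that "1.5x realtime" figure is easy to reproduce locally: time synthesis, then divide seconds of audio produced by seconds of wall-clock compute; anything above 1.0 keeps up with playback. A minimal harness, with a stand-in `synthesize` placeholder to swap for the actual KittenTTS call under test:

```python
import time

def real_time_factor(audio_samples: int, sample_rate: int, wall_seconds: float) -> float:
    """RTF = seconds of audio produced per second of wall-clock compute."""
    return (audio_samples / sample_rate) / wall_seconds

def synthesize(text: str, sample_rate: int = 24_000) -> list[float]:
    """Placeholder: stands in for the model call being benchmarked."""
    return [0.0] * (sample_rate * 3)  # pretend we produced 3 s of audio

t0 = time.perf_counter()
audio = synthesize("The quick brown fox jumps over the lazy dog.")
elapsed = time.perf_counter() - t0
print(f"RTF: {real_time_factor(len(audio), 24_000, elapsed):.1f}x realtime")
```

For a fair comparison with the thread's numbers, run on longer passages and average several runs; short utterances are dominated by per-call overhead rather than synthesis throughput.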
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
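The n-gram technique Cursor describes follows the classic trigram-index approach: index every three-character substring of each file, derive required trigrams from the query's literal parts, intersect posting lists to get a small candidate set, and only then run the real regex over those files. A toy sketch of that candidate-pruning step, simplified here to literal queries (Cursor's actual index layout and Bloom-filter details are not public in this summary):

```python
import re
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file paths
        self.files = {}

    def add(self, path: str, text: str):
        self.files[path] = text
        for gram in trigrams(text):
            self.postings[gram].add(path)

    def candidates(self, literal: str) -> set[str]:
        """Files containing every trigram of the literal: a superset of
        true matches, usually far smaller than the whole repo."""
        grams = trigrams(literal)
        if not grams:                # query too short to prune anything
            return set(self.files)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def grep(self, literal: str) -> list[str]:
        pattern = re.compile(re.escape(literal))
        return sorted(p for p in self.candidates(literal)
                      if pattern.search(self.files[p]))

idx = TrigramIndex()
idx.add("a.py", "def load_model(path): ...")
idx.add("b.py", "print('hello world')")
print(idx.grep("load_model"))  # ['a.py']
```

Swapping the exact posting sets for per-file Bloom filters trades memory for occasional false-positive candidates, which the final regex pass filters out anyway; that is the standard way such an index stays small enough to keep entirely local.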
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.