Tiiny claims its pocket-sized local AI server can run open models up to 120B and expose an OpenAI-compatible local API without token fees. Privacy-sensitive teams should validate throughput and model quality before deploying always-on local agents.

Tiiny's core pitch is a pocket-sized device that acts as a personal inference server for open-source AI models. In the launch thread, Paul Couvert says it can run models "up to 120B," stay "100% local and private," and serve workloads that normally sit behind a hosted API.
The linked Kickstarter page, as summarized in the project post, adds the implementation detail engineers will care about: Tiiny is presented as an OpenAI-compatible local API endpoint with "one-click deployment" and "no token fees." That positions it less like a standalone app and more like a small edge box that could slot into existing agent or chat stacks with minimal client-side changes.
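If the endpoint really is OpenAI-compatible, existing clients should only need a base-URL change. A minimal sketch of what that would look like, assuming a hypothetical device address and model ID (neither is documented in the thread or the Kickstarter summary):

```python
# Minimal sketch of what "OpenAI-compatible" would mean in practice.
# The base URL, port, and model name below are assumptions for
# illustration; the Kickstarter page does not document them.
from openai import OpenAI

client = OpenAI(
    base_url="http://tiiny.local:8080/v1",  # hypothetical device endpoint
    api_key="not-needed",                   # local server, no token fees
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; actual model IDs are unspecified
    messages=[{"role": "user", "content": "Summarize this README."}],
)
print(response.choices[0].message.content)
```

If that holds, agent frameworks that already speak the OpenAI wire format could point at the box without client-side rewrites, which is the substance of the "slot into existing stacks" claim.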
The practical demos center on replacing hosted subscriptions with local inference. According to the demo summary, the box can run a local chat interface, support coding workflows, generate landing pages, and drive browser agents for scraping, form filling, and social posting. The same summary says it can also handle text-to-speech and text-to-image models, widening the pitch beyond a single LLM endpoint.
What the evidence does not establish is the performance envelope. Neither the thread nor the demo summary specifies tokens per second, quantization, concurrent request handling, power draw, or which 120B models were actually tested. For engineering teams, that leaves Tiiny as an interesting edge-serving claim with API-compatibility appeal, but without the benchmark detail needed to compare it against a Mac Studio, a local GPU box, or a managed inference endpoint.
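Until those numbers exist, a team could collect its own. A rough throughput probe, again assuming the hypothetical endpoint above; streamed chunks stand in for tokens, which is approximate but adequate for comparing one box against another:

```python
# Rough tokens-per-second probe against an OpenAI-compatible endpoint.
# Endpoint and model name are assumptions; streamed chunks are used as a
# proxy for tokens, which is approximate but fine for relative comparisons.
import time
from openai import OpenAI

client = OpenAI(base_url="http://tiiny.local:8080/v1", api_key="not-needed")

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="local-model",  # placeholder model ID
    messages=[{"role": "user", "content": "Write 300 words about edge inference."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} chunks/sec over {elapsed:.1f}s")
```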
Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are evaluating rollout-heavy RL jobs off NVIDIA and want concrete throughput and step-time numbers before porting.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
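For intuition, here is a toy version of the general technique: build an inverted index from trigrams, intersect postings to shortlist candidate files, and only then run the real regex over the shortlist. Cursor's system reportedly also uses Bloom filters to keep per-file trigram membership tests compact; this sketch omits that layer and illustrates the idea, not Cursor's implementation:

```python
# Toy trigram index: shortlist candidate files by intersecting the
# postings for every trigram of the query's literal part, then run the
# (expensive) regex only over that shortlist.
import re
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings: dict[str, set[str]] = defaultdict(set)  # trigram -> file paths
        self.files: dict[str, str] = {}

    def add(self, path: str, text: str) -> None:
        self.files[path] = text
        for t in trigrams(text):
            self.postings[t].add(path)

    def search(self, literal: str, pattern: str) -> list[str]:
        # A match for the pattern must contain every trigram of its
        # literal substring, so intersection prunes most files up front.
        grams = trigrams(literal)
        candidates = set(self.files) if not grams else set.intersection(
            *(self.postings.get(t, set()) for t in grams)
        )
        rx = re.compile(pattern)
        return [p for p in candidates if rx.search(self.files[p])]

idx = TrigramIndex()
idx.add("a.py", "def handle_request(req): ...")
idx.add("b.py", "def parse_config(path): ...")
print(idx.search("handle", r"handle_\w+"))  # -> ['a.py']
```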
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
From the launch thread:

"You can run local AI models up to 120B without a $10,000 Mac Studio. This phone-sized device is your own server for open-source models. 100% local and private. And you can use it to:
- Power an agent like OpenClaw 24/7
- Completely replace a chatbot
- Literally anything that…

Access to Tiiny Kickstarter page → kickstarter.com/projects/tiiny… I've been using it for weeks now and have gone from spending hundreds of dollars on APIs to literally zero... All while no longer giving any data to third-party servers. You own your intelligence!

Also available on YT if you prefer to watch there! youtu.be/Ew41f0B28T8"