Tiiny claims its pocket-sized local AI server can run open models up to 120B and expose an OpenAI-compatible local API without token fees. Privacy-sensitive teams should validate throughput and model quality before deploying always-on local agents.

Tiiny's core pitch is a pocket-sized device that acts as a personal inference server for open-source AI models. In the launch thread, Paul Couvert says it can run models "up to 120B," stay "100% local and private," and serve workloads that normally sit behind a hosted API.
The linked Kickstarter page, as summarized in the project post, adds the implementation detail engineers will care about: Tiiny is presented as an OpenAI-compatible local API endpoint with "one-click deployment" and "no token fees." That positions it less like a standalone app and more like a small edge box that could slot into existing agent or chat stacks with minimal client-side changes.
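If the endpoint really is OpenAI-compatible, existing clients should only need a base-URL change. A minimal sketch of what that would look like, assuming a hypothetical device address and model ID (neither is documented in the thread or the Kickstarter summary):

```python
# Minimal sketch of what "OpenAI-compatible" would mean in practice.
# The base URL, port, and model name below are assumptions for
# illustration; the Kickstarter page does not document them.
from openai import OpenAI

client = OpenAI(
    base_url="http://tiiny.local:8080/v1",  # hypothetical device endpoint
    api_key="not-needed",                   # local server, no token fees
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; actual model IDs are unspecified
    messages=[{"role": "user", "content": "Summarize this README."}],
)
print(response.choices[0].message.content)
```

If that holds, agent frameworks that already speak the OpenAI wire format could point at the box without client-side rewrites, which is the substance of the "slot into existing stacks" claim.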
The practical demos center on replacing hosted subscriptions with local inference. According to the demo summary, the box can run a local chat interface, support coding workflows, generate landing pages, and drive browser agents for scraping, form filling, and social posting. The same summary says it can also handle text-to-speech and text-to-image models, widening the pitch beyond a single LLM endpoint.
What the evidence does not establish is the performance envelope. Neither the thread nor the demo summary specifies tokens per second, quantization, concurrent request handling, power draw, or which 120B models were actually tested. For engineering teams, that leaves Tiiny as an interesting edge-serving claim with API-compatibility appeal, but without the benchmark detail needed to compare it against a Mac Studio, a local GPU box, or a managed inference endpoint.
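Until those numbers exist, a team could collect its own. A rough throughput probe, again assuming the hypothetical endpoint above; streamed chunks stand in for tokens, which is approximate but adequate for comparing one box against another:

```python
# Rough tokens-per-second probe against an OpenAI-compatible endpoint.
# Endpoint and model name are assumptions; streamed chunks are used as a
# proxy for tokens, which is approximate but fine for relative comparisons.
import time
from openai import OpenAI

client = OpenAI(base_url="http://tiiny.local:8080/v1", api_key="not-needed")

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="local-model",  # placeholder model ID
    messages=[{"role": "user", "content": "Write 300 words about edge inference."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1
elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} chunks/sec over {elapsed:.1f}s")
```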
Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are evaluating rollout-heavy RL jobs off NVIDIA and want concrete throughput and step-time numbers before porting.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
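For intuition, here is a toy version of the general technique: build an inverted index from trigrams, intersect postings to shortlist candidate files, and only then run the real regex over the shortlist. Cursor's system reportedly also uses Bloom filters to keep per-file trigram membership tests compact; this sketch omits that layer and illustrates the idea, not Cursor's implementation:

```python
# Toy trigram index: shortlist candidate files by intersecting the
# postings for every trigram of the query's literal part, then run the
# (expensive) regex only over that shortlist.
import re
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings: dict[str, set[str]] = defaultdict(set)  # trigram -> file paths
        self.files: dict[str, str] = {}

    def add(self, path: str, text: str) -> None:
        self.files[path] = text
        for t in trigrams(text):
            self.postings[t].add(path)

    def search(self, literal: str, pattern: str) -> list[str]:
        # A match for the pattern must contain every trigram of its
        # literal substring, so intersection prunes most files up front.
        grams = trigrams(literal)
        candidates = set(self.files) if not grams else set.intersection(
            *(self.postings.get(t, set()) for t in grams)
        )
        rx = re.compile(pattern)
        return [p for p in candidates if rx.search(self.files[p])]

idx = TrigramIndex()
idx.add("a.py", "def handle_request(req): ...")
idx.add("b.py", "def parse_config(path): ...")
print(idx.search("handle", r"handle_\w+"))  # -> ['a.py']
```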
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
From the launch thread:

"You can run local AI models up to 120B without a $10,000 Mac Studio. This phone-sized device is your own server for open-source models. 100% local and private. And you can use it to:
- Power an agent like OpenClaw 24/7
- Completely replace a chatbot
- Literally anything that…

Access to Tiiny Kickstarter page → kickstarter.com/projects/tiiny… I've been using it for weeks now and have gone from spending hundreds of dollars on APIs to literally zero... All while no longer giving any data to third-party servers. You own your intelligence!

Also available on YT if you prefer to watch there! youtu.be/Ew41f0B28T8"