workflow · March 15, 2026

oMLX supports Claude Code locally with tiered KV cache and Anthropic Messages API

oMLX now supports local Claude Code setups on Apple Silicon through tiered KV caching and an Anthropic Messages API-compatible endpoint. One setup reports roughly 10x faster performance than mlx_lm-style serving. For private on-device coding agents, point Claude Code at a local compatible endpoint and disable the attribution header to preserve cache reuse.
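The two settings described above can be sketched as a small launcher. This is a minimal illustration, not the setup from the original thread: the address in `LOCAL_ENDPOINT` is a placeholder assumption, the helper names `local_claude_env` and `launch_claude` are hypothetical, and `ANTHROPIC_BASE_URL` is Claude Code's general base-URL override rather than something the source names. Only `CLAUDE_CODE_ATTRIBUTION_HEADER=0` comes from the reported config notes.

```python
import os
import shutil
import subprocess

# Assumption: placeholder address. Use whatever host and port your local
# Anthropic Messages API-compatible server actually reports.
LOCAL_ENDPOINT = "http://localhost:8000"


def local_claude_env(base_url: str = LOCAL_ENDPOINT) -> dict[str, str]:
    """Return a copy of the current environment adjusted for a local backend."""
    env = dict(os.environ)
    # Route Claude Code's Messages API requests to the local server
    # instead of the hosted endpoint.
    env["ANTHROPIC_BASE_URL"] = base_url
    # From the reported config notes: disable the attribution header so
    # repeated prompts keep hitting the server's cache.
    env["CLAUDE_CODE_ATTRIBUTION_HEADER"] = "0"
    return env


def launch_claude(base_url: str = LOCAL_ENDPOINT) -> int:
    """Start the `claude` CLI with the adjusted environment (hypothetical helper)."""
    claude = shutil.which("claude")
    if claude is None:
        raise FileNotFoundError("claude CLI not found on PATH")
    return subprocess.run([claude], env=local_claude_env(base_url)).returncode
```

Calling `launch_claude()` starts an interactive session against the local server. Some local servers may also expect an API key or token; that depends on the server and is not covered here.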

Claude Code · KV Cache · Cost Optimization · Developer Experience

3 min read

TL;DR

A practitioner setup shows that Claude Code can target a fully local backend on Apple Silicon: point it at any server that speaks the Anthropic Messages API instead of Anthropic’s hosted endpoint.
The reported speedup came from swapping the inference layer, not the model. According to the caching breakdown, oMLX restored prefix reuse with “tiered KV caching and continuous batching,” and the user reports “~10× faster” behavior than earlier attempts.
Claude Code’s default attribution header can break cache consistency in this setup; the workaround in the config notes is to disable it with CLAUDE_CODE_ATTRIBUTION_HEADER=0 so repeated prompts keep hitting cache.
Model fit still depends on local hardware. In