Morph released FlashCompact, a specialized compaction model and SDK for coding agents, claiming 33k tokens per second and near-invisible long-context compression. Use it or copy the approach if compaction latency and noisy tool output are blocking longer agent runs.

FlashCompact is not a general model repurposed for summarization. Morph describes it as “the first specialized model for context compaction,” aimed at shortening agent state fast enough that compaction stops being the bottleneck in long coding runs FlashCompact launch. In Morph’s technical thread, the company says it trained a dedicated compaction model instead of relying on slower generic summarization flows, then served it on “a custom PyTriton based stack on H200.”
The core claim is latency plus compression: “200k → 50k in ~1.5s” at 33,000 tokens per second launch post; at that throughput, emitting a 50k-token compacted context takes roughly 1.5 seconds, so the two figures are at least internally consistent. That matters because Morph argues current agent compaction often means “waiting 2+ minutes for terrible results,” as its compaction complaint puts it. The company has also published a blog post and a playground for testing the approach.
Morph says it reviewed more than 200 agent sessions and over 40 coding-agent harnesses, and concluded that “most context bloat comes from tool responses, not model generation” eval summary. That is a practical implementation detail for agent builders: the waste is allegedly in logs, command output, and tool chatter, not only in the model’s own prior turns.
The company’s summary claim is that compaction caused “no performance drop” while reducing both token counts and step counts eval summary. Its linked blog writeup describes two operating modes: objective-mode compaction that strips filler without task guidance, and query-based compaction that keeps or drops details based on the agent’s next step.
There is already some evidence of stack-level adoption. An OpenClaw pull request linked by Morph adds Morph as a compaction provider through /v1/compact, using a pre-compaction hook rather than replacing the whole flow OpenClaw integration. The PR summary says the integration falls back to LLM summarization if Morph is unavailable, and adds exponential-backoff retries, but reviewers also noted a bug where non-retryable errors may still be retried and a quota-wasting edge case when a quality guard re-calls compaction on identical input PR notes.
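The two reviewer-noted issues are both fixable with a thin client-side wrapper: classify errors before retrying, and memoize on input so a quality guard re-calling with identical input hits a cache instead of quota. A minimal sketch, not the PR's actual code; the `call` parameter stands in for whatever HTTP client the harness uses, and the retryable status set is an assumption.

```python
import hashlib
import json
import time

RETRYABLE = {429, 500, 502, 503}  # statuses worth retrying (assumption)

class CompactError(Exception):
    def __init__(self, status: int):
        super().__init__(f"compact failed: {status}")
        self.status = status

_cache: dict[str, str] = {}  # input-hash -> result, guards identical re-calls

def compact_with_retry(call, payload: dict, attempts: int = 4) -> str:
    """Call a compaction endpoint with exponential backoff.

    Non-retryable errors fail fast instead of being retried, and identical
    input is served from cache rather than re-billed.
    """
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    for i in range(attempts):
        try:
            result = call(payload)
            _cache[key] = result
            return result
        except CompactError as e:
            if e.status not in RETRYABLE or i == attempts - 1:
                raise  # fail fast on auth/validation errors, or when out of attempts
            time.sleep(0.25 * 2 ** i)  # 0.25s, 0.5s, 1s, ...
```

A real integration would also want the fallback path the PR describes (plain LLM summarization when the endpoint is unavailable), which slots in naturally as the handler for the final raised error.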
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
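The n-gram-plus-Bloom-filter idea can be illustrated in miniature: index each file's character trigrams into a per-file Bloom filter, then run the expensive regex scan only over files whose filters contain every trigram of a query literal. This is a sketch of the general technique, not Cursor's implementation; the filter size and hash count are arbitrary choices.

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: set membership with no false negatives."""

    def __init__(self, bits: int = 4096, hashes: int = 3):
        self.bits, self.hashes = bits, hashes
        self.data = bytearray(bits // 8)

    def _positions(self, item: str):
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.data[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.data[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def trigrams(text: str) -> set[str]:
    return {text[i : i + 3] for i in range(len(text) - 2)}

def index_files(files: dict[str, str]) -> dict[str, Bloom]:
    idx = {}
    for path, body in files.items():
        b = Bloom()
        for t in trigrams(body):
            b.add(t)
        idx[path] = b
    return idx

def candidates(idx: dict[str, Bloom], literal: str) -> list[str]:
    # A file can match only if every trigram of the literal is (probably)
    # in its filter; Bloom false positives just cost one wasted regex scan.
    need = trigrams(literal)
    return [p for p, b in idx.items() if all(t in b for t in need)]
```

The payoff is that the filter check is a few bit tests per file, so most of the repo is ruled out before any file is opened, and the occasional false positive only triggers a redundant scan, never a missed match.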
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.