Morph released FlashCompact, a specialized compaction model and SDK for coding agents, claiming 33k tokens per second and near-invisible long-context compression. Use it or copy the approach if compaction latency and noisy tool output are blocking longer agent runs.

FlashCompact is not a general model repurposed for summarization. Morph describes it as “the first specialized model for context compaction,” aimed at shortening agent state fast enough that compaction stops being the bottleneck in long coding runs FlashCompact launch. In Morph’s technical thread, the company says it trained a dedicated compaction model instead of relying on slower generic summarization flows, then served it on “a custom PyTriton based stack on H200.”
The core claim is latency plus compression: “200k → 50k in ~1.5s” at 33,000 tokens per second launch post; at that throughput, emitting a 50k-token compacted context takes roughly 1.5 seconds, so the two figures are at least internally consistent. That matters because Morph argues current agent compaction often means “waiting 2+ minutes for terrible results,” as its compaction complaint puts it. The company has also published a blog post and a playground for testing the approach.
Morph says it reviewed more than 200 agent sessions and over 40 coding-agent harnesses, and concluded that “most context bloat comes from tool responses, not model generation” eval summary. That is a practical implementation detail for agent builders: the waste is allegedly in logs, command output, and tool chatter, not only in the model’s own prior turns.
The company’s summary claim is that compaction caused “no performance drop” while reducing both token counts and step counts eval summary. Its linked blog writeup describes two operating modes: objective-mode compaction that strips filler without task guidance, and query-based compaction that keeps or drops details based on the agent’s next step.
There is already some evidence of stack-level adoption. An OpenClaw pull request linked by Morph adds Morph as a compaction provider through /v1/compact, using a pre-compaction hook rather than replacing the whole flow OpenClaw integration. The PR summary says the integration falls back to LLM summarization if Morph is unavailable, and adds exponential-backoff retries, but reviewers also noted a bug where non-retryable errors may still be retried and a quota-wasting edge case when a quality guard re-calls compaction on identical input PR notes.
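The two reviewer-noted issues are both fixable with a thin client-side wrapper: classify errors before retrying, and memoize on input so a quality guard re-calling with identical input hits a cache instead of quota. A minimal sketch, not the PR's actual code; the `call` parameter stands in for whatever HTTP client the harness uses, and the retryable status set is an assumption.

```python
import hashlib
import json
import time

RETRYABLE = {429, 500, 502, 503}  # statuses worth retrying (assumption)

class CompactError(Exception):
    def __init__(self, status: int):
        super().__init__(f"compact failed: {status}")
        self.status = status

_cache: dict[str, str] = {}  # input-hash -> result, guards identical re-calls

def compact_with_retry(call, payload: dict, attempts: int = 4) -> str:
    """Call a compaction endpoint with exponential backoff.

    Non-retryable errors fail fast instead of being retried, and identical
    input is served from cache rather than re-billed.
    """
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    for i in range(attempts):
        try:
            result = call(payload)
            _cache[key] = result
            return result
        except CompactError as e:
            if e.status not in RETRYABLE or i == attempts - 1:
                raise  # fail fast on auth/validation errors, or when out of attempts
            time.sleep(0.25 * 2 ** i)  # 0.25s, 0.5s, 1s, ...
```

A real integration would also want the fallback path the PR describes (plain LLM summarization when the endpoint is unavailable), which slots in naturally as the handler for the final raised error.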
Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
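The n-gram-plus-Bloom-filter idea can be illustrated in miniature: index each file's character trigrams into a per-file Bloom filter, then run the expensive regex scan only over files whose filters contain every trigram of a query literal. This is a sketch of the general technique, not Cursor's implementation; the filter size and hash count are arbitrary choices.

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: set membership with no false negatives."""

    def __init__(self, bits: int = 4096, hashes: int = 3):
        self.bits, self.hashes = bits, hashes
        self.data = bytearray(bits // 8)

    def _positions(self, item: str):
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.data[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.data[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def trigrams(text: str) -> set[str]:
    return {text[i : i + 3] for i in range(len(text) - 2)}

def index_files(files: dict[str, str]) -> dict[str, Bloom]:
    idx = {}
    for path, body in files.items():
        b = Bloom()
        for t in trigrams(body):
            b.add(t)
        idx[path] = b
    return idx

def candidates(idx: dict[str, Bloom], literal: str) -> list[str]:
    # A file can match only if every trigram of the literal is (probably)
    # in its filter; Bloom false positives just cost one wasted regex scan.
    need = trigrams(literal)
    return [p for p, b in idx.items() if all(t in b for t in need)]
```

The payoff is that the filter check is a few bit tests per file, so most of the repo is ruled out before any file is opened, and the occasional false positive only triggers a redundant scan, never a missed match.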
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.