Fresh stories
Mistral releases OCR 4 with bounding boxes and 85.20 OlmOCRBench
Mistral OCR 4 adds layout-aware extraction with bounding boxes, block typing, and inline confidence across 170 languages. Use it through the API or self-hosted deployments when document pipelines need structure, citations, redaction, and chunking.

AssemblyAI launches Universal-3.5 Pro Realtime with Context Carryover
AssemblyAI’s Universal-3.5 Pro Realtime now carries forward the agent side of a conversation to improve live transcription. The release also ships multilingual realtime ASR features, and one early deployment reported that critical-utterance errors fell from 26% to 9%.

Anthropic launches Claude Tag in Slack beta with channel memory
Claude Tag puts Claude into Slack as a teammate that can handle threads, use approved tools, and follow up proactively in selected channels. Team and Enterprise users can try it in beta to keep shared channel context instead of restarting from private chats.

Kilo Code launches Auto Efficient routing with KiloBench model selection
Kilo Code added an Auto Efficient mode that routes each request to the cheapest model that clears its benchmark bar, based on public KiloBench results. The router stays session-aware and falls back to stronger paid models when confidence is low.

Claude Code 2.1.187 adds sandbox.credentials and 5-minute MCP aborts
Claude Code 2.1.187 adds sandbox.credentials to block credential and secret-env access from sandboxed commands and aborts remote MCP calls after five minutes. It also adds org model restrictions and fixes structured-output retry loops.

Perceptron releases Files API with reusable upload IDs
Perceptron’s Files API lets developers upload an image or video once and reference it by ID across later requests instead of resending base64 or URLs. That simplifies repeated multimodal workflows and cuts transfer overhead for video-heavy pipelines.

Latitude launches MIT-licensed agent monitoring with Signals clustering and MCP access
Latitude released an open-source platform for monitoring AI agents in production, with plain-English trace search, repeated-failure clustering, and MCP access from coding agents. That gives teams a self-hostable way to inspect token burn, surface recurring failures, and turn production traces into evals and fixes.

Top stories this week
GLM-5.2 adds Perplexity Agent API and Droid support on Baseten at >280 TPS
GLM-5.2 gained Perplexity Agent API and Droid support along with more hosting options, while Baseten reported over 280 TPS and sub-0.8s TTFT. Builders should watch the cost and benchmark data as the model moves into production agent stacks.

Google ships Interactions API in GA as Gemini default with background agents
Google put the Interactions API into GA as the new default for Gemini, adding background execution, managed agents, remote sandboxes, and multimodal tools. Builders now get one stateful interface for models, long-running jobs, and future Gemini Omni support.

Vals AI releases SkillsBench with a 17-point coding-agent gain and MiniMax-M3 at +25.4
Vals AI launched SkillsBench, a public benchmark for measuring how reusable skills change coding-agent performance, and reported average accuracy rising from 35.5% to 52.5%. The results matter because they suggest some workflows can move to cheaper models when task-specific skills are available.

Vercel supports WebSockets in Fluid with Socket.IO and 30-minute reconnects
Vercel rolled out native WebSocket support so Node.js libraries like Socket.IO can run from CDN to Fluid. Existing sessions still reconnect at the 30-minute function limit, so teams should test long-lived connections before migrating.

Sakana Fugu launches one-API orchestration with Fable benchmark claims
Sakana AI launched Fugu and Fugu Ultra as OpenAI-compatible orchestration models that route, verify, and synthesize across multiple models. The release matters because Sakana is selling multi-agent coordination as a single endpoint, but it has not fully disclosed model mix or pass-through costs.