Fresh stories

Opus 4.8 users report write failures, sycophancy, and 58% DeepSWE
Two days after launch, users and benchmarks pointed to write failures, sycophancy, lower security recall, and a 58% DeepSWE result. GPT-5.5 still leads on cost, output tokens, and pass@1 in shared coding-agent tests, so compare both before switching.
OpenRouter launches Guardrails with budget caps, ZDR, and prompt-injection filters
OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.

OpenClaw releases 2026.5.28 with Opus 4.8 support and faster turns
OpenClaw 2026.5.28 added Claude Opus 4.8 and Krea support while cutting fresh-install size 52.8% and speeding both cold and warm turns. It also expanded /subagents inspection, which should make delegated runs easier to debug.


Codex community ships /dynamic swarms, session lifecycles, and model routing
Builders added /dynamic orchestration, custom-model routing, and repo runbooks around Codex, while users surfaced new session lifecycle controls in the app. That makes Codex a better fit for long-running, multi-context coding work.

Hermes ecosystem ships Web UI, Control Room, and 14% lower read_file tokens
Builders released a chat-first Web UI and a multi-agent Control Room template around Hermes Agent, while core updates cut read_file input tokens by 14% and fixed TUI startup hangs. Use the new controls to manage local multi-agent setups while reducing routine token burn.

Step 3.7 Flash opens 30-day free access for Hermes users via Nous Portal
A day after launch, Nous made Step 3.7 Flash free for 30 days to Hermes users through Nous Portal. The access window landed alongside fresh vLLM/NIM and MLX-VLM support, making the model easier to test in both local and production stacks.
Pi ecosystem adds /goal tasks, acceptance gates, and Lovely Dev Tools
Prime Intellect launches Hosted Evaluations with harnesses, sandboxes, and rollouts viewer
Top stories this week
OpenAI Codex adds Windows computer use and ChatGPT mobile remote control
OpenAI added computer use to Codex on Windows and now lets ChatGPT mobile steer tasks running on Windows PCs. The update extends Codex to existing Windows dev machines and adds remote review and debugging from mobile.


Opus 4.8 users report false greens, token burn, and mixed benchmark gains
A day after launch, users and third-party evals reported false verified claims, million-token loops, and mixed task results despite strong headline wins. Watch task-by-task results and token cost closely because reliability varied sharply by effort setting and harness.

Step 3.7 Flash launches with day-one support in Kilo, Modal, SGLang, Hermes, and DesignArena
Step 3.7 Flash, a 198B MoE release with a 256K context window, landed with day-one support across Kilo, Modal, SGLang, Hermes-linked tooling, and DesignArena. That breadth gives engineers multiple ways to serve, benchmark, and wire the new open-weight multimodal model into agents.

Cursor adds auto-review mode with classifier subagent and fewer approval prompts
Cursor shipped auto-review mode, letting agents run more tool calls with fewer approval prompts and sending unsafe or unsandboxed actions to a classifier subagent. The change lowers review friction while keeping a separate path for higher-risk calls.

Vercel Sandbox adds Docker support with persistent images and isolated container runs
Vercel Sandbox can now build and run Docker containers, persist images and installs across sessions, and host databases or full apps inside the sandbox. That broadens what coding agents and preview environments can validate without leaving Vercel.








