Fresh stories

Codex users report /goal, /rewind, and /compact workflows after launch
A day after /goal and thread automations landed in Codex, practitioners started standardizing on /goal specs, /fork or /side detours, and /rewind plus /compact recovery. The pattern matters because verifier design and compaction timing now control how well long runs hold together.
Microsoft opens SkillOpt with batch eval loops for agent SOP files
Microsoft open-sourced SkillOpt, a system that treats agent skill documents as tunable artifacts and improves them against measured task batches. It matters because practitioners are already standardizing shared /research, QA, and packageable skills across harnesses, turning skill files into a new optimization surface alongside models.


DeepSeek releases DSpark checkpoints for Qwen3 and Gemma-4
DeepSeek extended DSpark beyond V4 by publishing draft-model checkpoints for Qwen3 and Gemma-4 families and clarifying that DSpark targets higher-throughput serving by controlling verification cost. The release matters because speculative decoding is moving from papers into reusable open checkpoints.

Codex resets all usage limits as OpenAI investigates weekend drain reports
Two days after OpenAI said it had fixed Codex quota drain tied to fraud overflagging, the team opened a Sunday war room for fresh drain reports and issued a hard reset of user limits. The incident matters because background usage and reset rules were still opaque during long-running agent work.

xAI tests Grok 4.5 private beta on a 1.5T V9 model with Cursor data
Multiple trackers said Grok 4.5 is in private beta at SpaceX and Tesla, built on a 1.5T V9 base with supplemental Cursor data and compared internally against an unspecified Opus model. The claims matter because xAI is signaling a faster release cadence, but the reported performance is still unverified.
Plannotator v0.21.3 adds file-scoped review comments and Codex app-server support
Plannotator v0.21.3 shipped file-scoped comments, a unified review UX, default per-file Ask AI chats, and a more reliable Codex app-server path. It matters because guided reviews and plan checks can now plug into agent workflows with less custom glue.
Google limits Meta's Gemini use after capacity shortages
The FT reported that Google capped Meta's Gemini usage after Meta asked for more model capacity than Google could supply, affecting internal safety, support, ad, and coding projects. The restriction matters because model access is now constrained by chip, memory, and networking capacity as much as by API contracts.
Top stories this week
DeepSeek V4-Pro benchmarks at ~90 tok/s after DSpark rollout
Independent measurements after the DSpark rollout put DeepSeek V4-Pro at around 90 tok/s, with one run dropping from 214s to 116s. The gain matters because it lowers serving cost, though tuning details and memory overhead are still unclear.


Codex supports thread automations with /goal, /btw, and heartbeat wake-ups
Codex users documented thread automations as recurring wake-up calls that preserve thread context, alongside /goal and /btw patterns for steering long-running loops. The workflow matters because teams can schedule check-ins, queue instructions mid-run, and add adversarial review passes without building a separate orchestrator.

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic
OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

OpenCode v2 introduces one backend for TUI, desktop, and web sessions
OpenCode v2 moves its TUI, desktop, and web clients onto a shared backend, so sessions stay in sync and resource use drops across windows. The beta matters for multi-window agent workflows, though the next build is still missing some features.

Codex adds hover navigation rail and longer thread history in desktop update
OpenAI shipped another Codex desktop update with smoother long-thread scrolling, deeper local history, better settings search, and a hover navigation rail. The release matters because long-running sessions now keep your place, and copied output carries richer Markdown into Slack.
