Fresh stories
Trajectory launches continual-learning platform with off-policy SDPO
Trajectory launched a platform that turns agent traces and user corrections into post-deployment model updates instead of prompt-only fixes. Baseten and Tinker described live A/B post-training, 397B-model deployment work, and an off-policy recipe for stabilizing the loop.

Ramp reports business AI token spend at 13x January 2025 levels
Ramp data and operator reports said enterprise AI token spending is rising far faster than budget controls and procurement cycles. Teams should plan for routing, cheaper defaults, and spend caps to become core engineering infrastructure.

Hyperbrowser launches AgentRank to test Claude, GPT, and Gemini on real websites
Hyperbrowser launched AgentRank, an open-source tool that runs Claude, GPT, and Gemini agents against a site to show where they get stuck. It matters because teams can turn agent website compatibility into a repeatable eval instead of an anecdotal demo.

DeepSWE benchmarks GPT-5.5 at 70% on 113 tasks across 91 repos
DeepSWE launched a coding benchmark built from 113 original tasks across 91 repos and five languages, with GPT-5.5 leading at 70%. The setup is meant to better reflect repo search, multi-file edits, and verification in real agent workflows.

Hermes Agent integrates MCP Catalog, Qwen3.7 Max, Venice, and Krea 2 in one window
Hermes Agent added a built-in MCP Catalog while separate builders shipped Qwen3.7 Max support, Venice private-model workflows, and Krea 2 image generation. The cluster shows Hermes moving beyond a single-model assistant toward a broader agent shell with tool, model, and media providers.

Tax AI reports 97% accuracy across 7,000 returns at 30+ accounting firms
OpenAI and Thrive described Tax AI, a self-improving tax-prep system used across 30+ firms that processed 7,000 returns and reached up to 97% accuracy. The loop turns accountant corrections into eval targets and narrow Codex fixes, showing a concrete path to vertical agents that improve after deployment.
OpenAI adds private MCP server access over outbound-only HTTPS
Perplexity releases Unigram tokenizer with 5-6x lower CPU use

OpenAI outage hits API and ChatGPT with elevated latencies across GPT-5.5 workflows

Cua Driver supports Windows background computer use over MCP and CLI

Claude Code 2.1.153 fixes stateful MCP regressions and adds skipLfs

Codex removes GPT-5.2 and GPT-5.3-Codex on June 2
Top stories this week
Qwen3.7 Max ships implicit caching for no-setup context reuse
Alibaba rolled out implicit caching for Qwen3.7 Max, automatically reusing repeated context without user setup. The update also lands with fresh benchmark results and broader coding-agent support across OpenCode and Hermes Agent.

Report: Claude Mythos solves Erdős problem #90 in air-gapped test
Anthropic staff and outside observers said a Mythos-powered Claude Code setup solved Erdős problem #90 in an internet-blocked test. The result is still based on harnessed runs and social-thread disclosures, so watch for fuller verification before treating it as settled.

Claude Code ships security-guidance plugin with repo-level claude-security-guidance.md rules
Anthropic added a security plugin to the Claude Code marketplace and said internal use cut security-related PR comments by 30-40%. Teams can use it to enforce repo or MDM-distributed policies before human review.

OpenRouter raises $113M Series B as weekly volume hits 25T tokens
OpenRouter announced a $113M Series B led by CapitalG and said weekly routed volume grew from 5T to 25T tokens in six months. The funding matters because the company is pitching itself as production infrastructure for multi-model deployments, not just an API convenience layer.

SynthID adds OpenAI, ElevenLabs, and Kakao partners as Search and Chrome gain verification
Google expanded SynthID with new model partners and pushed verification into Search, Chrome, and Pixel video provenance flows. That matters because AI-content authentication is moving from isolated model outputs into mainstream browser and distribution surfaces.
