Fresh stories

GLM-5.2 ranks #1 on Vals and Design Arena, AA Coding Index hits 50.7
Fresh third-party results put GLM-5.2 atop multiple open-model leaderboards, including the AA Coding Index, Vals Index, Terminal Bench 2.1, and Design Arena. The scores add independent confirmation of the model's standing, though demand spiked enough to strain some providers.
Codex supports open-weight models via Ollama, vLLM, and Responses-compatible endpoints
Codex workflows can now run against open-weight models served through compatible Responses API endpoints, with Ollama and vLLM publishing direct paths for GLM-5.2 and Kimi K2.7 Code. That matters because teams can keep the Codex interface while swapping to self-hosted or lower-cost inference backends.

HumanLayer opens an agentic IDE with remote daemons and software-factory workflows
HumanLayer opened access to an agentic IDE, collaboration surface, and software-factory building blocks aimed at long-running codebase work. The launch matters because it pairs remote daemon execution and review loops with architecture guardrails instead of optimizing only for raw code generation.


Vercel previews eve with durable execution and sandboxed compute
Vercel introduced eve in public preview with durable workflows, sandboxed compute, subagents, and evals. It also added Connect and Passport for scoped tokens and identity-gated deployments, giving teams one path for runtime, auth, and enterprise access control.

Firecrawl launches Research Index with /search/research API and 18% arXivQA recall gain
Firecrawl released a research-specific search index with 3M+ arXiv papers, GitHub artifacts, and a /search/research interface across API, CLI, MCP, and SDKs. It combines literature retrieval, claim verification, and code lookup in one surface for research agents.

Omnigent opens live Claude Code and Codex sessions with phone control
Databricks open-sourced Omnigent, a meta-harness that runs Claude Code, Codex, Cursor, Pi, and custom agents in one live session with a collaborative web UI. The release centralizes supervision, cost control, and cross-agent review instead of splitting work across separate tools.
Claude Code 2.1.181 adds /config key=value and presence-file push suppression
Factory adds AutoWiki: /wiki generates repo docs on every push
Top stories this week
Claude Design adds repo-based design-system imports and Claude Code sync in beta
Anthropic updated Claude Design with design-system imports from repos, codebases, or design files, plus bidirectional sync with Claude Code and a canvas editor. The update moves it from generic mockups toward designs checked against real components.


Commerce Department limits Claude Fable 5 exports worldwide, including foreign nationals in the U.S.
BIS and new reporting show Fable 5 restrictions now apply worldwide and can cover foreign nationals in the U.S. Teams should treat the pause as a broader access risk for allied markets and global deployments.

Z.ai releases GLM-5.2 open weights with 1M context and 46.2% DeepSWE
Z.ai released GLM-5.2 as MIT-licensed open weights with a 1M-token context window and broad runtime support. Vendor and arena results put it near frontier closed models on long-horizon coding.

Anthropic reports Claude Code task success stays within 7 points of software engineering across occupations
Anthropic published data from 400,000 Claude Code sessions, finding average task value rose 27% and verifiable success across occupations stayed within seven points of software engineering. The report gives teams a concrete baseline for where coding agents already generalize and where domain expertise still changes outcomes.

Cursor reports a $60B all-stock deal with SpaceX
Cursor said it agreed to a $60B all-stock deal with SpaceX, with closing targeted for Q3 and Cursor remaining a wholly owned subsidiary. The deal ties a major coding-agent channel to SpaceX compute and gives Cursor a new strategic owner.



