Fresh stories
Developers publish loop libraries and control-loop guides for long-running agents
Builders released reusable loop artifacts this week, including a Loop Library Skill, repo templates, and published control-loop definitions for docs sweeps, onboarding checks, and error triage. It matters because teams are turning one-shot prompting into persistent agent runs with explicit stop conditions and shared repo state.

Codex adds local-to-remote thread handoff with Git worktree transfer
Codex can now hand off an in-progress thread between local and remote machines and bring it back later. It matters because the handoff carries Git history, branches, and uncommitted changes while leaving the destination checkout untouched.

Secure Exec v0.3 rewrites in Rust and adds Bun SDK, process trees, and Node-less mode
Secure Exec v0.3 shipped a full Rust rewrite, Bun and Rust SDKs, process-tree support for spawn and exec inside the VM, and a configurable Node-less mode. It matters because agent sandboxes can improve performance and tighten isolation without depending on a full Node runtime.


Engineers report GLM-5.2 approaches Opus-level planning at about 1/10 the price
Independent tests put GLM-5.2 near Opus 4.8 and GPT-5.5 on planning and coding, and users shared Claude Code, BrowserCode, dcode, and local-serving recipes. It matters because many engineers are treating it as a daily-driver option for text-heavy coding, though teams still report weaker vision and provider limits.

lift-pdf releases 9B extractor with 90.2% accuracy and 9.5s p50
lift-pdf released an open-source 9B model for schema-constrained document extraction, with code, pip install, playground access, and a 90.2% score on the team's 225-document bench. It matters because the model claims near-Gemini 3.5 Flash accuracy at 9.5s p50, though coverage is still skewed toward Latin-language docs and commercial-use limits remain.

ComputeSDK releases 2026 100k Scale Invitational results across 6 sandbox providers
ComputeSDK published results from its 2026 100k Scale Invitational after weeks of reruns and infra tuning across Modal, Tensorlake, Northflank, Declaw AI, E2B, and Isorun. It matters because sandbox and agent infra claims now have a shared public concurrency target instead of vendor-specific load demos.
Kilo Code adds Terminal Bench scores and average attempt cost to model picker
Plannotator 0.21.0 adds direct file editing, embedded CLI agents, and Bedrock support
Top stories this week
Claude Code adds Artifacts for PR walkthroughs and live debug pages
Claude Code can now turn a live session into a private artifact page for PR walkthroughs, debug timelines, dashboards, and architecture notes. Team and Enterprise users get a persistent review surface that refreshes as the session changes.


GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows
Builders published Claude Code and Droid setups for GLM-5.2 while Unsloth quantized it for local 256GB machines and Hugging Face opened temporary free inference. Teams can now run the open-weight model across hosted, local, and agent workflows.

Poolside releases Laguna M.1 open weights with 225B MoE and 256K context
Poolside released Apache 2.0 weights for Laguna M.1 and XS.2, its long-horizon coding models, with M.1 shipping at 225B total parameters, 23B active, and 256K context. SGLang and vLLM support on day one lets teams run and fine-tune the models in existing agent stacks immediately.

MCP adds Enterprise-Managed Auth with Okta beta and VS Code support
Anthropic introduced an MCP extension that lets admins authorize connectors through their identity provider instead of repeated per-user OAuth flows. VS Code added support the same day, which matters because teams can keep connector policy and audit controls in existing enterprise identity systems while reducing setup friction.

OpenAI reports beneficial RL improves 44 of 53 evals and transfers beyond health
OpenAI said reinforcement learning on realistic conversations improved 44 of 53 alignment and benefit evaluations, including transfer from health-only training to deception and reward-hacking tests. The result suggests a broader behavioral shift rather than narrow task tuning, but the claim is based on OpenAI’s own eval mix rather than a single public benchmark.






