Anthropic Claude Code + Opus 4.5 ships 259 PRs – 325M tokens

Stay in the loop

Free daily newsletter & Telegram daily report

Executive Summary

Claude Code with Opus 4.5 is crossing from demo to day‑job infrastructure: engineer Boris Cherny reports 259 merged PRs, 497 commits, ~40k LOC added and 38k removed over 30 days—all authored by Claude Code—across 1.6k sessions and 325.2M tokens, with a longest continuous run of 1 day 18 hours. Stop hooks plus the Ralph‑Wiggum plugin let external logic “poke” Claude and chain invocations, turning flaky long calls into multi‑day refactors. Andrej Karpathy shows the same stack discovering and controlling Lutron home‑automation hardware, while others wire Claude to drive a physical oven via Python, reinforcing sentiment that Opus 4.5 inside Claude Code is becoming the default agentic harness rather than a side tool.

• Self‑improvement and safety: OpenAI elevates AI self‑improvement into a top‑tier Preparedness “Tracked Category” and recruits a Head of Preparedness to oversee risks from running, self‑improving systems; parallel work on psychological jailbreaks reports 88.1% success using multi‑turn social manipulation, while Meta’s RL‑trained LLM moderators claim up to 100× data efficiency over SFT.
• Inference efficiency: NVIDIA’s Nemotron 3 Nano 30B‑A3B hybrid MoE–Mamba model targets 3.3× faster decode with 1M‑token context, as ES‑CoT and PHOTON propose early‑stopping and hierarchical decoders to cut reasoning tokens and memory growth.

Feature: Claude Code + Opus 4.5 crosses the practicality threshold

Boris Cherny reports 259 PRs/30 days written by Claude Code+Opus 4.5, with multi‑hour/day sessions via Stop hooks; community adds real device control (home automation, ovens) and shifts from typing to supervising.

Cross‑account posts show Claude Code + Opus 4.5 running for hours/days with Stop hooks, shipping large PR volumes and even operating real devices. Today’s sample is heavy on first‑hand production use, tips, and downstream workflows.

Jump to Feature: Claude Code + Opus 4.5 crosses the practicality threshold topics

🛠️ Feature: Claude Code + Opus 4.5 crosses the practicality threshold

Claude Code + Opus 4.5 ships 259 PRs and 325M tokens in 30 days

Claude Code throughput (Anthropic): Boris Cherny reports that in the last 30 days he merged 259 PRs, 497 commits, ~40k LOC added and 38k removed, all authored by Claude Code + Opus 4.5, with 1.6k sessions, a longest run of 1 day 18 hours 50 minutes, and 325.2M tokens used, as shown in the usage recap usage stats; the same engineer previously described a month where he never opened an IDE while Opus 4.5 wrote ~200 PRs, which this extends into a quantified picture of sustained, high‑volume agentic coding 200 PR month.

Claude as bottleneck shifter: Cherny frames the shift as “increasingly, code is no longer the bottleneck,” with the scarce resource becoming deciding what to build, how to test it, and what to accept as correct, a framing echoed and amplified by Rohan Paul’s summary that the “world has shifted” toward those higher‑level choices scarcity framing. The same tweet thread underscores that Claude now runs reliably for minutes, hours, and days at a time using Stop hooks, turning what a year ago was a flaky assistant into something that can drive long refactors and memory‑leak hunts end‑to‑end usage stats.

Anthropic Claude Code + Opus 4.5 ships 259 PRs – 325M tokens

Executive Summary

Top links today

Feature: Claude Code + Opus 4.5 crosses the practicality threshold

Table of Contents

🛠️ Feature: Claude Code + Opus 4.5 crosses the practicality threshold

Claude Code + Opus 4.5 ships 259 PRs and 325M tokens in 30 days

Opus 4.5 in Claude Code becomes de‑facto default for many builders

Claude Code starts operating real home devices from lights to ovens

Stop hooks and Ralph‑Wiggum explain how Claude Code runs for days

🛡️ Preparedness, self‑improvement risk and social jailbreak tactics

OpenAI hires Head of Preparedness to manage self-improving systems risk

OpenAI marks AI self-improvement as a Tracked Category in preparedness framework

Psychological jailbreak attacks achieve 88% success against LLM safety policies

Meta reports RL-trained LLM moderators are up to 100× more data-efficient than SFT

Clinical LLM prognosis shifts by up to 16% when told which tests were never ordered

Study shows hidden prompts can jailbreak AI code graders and inflate scores

📊 Leaderboards: GLM‑4.7 surge and retrieval bias diagnostics

GLM-4.7 tops open-source leaderboards on AA Index and Vending-Bench2

Context Arena adds Xiaomi mimo-v2-flash with detailed long-context bias profile

Builders eye GLM-4.7 as cheap, strong open model for coding and writing

🧩 Agent platforms: stateful memory, permissions, and A2A workflows

Clawdbot update adds Discord, browser autonomy, group lurk and self‑rewrite

OpenCode supermemory plugin makes agents stateful with one‑command setup

Agent IDE tightens permissions UX with clearer "always allow" and subagents

Clawd personal assistant runs as long‑lived, tool‑rich desktop agent

Kilo Code Reviews pitches itself as a flexible CodeRabbit alternative

MCP Agent Mailbox organizes agent‑to‑agent messages with threaded viewer

Anthropic publishes Claude Skill authoring best‑practices for discoverable agents

🧠 Reasoning efficiency: early‑stopping, hierarchical decode, hybrid MoE

Nemotron 3 Nano hybrid MoE–Mamba model speeds 30B-class decode by up to 3.3×

PHOTON hierarchical decoder targets up to 10³× better throughput per memory

ES-CoT early-stops chain-of-thought and cuts tokens by ~41%

💼 Enterprise adoption and usage share moves

McKinsey: 88% of firms now use AI somewhere, but only ~1% say it’s mature

ChatGPT web share falls ~19 pts while Gemini gains ~13 pts in a year

AI boom adds ~$550bn to top Silicon Valley fortunes and 236× to Nvidia stock

Delivery Hero says Lovable prototypes cut product alignment cycles to one third

⚙️ Parallel terminals and secure agent sandboxes

Codex Background Terminal lets agents multiplex long-running shell work

exe.dev offers persistent SSH VMs as secure sandboxes for code agents

11‑hour Codex fuzz run hardens browser agents and spawns lean cookie libs

Local /sandbox pattern emerges for running Claude Code under stricter OS guardrails

🎬 Joint audio‑video gen and the rise of AI ‘slop’ at scale

ByteDance’s Seedance 1.5 Pro pushes 10× faster joint audio‑video generation

Kapwing estimates 63B‑view AI ‘slop’ economy on YouTube recommendations

NVIDIA’s 4D‑RGPT boosts region‑level 4D video QA by ~5 points

🏗️ Inference strategy: Nvidia–Groq licensing analysis continues

Nvidia–Groq $20B license framed as “acquihire without acquisition”

🤖 Robots in the wild: patrols, high‑speed dogs and show control

Autonomous tracked combat robots continue field patrols in Ukraine war footage

MiniMax M2.1 agent drives Vbot robot dog end-to-end with no remote control

MirrorMe’s Black Panther 2 quadruped hits 13.4 m/s top speed

Disney Imagineering demos underwater drones and autonomous hydrofoil show platform

Porcospino Flex single-track robot trades joints for a 3D-printed compliant spine

Shenzhen police roll out sidewalk patrol robots alongside human officers

Solar-powered robots in China clear snow from PV farms while generating power

Precision surgical robot cleanly removes seed from a grape in demo

🧭 Retrieval stacks: GraphRAG survey and hiring for agent‑first memory

GraphRAG survey formalizes graph-based retrieval pipelines for LLMs

Mixedbread hiring to build agent-first retrieval and memory stack

On this page