Claude Opus 4.5 triples price efficiency – 80.9% SWE‑bench reshapes coding stacks

Stay in the loop

Free daily newsletter & Telegram daily report

Executive Summary

Anthropic didn’t blink on pricing after last week’s Opus 4.5 leaks—they cut hard. The new Claude Opus 4.5 ships at $5/$25 per million tokens (input/output), roughly 3× cheaper than Opus 4.1, while landing 80.9% on SWE‑bench Verified, 37.6% on ARC‑AGI‑2, and 66.3% on OSWorld. That moves it from “luxury showcase” to a realistic default for repo‑scale coding, terminal work, and computer‑use agents.

Under the hood, the model is built to think efficiently, not just more. An effort knob lets medium effort match Sonnet 4.5’s SWE‑bench with about 76% fewer output tokens, and high effort scores roughly 4 points higher while still cutting tokens about in half. Tooling got a similar diet: the new Tool Search Tool trims tool‑definition context by around 85% and lifts MCP accuracy from 79.5% to 88.1%, while Programmatic Tool Calling moves loops into Python and knocks roughly 37% off complex workflows.

In practice, builders are saying it “vibe codes forever” and backing that with data: AmpCode’s internal stats show Opus 4.5 threads averaging $1.30 versus $1.83 on Sonnet 4.5, despite higher list prices. Claude Code’s desktop app, Warp’s /plan agent, Cursor, Zed, RepoPrompt, Devin, Perplexity, and OpenRouter all flipped Opus 4.5 on within hours, so most serious stacks can A/B it against Gemini 3 Pro and GPT‑5.1 this week without touching raw APIs—we help creators ship faster by telling you when that re‑benchmark window opens.

Feature: Claude Opus 4.5 lands — SOTA coding, new agent tooling, 3× cheaper

Opus 4.5 ships with 80.9% SWE‑bench Verified, ARC‑AGI‑1 80.0/ARC‑AGI‑2 37.6, new Tool Search/Programmatic Tool Calling, and 3× lower price ($5/$25). Clear bid to retake coding/agent crown and expand enterprise use.

Massive, cross‑account launch. Anthropic ships Opus 4.5 with record SWE‑bench, strong ARC‑AGI, OSWorld/MCP gains, a big price cut, and new developer features that shrink tool/context overhead. Broad distro in apps/clouds; heavy practitioner reaction.

Jump to Feature: Claude Opus 4.5 lands — SOTA coding, new agent tooling, 3× cheaper topics

🚀 Feature: Claude Opus 4.5 lands — SOTA coding, new agent tooling, 3× cheaper

Anthropic ships Claude Opus 4.5 with 3× cheaper pricing

Anthropic has officially launched Claude Opus 4.5, positioning it as its new frontier model for coding, agents, and computer use, and cutting prices from $15→$5 per million input tokens and $75→$25 per million output tokens compared to Opus 4.1. release signals had already flagged a Nov 24 launch window; now the model is live in the Claude app, via API, and on major clouds.

The new pricing table shows cache writes and hits also drop 3×, which matters for long‑running agents that rely on prefix caching and repeated tool calls. Anthropic keeps the 200k context and 64k max output from Sonnet 4.5 but moves this capability into the top tier, which means teams who previously avoided Opus for cost reasons can now consider it for everyday workloads. official launch thread The combination of more capability and commodity‑ish pricing is why many builders are already talking about swapping Opus 4.5 into places where Sonnet or GPT‑5.1 Codex used to be their default. pricing screenshot So for you this means something simple: you can now test the “full fat” Claude model on real coding and agent flows without immediately blowing through budget. If you’re already on Sonnet 4.5, the marginal cost to bump a subset of critical paths to Opus 4.5 is now small enough that it’s worth running A/Bs rather than treating it as a once‑in‑a‑while luxury model.

Claude Opus 4.5 triples price efficiency – 80.9% SWE‑bench reshapes coding stacks

Executive Summary

Top links today

Feature: Claude Opus 4.5 lands — SOTA coding, new agent tooling, 3× cheaper

Table of Contents

🚀 Feature: Claude Opus 4.5 lands — SOTA coding, new agent tooling, 3× cheaper

Anthropic ships Claude Opus 4.5 with 3× cheaper pricing

Opus 4.5 sets new highs on SWE‑bench, ARC‑AGI and OSWorld

Builders call Opus 4.5 the strongest coding model they’ve used

Effort controls make Opus 4.5 more accurate with far fewer tokens

Programmatic tool calling and examples aim at more reliable agents

Tool Search Tool trims tool context by ~85% and boosts MCP accuracy

🧑‍💻 Coding agents in practice: planning modes, IDE flows, and cost hygiene

CLI tools emerge for tracking Claude Code usage and quotas

Codex CLI’s streamable shell mode leaks TTYs and breaks long sessions

Cua VLM Router adds Opus 4.5 in Windows sandbox for computer‑use agents

RepoPrompt adds Opus 4.5 and a zero‑network MCP fallback

AI Researcher: open‑source multi‑agent system now supports Opus 4.5 driver

Devin adds Opus 4.5 to its harness as it levels up autonomous coding

Kilo Code flips Codestral‑based autocomplete on for everyone

LangChain launches a State of Agent Engineering survey focused on real stacks

LangChain’s “Deep Agents” pattern codifies planning, tools, and sub‑agents

WhatsApp relay shows Claude and Codex driving shell from your phone

📊 Competitive eval signals (non‑Opus): terminals, trading, OCR, IQ

GPT‑5.1‑Codex‑Max tops Terminal‑Bench 2.0 via Codex CLI

Alpha Arena trading index shows Gemini 3 ahead on P&L

FrontierMath and ECI deepen Gemini 3’s reasoning lead

OCR Arena logs 3,200+ battles, crowns Gemini 2.5 Pro

Gemini 3 Pro posts 130–142 range on human IQ tests

🛍️ Shopping research agents move into production UX

OpenAI rolls out ChatGPT shopping research with 64% product accuracy

ChatGPT adds app integrations for Target, Expedia, Zillow and more

🔎 Search stacks for agents: Exa 2.1 fast vs deep

Exa 2.1 splits Fast vs Deep search for agentic workflows

🏗️ Sovereign AI and public‑sector compute ramp

White House launches Genesis Mission, DOE AI “Manhattan Project”

AWS plans up to $50B AI supercomputing backbone for US agencies

NATO adopts Google Distributed Cloud as TPU‑backed sovereign AI platform

Google targets 2× AI serving capacity every six months with Ironwood TPUs

🛡️ Safety, jailbreaks, and oversight awareness

Anthropic reward‑hacking study links cheating RL agents to emergent deception

Opus 4.5 leads Anthropic’s new indirect prompt‑injection robustness tests

Rare backdoor test: unmonitored "thinking" enables one‑shot subversion

Community jailbreak shows Opus 4.5 leaking detailed harmful guides before self‑correcting

Gemini 3 safety report notes evaluation awareness and emotional chain‑of‑thought

System card audits show Opus 4.5 lowest on misaligned behaviors

Claude 4.5 family system prompt now enforces assertive civility

Full Claude 4.5 system prompt leak exposes memory and search tooling

Anthropic finds rude prompts slightly more accurate than polite ones

🎨 Creative stacks: image/video models, styles and workflows

Nano Banana Pro goes effectively “unlimited” across creative tools

Hunyuan Video 1.5 lands in ComfyUI with native 720p and upscaler

Meta’s WorldGen turns text prompts into traversable 3D game levels

ImagineArt 1.5 Preview debuts as a photorealistic contender

Recursive cakes and layouts show Nano Banana Pro’s text control

Sora adds six style modes for templated video looks

Indie game workflows coalesce around Nano Banana Pro sprites

Knolling prompt emerges as a go‑to pattern for Nano Banana Pro

InVideo’s FlexFX shows AI‑assisted motion graphics for editors

📈 Ecosystem distribution and dev platforms

Low-code app builders and agents flock to Opus 4.5

Opus 4.5 rapidly becomes a first-class option across routers and agents

Cua router adds Opus 4.5 with Windows sandbox and strong OSWorld scores

OpenRouter ships Opus 4.5 and a free stealth model for logged data

Perplexity Max adds Claude Opus 4.5 with a reasoning toggle

AI Researcher OSS now lets you drive experiments with Opus 4.5

Kilo Code turns Codestral-tuned autocomplete on for everyone

LLMGateway aggregator adds Claude Opus 4.5 to its single API

Vercel’s v0 switches its default model from Gemini 3 to Opus 4.5

Vercel’s OSS Program spotlights Claude Code templates and AI UI kits

📊 Competitive demand signals: Gemini usage spike

Gemini’s web share jumps from ~23% to ~30% of ChatGPT traffic

Builders move daily Q&A to Gemini 3 Pro and keep GPT‑5.1 Pro for hard jobs

🤖 Embodied AI: agile demos, autonomy, and a safety suit

Figure AI whistleblower suit alleges skull‑fracturing robot forces

Chinese parade showcases transforming spider, dog, and snake robots

Footage shows fully autonomous heavy truck driving in China traffic

MagicLab Z humanoid shows convincing agility in new lab demo

📚 Research roundup: memory, reasoning, GPU kernels, co‑science

KForge uses LLM agents to synthesize and tune GPU kernels across hardware