Anthropic commits to 1M TPUv7 chips – Google cuts GB300 training costs by 50%

Stay in the loop

Free daily newsletter & Telegram daily report

Executive Summary

Anthropic is quietly rerouting its frontier roadmap through Google’s silicon. According to SemiAnalysis, the lab has lined up capacity for about 1M TPUv7 chips and more than 1 GW of power, split between 400k TPUs it installs itself and ~600k rented in Google Cloud. Their modeling puts TPUv7 “Ironwood” at roughly 20–50% cheaper per useful FP8 training FLOP than GB300 NVL72, with tuned kernels driving that gap toward 2× at 30–40% model FLOP utilization.

Google, for its part, is finally treating TPUs as a product, not a science project. Ironwood pods wire up to 9,216 TPUs into a single 3D‑torus fabric, versus 72 GPUs per GB300 node, and early pricing pegs older v6e at $2.70 per chip‑hour while comparable Nvidia B200s hover around $5.50 per GPU‑hour. A new native PyTorch backend means most large PyTorch shops can port without rewriting their stack in JAX, making TPUs a credible second source rather than an exotic side bet (yes, the Nvidia tax finally has a real competitor).

Higher up the stack, our ongoing Claude Opus 4.5 story keeps moving: WeirdML now shows 63.7% average accuracy while cutting thinking runs from $27 to $9, and Box’s internal evals report a 20‑point gain over Opus 4.1 on real enterprise tasks.

Feature: TPUv7 economics challenge Nvidia at scale

Google TPUv7 undercuts Nvidia on useful FLOPs; Anthropic commits ~1M TPUs and >1 GW capacity. ICI 3D‑torus + PyTorch TPU backend point to 20–50% lower TCO for large training runs.

Cross‑account focus today is Google’s TPUv7 cost/perf and leasing push, with Anthropic’s ~1M TPU commitment and detailed TCO charts. This materially changes training economics vs GB200/GB300 for frontier labs.

Jump to Feature: TPUv7 economics challenge Nvidia at scale topics

🧮 Feature: TPUv7 economics challenge Nvidia at scale

Anthropic locks in ~1M TPUv7 chips and >1 GW to hedge Nvidia

Anthropic has effectively bet its next gen Claude training runs on TPU v7, committing to about 1M chips worth of capacity and more than 1 GW of power across its own data centers plus Google Cloud, according to the same SemiAnalysis report. anthropic-tpuv7-summary Roughly 400k TPUs are expected as full racks Anthropic buys and installs itself, while another ~600k come via rented pods in GCP, letting the lab spread site risk and use the sheer TPU volume to negotiate better Nvidia pricing on the rest of its fleet. anthropic-1m-tpus-detail

Because Anthropic tunes kernels and MFU aggressively, Semianalysis estimates it can get about 50% cheaper useful training FLOPs on TPU v7 Ironwood pods than on a comparable GB300 NVL72 system, even though peak TFLOP numbers are similar. anthropic-tpuv7-summary Strategically, this move turns TPU v7 into a real second source for frontier training rather than an internal Google curiosity, and it signals to other labs that serious price leverage on Nvidia now likely requires a credible non‑GPU path—not just more bids for the same GB200/GB300 boxes. anthropic-1m-tpus-detail

Anthropic commits to 1M TPUv7 chips – Google cuts GB300 training costs by 50%

Executive Summary

Top links today

Feature: TPUv7 economics challenge Nvidia at scale

Table of Contents

🧮 Feature: TPUv7 economics challenge Nvidia at scale

Anthropic locks in ~1M TPUv7 chips and >1 GW to hedge Nvidia

SemiAnalysis puts TPUv7 20–50% cheaper per useful FLOP than GB300

Google pushes external TPUv7 leasing with PyTorch support and 9k‑chip pods

📊 Benchmarks: WeirdML reshuffle, AMO‑Bench, enterprise evals

WeirdML shows Claude Opus 4.5 surging in accuracy while cutting cost

AMO‑Bench debuts as a hard new math benchmark with Gemini 3 Pro on top

Box AI evals find Opus 4.5 +20 points over Opus 4.1 on enterprise tasks

Amp’s coding evals put Opus 4.5 ahead of Gemini 3 Pro with lower failure cost

New method corrects biased LLM‑as‑judge scores with plug‑in calibration

8‑puzzle study finds LLMs still weak at basic stateful planning

Fact‑checking eval shows many LLMs can’t cleanly copy evidence spans

🧪 New models: tool orchestrators and routing datasets

Nvidia’s ToolOrchestrator‑8B router beats GPT‑5 on HLE with 2.5× less compute

ToolScale dataset opens cost‑aware multi‑tool routing traces to everyone

Perplexity Pro and Max subscribers get Grok 4.1 as a new model option

🛠️ Agent harnesses and coding flows in practice

Agent Hooks and SI rules proposed for tool-using agents

Clawd shows what a chat-first, script-controlling agent harness looks like in practice

RepoPrompt turns its Context Builder into an MCP sub-agent

AGENTS.md emerges as a shared contract between humans and coding agents

OpenCode adds an Explore sub-agent for repo greps and globs

🧷 RAG practice: FreshStack signals and context engineering

Context engineering playbooks crystallize around Research→Plan→Implement loops

FreshStack RAG benchmark moves from award to open NeurIPS resource

Agent Hooks and tool policies emerge as fixes for brittle system prompts

🧬 New training recipes: low‑rank ES and latent video rewards

EGGROLL shows integer‑only, billion‑param evolution strategies can match GRPO‑level reasoning

PRFL trains video preference rewards directly in latent space, cutting cost vs pixel ReFL

Step‑Audio‑R1 claims test‑time compute scaling for audio LLMs via staged reasoning

🎨 Creative stacks: Z‑Image Turbo, NB Pro workflows, agentic slides

Z-Image Turbo lands in ComfyUI, Replicate and SGLang Diffusion

Early users say Kimi Agentic Slides beats prompt-only tools on real decks

Nano Banana Pro behaves like a ControlNet when given canny/depth guidance

Flowith makes Nano Banana Pro free and cuts platform prices up to 80%

Higgsfield offers 70% off unlimited Nano Banana Pro and showcases 1‑click apps

Nano Banana Pro cinematic grid prompts turn one image into a storyboard

Nano Banana Pro shines at generating coherent fictional dossiers and diagrams

Freepik’s Nano Banana Pro integration enables long-form music and video art workflows

LangChain Deep Agents highlighted as a reusable harness for creative agent stacks

Lovable adds Gemini 3 Pro and Nano Banana Pro to its AI app builder

🏗️ Capital stack and power constraints for AI build‑outs

OpenAI’s partners shoulder ~$100B in debt to fund its data centers

Epoch says OpenAI’s UAE Stargate likely slips to 1 GW only by Q3 2027

HSBC’s model shows OpenAI unprofitable through 2030 on cloud compute costs

Nadella and Altman both point to power, not GPUs, as the next AI bottleneck

🤖 Embodied AI: service deployments and low‑cost dexterity

Zerith H1 wheeled humanoids clean toilets and assist shoppers at 20+ sites

LimX Dynamics’ OLi shows rough‑terrain walking and whole‑body loco‑manipulation

TetherIA’s $314 Aero Hand delivers low‑cost dexterity with 7 motors and 16 joints

💼 Enterprise adoption notes: Julius AI, Lovable, seasonal Gemini

AthenaHQ uses Julius AI to shrink day-long SQL analysis to about an hour

Lovable adopts Gemini 3 Pro, Nano Banana Pro and hardens Shopify flows

Gemini app leans into Black Friday shopping assistant role

🧭 Strategy & timelines: scaling vs research, AGI expectations

Noam Brown: scaling current models pays off, ASI still needs breakthroughs

Practitioners say current LLMs will reshape work, but may not be the final AGI path

Researchers and strategists insist AI impact is real even without AGI

Argument: we scale inference reasoning, but not the kind of reasoning humans use to learn

Daniel Mac and peers: LLMs lead somewhere real, but not straight to ASI

Elon’s 2025 AGI prediction misses, and the goalposts keep moving

Builders argue software architecture must go AI‑first, not bolt-on

DeepMind’s John Jumper: ignore AGI labels and focus on useful systems

Gallabytes: AGI “timelines” are the wrong mental model

Sora 2 seen as a studio tool, not a consumer AGI toy

On this page