Anthropic locks ~1M Google TPUs – capacity tops 1 GW


Executive Summary

Anthropic just turned rumor into steel: it's locking up roughly 1M Google TPUs with well over 1 GW slated to come online in 2026. The spend is "tens of billions" and it isn't just about training; Anthropic says it chose TPUs for price-performance on serving too, which is where margins go to die if you pick the wrong silicon. This is the rare compute deal that actually changes a roadmap: guaranteed throughput means shorter training queues and fewer rate-limit headaches for customers next year.

Google is pushing Ironwood (TPU v7) as the serving-first piece of the puzzle, and that tracks with Anthropic's pitch to enterprise buyers who care more about steady-state token costs than one-off megatrain runs. Demand doesn't look made-up either: company commentary pegs annualized revenue near $7B, which explains why they're pre-buying capacity instead of praying for cancellations on GPU waitlists. Still, Anthropic is careful to say it's staying multi-cloud and multi-silicon, with Amazon Trainium and NVIDIA GPUs in the mix so workloads can land where unit economics and latency actually make sense.

Net: this is a compute hedge and a serving bet wrapped into one, and it puts real pressure on rivals to show similar 2026-dated capacity, not just MOUs.

Feature Spotlight

Feature: Anthropic × Google secure ~1M TPUs, >1 GW by 2026

Anthropic locks a multi-year, multi-billion Google Cloud deal for up to 1M TPUs (>1 GW by 2026), materially expanding Claude training and serving capacity and reshaping compute economics for enterprise AI.

Cross-account confirmation that Anthropic will massively expand on Google Cloud TPUs, with tens of billions in spend, to scale Claude training/inference. Multiple tweets cite the 1M TPU figure, >1 GW capacity online in 2026, and current enterprise traction.


Table of Contents

⚡ Feature: Anthropic × Google secure ~1M TPUs, >1 GW by 2026


Anthropic locks up ~1M Google TPUs and >1 GW for 2026 in a deal worth tens of billions

Anthropic and Google confirmed a massive TPU expansion of approximately one million chips and well over 1 GW of capacity coming online in 2026 to scale Claude training and serving, with spend described as "tens of billions." The company frames the move as price-performance driven on TPUs, timed to accelerating demand. Following up on compute pact, which noted talks and early signals, today's posts quantify capacity and timing, and reiterate why TPUs fit Anthropic's cost curve Deal announcement, Anthropic blog post, Press confirmation, Google press page.

For AI leads, the headline is concrete: guaranteed throughput for 2026 (training queues and serving readiness) and a visible hedge against GPU scarcity, without abandoning other stacks.

Anthropic demand picture: 300k+ business customers, large accounts up ~7× YoY, revenue near $7B

Anthropic says it now serves 300,000+ businesses with nearly 7× growth in large accounts over the past year; commentary adds annualized revenue approaching $7B and Claude Code surpassing a $500M run-rate within months, helping justify the TPU scale-up Anthropic blog post, Company summary, Analysis thread.

Implication for buyers: capacity won't just shorten waitlists; it should stabilize SLAs and rate limits as onboarding accelerates.

Ironwood, Google's 7th-gen TPU for high-throughput inference, is central to Anthropic's plan

Google highlights Ironwood (TPU v7) as a serving-first design that lowers cost per token via 256-chip pods and 9,216-chip superpods, matching Anthropic's need to scale inference economically alongside training Google press page. Anthropic's own post ties the expansion to observed TPU price-performance over multiple generations, reinforcing why this capacity lines up for 2026 Anthropic blog post.

For platform teams, this signals practical gains: cheaper steady-state throughput for enterprise traffic, not just big-bang training windows.

Despite the TPU megadeal, Anthropic reiterates a multi-cloud, multi-silicon strategy

Alongside the Google TPU expansion, Anthropic stresses it will continue training and serving across Amazon Trainium and NVIDIA GPUs; Amazon remains a core training partner via Project Rainier, tempering vendor lock-in and letting workloads land where unit economics and latency fit best Anthropic blog post, Analysis thread.

For architects, this means portability pressures remain: plan for heterogeneous kernels, model builds, and orchestration that can shift between TPU, GPU, and ASIC targets as prices and queues move.
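In practice that orchestration layer often reduces to a cost/latency placement rule; a minimal sketch in Python (pool names, prices, and latencies below are hypothetical, not from the announcement):

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str                 # accelerator pool, e.g. "tpu-v7" (hypothetical label)
    usd_per_m_tokens: float   # steady-state serving cost, illustrative only
    p50_latency_ms: float

def place(targets, max_latency_ms):
    """Pick the cheapest pool that still meets the latency budget."""
    ok = [t for t in targets if t.p50_latency_ms <= max_latency_ms]
    if not ok:
        raise ValueError("no pool meets the latency budget")
    return min(ok, key=lambda t: t.usd_per_m_tokens)

pools = [
    Target("tpu-v7", 0.40, 220.0),     # all numbers are made up for the sketch
    Target("gpu-h100", 0.55, 180.0),
    Target("trainium2", 0.45, 260.0),
]

best = place(pools, max_latency_ms=250.0)
```

As prices or queue depths move, only the table changes; the placement rule stays fixed, which is the portability property the paragraph above argues for.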


๐Ÿ–ฅ๏ธ OpenAI buys Sky: screen-aware Mac actions

OpenAI acquired Software Applications Inc. (Sky), an Appleโ€‘veteran team building a Mac, screenโ€‘aware natural language interface. Excludes the Anthropicโ€“Google compute pact (covered in Feature). Focus here is OSโ€‘level agent UX and M&A signal.

OpenAI buys Sky to add screenโ€‘aware Mac actions to ChatGPT

OpenAI acquired Software Applications Inc. (Sky), a Mac overlay agent that understands whatโ€™s on screen and can take actions through native apps; the team is joining to bring these capabilities into ChatGPT, with terms undisclosed OpenAI blog, and the acquisition confirmed across community posts acquisition post, announcement link.

OpenAI frames the deal as moving from โ€œanswersโ€ to helping users get things done on macOS, implying deeper OSโ€‘level permissions, context, and action execution beyond web automations OpenAI blog.

Signal in the noise: Sky shows OpenAI's platform push into OS-level agents

Practitioner briefs note OpenAI has been on an acquisitions streak and describe Sky's product as a floating desktop agent that understands the active window and can trigger actions in local apps like Calendar, Messages, Safari, Finder and Mail, an explicit platform move beyond web-only automation feature explainer. Coupled with OpenAI's own integration plan, this suggests a near-term consolidation of agent UX at the OS layer to win trust, control latencies, and harden permissions around sensitive actions OpenAI blog.

Workflow/Shortcuts alumni behind Sky bring deep macOS automation chops to OpenAI

Sky's founders previously built Workflow (acquired by Apple and turned into Shortcuts), and community posts say the team had a summer release queued before the acquisition: an overlay agent that could read the screen and drive Mac apps, highlighting rare, low-level macOS automation expertise now in OpenAI's stack prelaunch details, product description, community recap. This background reduces integration risk and accelerates building a reliable, permissions-aware OS agent versus purely browser-bound automation.

OpenAI positions Sky as a shift from chat to action, and discloses an Altman-linked passive investment

In its note, OpenAI emphasizes Sky will help "get things done" on macOS, not just respond to prompts, while stating all team members are joining OpenAI to deliver these capabilities at scale OpenAI blog. The post also discloses that a fund associated with Sam Altman held a passive Sky investment and that independent Transaction/Audit Committees approved the deal, a governance detail leaders will track as OS-level agents gain wider powers OpenAI blog.

What a screen-aware Mac agent unlocks for developers and IT

A Sky-style agent can reason over on-screen context and invoke native intents, bridging ambiguous dialog ("what's on my screen?") with deterministic app actions and user approvals. Community summaries cite concrete app domains Sky targeted (Calendar/Messages/Notes/Safari/Finder/Mail) and a desktop overlay UX, signaling new integration surfaces for secure, auditable automations and policy controls on macOS fleets feature explainer, product description.


🎬 Cinematic AI video goes open: LTX-2 arrives

Lightricks' LTX-2 dominates today's gen-media chatter: native 4K up to 50 fps with synchronized audio, 10–15s sequences, and day-0 availability via fal/Replicate; weights to open later this year. Excludes Genie world-model news (separate category).

LTX-2 debuts with native 4K, up to 50 fps, and synced audio; open weights coming later this year

Lightricks' LTX-2 arrives as a cinematic-grade AI video engine: native 4K output, up to 50 fps, synchronized audio/dialog, and ~10–15-second sequences designed for real creative workflows, with API readiness today and weights slated to open later this year capability highlights, weights plan. Early hands-on testers are positioning it as a step-change over prior demo-grade models, citing resolution fidelity and motion smoothness aligned to professional pipelines review thread.

fal ships day-0 LTX-2 APIs (Fast/Pro) for text→video and image→video up to 4K with per-second pricing

fal made LTX-2 available on day one with Fast and Pro endpoints for both text→video and image→video at 1080p, 1440p, and 4K, supporting synchronized audio and up to 50 fps; usage is metered per second with published rate tiers on each model page availability brief, Text to video fast, Text to video pro, Image to video fast, Image to video pro.

In practice, this gives teams an immediate path to prototype and scale high-fidelity clips via API without managing custom serving, while preserving a clean upgrade track to Pro for higher quality runs.
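Per-second metering makes budget math trivial; a quick sketch (the rates are placeholders for illustration, real tiers are on the fal model pages):

```python
# Estimate spend for per-second metered video generation.
# Rates are illustrative placeholders, not fal's published pricing.
RATE_USD_PER_SEC = {
    ("fast", "1080p"): 0.04,
    ("fast", "4k"): 0.12,
    ("pro", "4k"): 0.30,
}

def estimate_cost(tier: str, resolution: str, clip_seconds: float, n_clips: int) -> float:
    """Total cost = per-second rate x clip length x clip count."""
    rate = RATE_USD_PER_SEC[(tier, resolution)]
    return round(rate * clip_seconds * n_clips, 2)

# e.g. 100 ten-second 4K clips on the hypothetical Fast tier
cost = estimate_cost("fast", "4k", clip_seconds=10, n_clips=100)
```

The same three-factor multiplication is how you would compare a Fast batch run against a smaller Pro run before committing a budget.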

Replicate lists LTX-2 Fast and Pro with prompt guidelines and example workflows

Replicate now hosts lightricks/ltx-2-fast and lightricks/ltx-2-pro, complete with prompt-writing guidance, example pipelines, and API playgrounds to speed adoption into existing tooling hosting update, Replicate fast model, Replicate pro model. For AI engineers, this lowers integration friction (one-click deploys, consistent SDKs) while enabling side-by-side Fast/Pro comparisons for cost–quality tuning in production.

Practitioners call LTX-2 a new bar; native 4K motion and texture beat upscaled outputs

Early testers report a clear perceptual gap between LTX-2's native 4K and prior upscaled pipelines, citing sharper textures, steadier motion, and coherent audio that shortens post-production cycles review thread, native vs upscaled. For teams evaluating model swaps, expect fewer artifacts in fast action and dialogue-driven scenes, plus simpler editorial passes when cutting short spots and trailers.


🧭 Agentic browsers: Edge Copilot Mode and fall updates

Microsoft's Edge adds Copilot Mode with Actions for on-page navigation, tab management, and history context. Copilot Sessions 'Fall release' teases Mico/Clippy, groups, and health features. Excludes OpenAI Atlas (prior day) to keep today focused on Edge updates.

Edge adds Copilot Mode with Actions, autonomy levels, and opt-in Page Context

Microsoft is turning Edge into an agentic browser: Copilot Mode can navigate pages, execute multi-step Actions (unsubscribe, book, scroll to sections), manage tabs, and draw on browsing history if users enable Page Context. Hands-on reports show three autonomy settings (light, balanced, strict) and a Preview toggle to watch or background-run tasks feature brief, how to enable, deep dive thread.

  • Actions sequences and tool use are visible, with suggested flows for common chores and guardrails around history access actions samples.

Copilot Sessions Fall update brings Groups, Mico/Clippy, and cross-app memory

At Copilot Sessions, Microsoft previewed a broad Fall update: Groups for up to 32 participants, a long-term memory that spans apps, and a more expressive Mico avatar, with an Easter-egg return of Clippy. Early notes also highlight health Q&A grounded in vetted sources, stronger privacy opt-ins, and a staged U.S. rollout before expanding feature collage, event stream, feature recap.

Following up on Feature lineup that teased 12 areas, today's session put numbers (32-user Groups) and concrete capabilities on the roadmap while reinforcing an "AI agentic browser" framing across Edge and Copilot.


โœ๏ธ Ship faster in AI Studio: Annotate & vibe coding

Google AI Studio adds Annotate Mode: draw on your running app UI and have Gemini implement changes. Builders showcase โ€˜vibe codingโ€™ flows with prebuilt components and grounded Search. Strong traction signals (traffic spike) surfaced today.

Google AI Studio adds Annotate Mode for pointโ€‘andโ€‘edit coding

Google AI Studio now lets you draw directly on your app preview and have Gemini implement the change in code, collapsing reviewโ†’specโ†’commit loops into a single pass. The update ships inside the Build experience and supports fineโ€‘grained tweaks (e.g., animations) without leaving the IDE-like canvas feature brief, announcement, AI Studio build, annotate details.

For teams, this makes UI polish and stakeholder feedback far more executableโ€”nonโ€‘developers can mark targets in context while engineers keep a clean diff trail. Early users report the feature feels natural in the new AIโ€‘assisted flow of โ€œpoint, narrate intent, shipโ€ feature mention.

Vibe coding in AI Studio: NL intents to runnable apps with Search grounding

Creators showcased "vibe coding" in AI Studio: pick prebuilt components (speech, image analysis), describe the app in natural language, and get runnable code plus a live preview grounded in Google Search. The demo walks through highlight-and-edit cycles, showing Gemini wiring UI changes and data calls end-to-end video demo, YouTube demo.

Beyond prototyping speed, Search grounding adds production-like behavior (fresh results/citations) to early builds, reducing the gap between demo logic and real integrations feature brief.

AI Studio traffic jumps 64% in September, topping ~160M monthly visits

AI Studio's site traffic spiked ~64% in September to ~160M+ visits, its biggest surge since the Gemini 2 cycle, evidence that annotate-and-vibe coding workflows are resonating with builders traffic chart. Following up on traffic surge that highlighted the 160M+ milestone, today's chart underscores momentum rather than a one-off bump, suggesting sustained interest as new Build features roll out.


🚀 Agent infra for builders: Vercel Agent, WDK, Marketplace

Vercel Ship AI day brings a cohesive agent stack: Vercel Agent (code review + investigations), Workflow Development Kit ('use workflow' durability), a Marketplace for agents/services, and zero-config backends for AI apps.

Vercel Agent launches in public beta with AI code review and incident investigations

Vercel introduced an AI teammate that performs PR reviews by running simulated builds in a Sandbox and triggers AI-led investigations when telemetry flags anomalies, now available in Public Beta on AI Cloud product blog, and documented in full on the launch post Vercel blog. This slots into a broader Ship AI push aimed at making agentic workflows first-class for app teams.

Workflow Development Kit makes reliability "just code" with durable, resumable steps

Vercel's WDK adds a use workflow primitive that turns async functions into durable workflows that pause, resume, and persist automatically; each use step is isolated, retried on failure, and state-replayed across deploys feature brief, with deeper details in the launch write-up Vercel blog. Early builders immediately pressed for controls like cancellation, idempotency keys, handling code changes, and rollbacks, useful signals for WDK ergonomics and docs to address next dev questions, follow-up questions.
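WDK itself is TypeScript; as a language-neutral illustration of the durability idea, here is a toy Python version of persist-and-replay steps (the class, file path, and step names are invented for the sketch, not WDK's API):

```python
import json, os, tempfile

class DurableRun:
    """Toy illustration of durable steps: each completed step's result is
    persisted, so a re-run after a crash replays saved results instead of
    re-executing the step. Not WDK's API, just the underlying pattern."""

    def __init__(self, path):
        self.path = path
        self.state = json.load(open(path)) if os.path.exists(path) else {}

    def step(self, name, fn):
        if name in self.state:           # already done: replay persisted result
            return self.state[name]
        result = fn()                    # first run: execute and persist
        self.state[name] = result
        with open(self.path, "w") as f:
            json.dump(self.state, f)
        return result

run = DurableRun(os.path.join(tempfile.mkdtemp(), "wf_state.json"))
a = run.step("fetch", lambda: 21)
b = run.step("double", lambda: a * 2)
```

Restarting with the same state file skips completed steps, which is the property that makes durable workflows survive deploys and crashes.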

Vercel Marketplace debuts with agent apps and AI infrastructure services, unified billing

Vercel opened a marketplace that ships plug-in "agents" (e.g., CodeRabbit, Corridor, Sourcery) and "services" (Autonoma, Braintrust, Browser Use, Chatbase, Mixedbread and more) behind one install and bill marketplace blog, with partners announcing day-one availability coderabbit launch, mixedbread launch. The intent is to reduce integration sprawl for teams adopting agentic patterns while keeping observability centralized.

AI SDK 6 (beta) unifies agent abstraction with human-in-the-loop tool approvals and image editing

Vercel's AI SDK 6 beta stabilizes an agent abstraction layer, adds tool-execution approval for human-in-the-loop control, and extends image editing support, positioning the SDK as the default interface across models and providers for agent apps sdk beta image. These capabilities complement Vercel Agent and WDK so teams can define logic once and run it reliably on AI Cloud.

Zero-config backends on Vercel AI Cloud bring framework-defined infra and unified observability

Vercel AI Cloud now provisions and scales backends from your chosen framework with no extra YAML or Docker, adds per-route scaling, and centralizes logs, traces, and metrics so AI apps get a production-grade control plane out of the box backends blog, Vercel blog. For agent builders, this pairs with the AI stack to simplify deploying tool-rich, stateful services without bespoke infra plumbing.


🧩 Enterprise collaboration & context: projects, knowledge, memory

Teams features dominated: OpenAI expands Shared Projects (with per-tier limits) and ships Company Knowledge with connectors/citations; Anthropic rolls out project-scoped Memory to Max/Pro with incognito chats. Excludes OpenAI's Sky M&A (separate).

Company Knowledge arrives for Business, Enterprise, and Edu with GPT-5 search across Slack/SharePoint/Drive/GitHub and citations

ChatGPT can now pull trusted answers from your organization's tools (Slack, SharePoint, Google Drive, GitHub) with a GPT-5-based model that searches across sources and cites where each answer came from, now rolling out to Business, Enterprise, and Edu feature screenshot, OpenAI blog.

New connectors were added alongside the rollout (e.g., Asana, GitLab Issues, ClickUp), and admins can review the Business release notes for setup details and visibility controls business notes, and Business release notes. See OpenAIโ€™s overview for capabilities and citation behavior OpenAI blog.

OpenAI rolls out Shared Projects to Free, Plus, and Pro with tier caps and project-only memory

OpenAI is expanding Shared Projects to all ChatGPT tiers so teams can work from shared chats, files, and instructions, with project-scoped memory enabled automatically on shared projects feature post, rollout summary.

  • Tier limits: Free supports up to 5 files and 5 collaborators, Plus/Go up to 25 files and 10 collaborators, and Pro up to 40 files and 100 collaborators, per OpenAI's notes release summary, and OpenAI release notes.

Anthropic ships project-scoped Memory to Max and starts Pro rollout with incognito chats and safety guardrails

Anthropic enabled Memory for Max customers and will roll it out to Pro over the next two weeks; each project keeps its own memory that users can view/edit, with an incognito chat mode that avoids saving, following internal safety testing rollout note, memory page.

Practitioners highlight project-scoped memory as a practical way to prevent cross-pollination between unrelated workstreams user sentiment, with full details and controls in Anthropic's announcement Anthropic memory page.


📄 Document AI momentum: LightOnOCR-1B and tooling

OCR/VLM remained hot: LightOnOCR-1B debuts as a fast, end-to-end, domain-tunable model; vLLM adds OCR model support; applied guides explain deployment and 'optical compression' angles. Mostly practical releases and how-tos today.

LightOnOCR-1B debuts: fast, end-to-end OCR with SOTA-class quality; training data release teased

LightOn unveiled LightOnOCR-1B, an end-to-end OCR/VLM that targets state-of-the-art accuracy while running significantly faster than recent releases, and says a curated training dataset will be released soon. The team details design choices (e.g., teacher size, resolution, domain adaptation) and shipped ready-to-run models, including vLLM availability. See the announcement and technical blog for architecture and ablation results release thread, with more notes that the dataset is "coming soon" follow-up note, and the model and collection pages for immediate use Hugging Face blog, Models collection.

Baseten explains DeepSeek-OCR's "optical compression" and ships a 10-minute deploy path

Baseten breaks down why DeepSeek-OCR's image-native pipelines are dramatically cheaper and faster (compressing text visually before decoding) and provides a ready template to stand up inference in under ten minutes. This adds actionable ops guidance following up on vLLM support and library-scale conversions reported yesterday, with concrete throughput/cost angles for production teams blog summary, Baseten blog, and an additional pointer from the team blog pointer.

Hugging Face updates open OCR model comparison with Chandra, OlmOCR-2, Qwen3-VL and averaged scores

Hugging Face refreshed its applied guide and comparison for open OCR/VLMs, adding Chandra, OlmOCR-2 and Qwen3-VL plus an averaged OlmOCR score, giving practitioners clearer trade-offs on accuracy, latency and deployment patterns. The post complements recent LightOnOCR and DeepSeek work by focusing on practical pipelines and costs blog update, with the full write-up here Hugging Face blog.

vLLM flags surge of small, fast OCR models landing for production serving

vLLM highlighted that compact OCR models are "taking off" on the platform, underscoring practical, high-throughput serving for document AI workloads. This aligns with LightOnOCR-1B's immediate vLLM availability and broader momentum toward efficient OCR/VLM deployment vLLM comment, model availability.

Hugging Face promotes few-click deployment for the latest OCR models

Hugging Face highlighted that current OCR models can be deployed in a few clicks on its platform, lowering the bar for teams to productionize document AI without bespoke infra. This dovetails with the updated model comparison to help practitioners choose and ship quickly deployment note.


🧠 Research: agent routing, proactive problem-solving, trace fidelity

New papers target where agents fail: response-aware routing (Lookahead), distributed self-routing (DiSRouter), proactive E2E eval (PROBE), and instruction-following inside reasoning traces (ReasonIF).

ReasonIF finds frontier LRMs violate reasoning-time instructions >75% of the time; finetuning helps modestly

Together AI's ReasonIF benchmark shows models like GPT-OSS-120B, Qwen3-235B, and DeepSeek-R1 ignore step-level directives (formatting, length, multilingual constraints) in >75% of reasoning traces; multi-turn prompting and a lightweight finetune improve scores but don't fully fix process-level compliance paper overview.

Code, paper, and blog are available for replication and training recipes GitHub repo, project blog.

Lookahead routing predicts model outputs to choose the best LLM, averaging +7.7% over SOTA routers

A new routing framework, "Lookahead", forecasts latent response representations for each candidate model before routing, yielding a 7.7% average lift across seven benchmarks and working with both causal and masked LMs paper thread, with details in the preprint ArXiv paper.

It improves especially on open-ended tasks by making response-aware decisions instead of input-only classification, and reaches full performance with ~16% of training data, cutting router data needs.
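Conceptually, response-aware routing scores candidates by the predicted quality of their output rather than classifying the input; a toy sketch, where `toy_predictor` is a hypothetical stand-in for Lookahead's learned latent-response model:

```python
def route(prompt, models, predict_quality):
    """Response-aware routing: score each candidate by the quality its
    *predicted* response would have, then dispatch to the argmax.
    predict_quality stands in for a learned response predictor."""
    scores = {name: predict_quality(prompt, name) for name in models}
    return max(scores, key=scores.get)

# Hypothetical predictor: pretend a large model wins on open-ended prompts
# and a small model wins on short chat turns.
def toy_predictor(prompt, name):
    open_ended = len(prompt.split()) > 8
    if name == "large-model":
        return 0.9 if open_ended else 0.6
    return 0.7  # "small-model" baseline

choice = route(
    "Summarize the trade-offs of TPU vs GPU serving at scale today",
    ["small-model", "large-model"],
    toy_predictor,
)
```

An input-only router would make the same decision for every model family; the response-aware version changes its choice whenever the predictor's estimates shift.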

PROBE benchmark shows proactive agent limits: only ~40% end-to-end success on real-work scenarios

PROBE (Proactive Resolution of Bottlenecks) tests agents on three steps (search, identify the root blocker, then execute a precise action) over long, noisy corpora (emails, docs, calendars); top models reach ~40% end-to-end success, with frequent failures on root-cause ID and parameterizing the final action paper abstract.

Chained tool frameworks underperform when retrieval misses key evidence, underscoring that proactive help hinges on evidence selection and exact action specification.

DiSRouter: Distributed self-routing across small→large LLMs with sub-5% overhead

DiSRouter removes the central router and lets each model decide to answer or say "I don't know" and forward upstream, chaining small to large models for better utility at low cost; authors report <5% routing overhead and robustness when the model pool changes paper abstract.

By training models to self-reject via SFT and RL, the system avoids brittle global routers that must be retrained whenever the pool updates.
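The control flow is simple to picture; a toy sketch of the abstain-and-escalate cascade (the answer functions are invented stand-ins for self-rejecting models):

```python
def cascade(prompt, models):
    """Distributed self-routing: each model either answers or abstains
    ("I don't know") and the prompt escalates to the next, larger model.
    Each entry is (name, answer_fn); answer_fn returns a string, or None
    to abstain and forward upstream."""
    for name, answer_fn in models:
        answer = answer_fn(prompt)
        if answer is not None:
            return name, answer
    return models[-1][0], "unanswered"   # in practice the last model commits

# Hypothetical pool: the small model only handles short prompts.
pool = [
    ("small", lambda p: "short answer" if len(p) < 20 else None),
    ("large", lambda p: "long-form answer"),
]

who, out = cascade("Why did the deploy fail after the config change?", pool)
```

Because each model carries its own abstention policy, adding or removing a pool member requires no retraining of a central router, which is the robustness claim above.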

SmartSwitch curbs 'underthinking' by blocking premature strategy switches; QwQ-32B hits 100% on AMC23

SmartSwitch monitors generation for switch cues (e.g., "alternatively"), scores the current thought with a small model, and if still promising, rolls back to deepen that path before allowing a switch; across math tasks it raises accuracy while cutting tokens/time, with QwQ-32B reaching 100% on AMC23 paper abstract.

Unlike "be thorough" prompts or fixed penalties, the selective intervention preserves agility while enforcing depth where it matters.
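The intervention can be pictured as a gate on switch cues; a toy sketch (the cue list, scorer, and threshold are illustrative, not the paper's):

```python
SWITCH_CUES = ("alternatively", "instead", "let's try another")  # illustrative cue list

def gate_switch(thought_so_far: str, next_fragment: str, score_fn, threshold: float = 0.5) -> str:
    """SmartSwitch-style gate (sketch): when the next fragment opens with a
    switch cue, score the current line of thought; if it still looks
    promising, suppress the switch and deepen instead. score_fn stands in
    for the paper's small scoring model."""
    starts_switch = next_fragment.lower().lstrip().startswith(SWITCH_CUES)
    if starts_switch and score_fn(thought_so_far) >= threshold:
        return "deepen"   # roll back the switch, continue current strategy
    return "continue"     # no cue, or the current path genuinely looks weak

# Hypothetical scorer: longer partial derivations look more promising here.
score = lambda t: min(len(t) / 100, 1.0)
decision = gate_switch("We set x = 3k and reduce mod 7..." * 3,
                       "Alternatively, try casework", score)
```

The point of the gate is its selectivity: unpromising thoughts are still allowed to switch, so the model keeps its agility on genuinely bad paths.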

Ensembling multiple LLMs via 'consortium voting' reduces hallucinations and boosts uncertainty signals

A study ensembles diverse LLMs and groups semantically equivalent answers to take a majority vote, introducing "consortium entropy" as an uncertainty score; this black-box setup often outperforms single-model self-consistency while costing less than many-sample decoding paper abstract.

The result doubles as a triage signal, flagging low-confidence cases to humans, useful for production gateways where retraining isn't feasible. Following up on self-consistency, which offered error guarantees for majority vote, this extends the idea across heterogeneous models rather than multiple samples of one.
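The voting-plus-entropy mechanic is easy to sketch; here answers are assumed to be pre-canonicalized so semantically equivalent ones match, and plain Shannon entropy stands in for the paper's consortium entropy:

```python
from collections import Counter
from math import log2

def consortium_vote(answers):
    """Group (already canonicalized) answers, majority-vote, and compute a
    Shannon-entropy uncertainty score over the answer distribution."""
    counts = Counter(answers)
    total = len(answers)
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    winner, _ = counts.most_common(1)[0]
    return winner, entropy

# Four models agree, one dissents: low entropy, keep the answer.
ans, h = consortium_vote(["Paris", "Paris", "Paris", "Paris", "Lyon"])
# Full disagreement: maximum entropy, route to a human.
_, h_bad = consortium_vote(["Paris", "Lyon", "Nice", "Lille"])
needs_review = h_bad > 1.0
```

Thresholding the entropy is exactly the triage signal described above: low-entropy cases pass through, high-entropy cases get flagged for review.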

Letta Evals snapshots agents for stateful, reproducible evaluation via 'Agent File (.af)' checkpoints

Letta introduced an evaluation method that checkpoints full agent state and environment into an Agent File (.af) so teams can replay and compare agent behavior holistically, not just prompts, over long-lived, learning agents product note.

This targets a growing gap in agent testing where memory and environment drift make traditional single-turn or stateless evals misleading for production readiness.


🧪 Serving quality: provider exactness and open-model stabilization

Production notes on improving open models in agents: Cline's GLM-4.6 prompt slimming and provider filtering (:exacto) lift tool-call reliability; OpenRouter confirms :exacto gains; Baseten adds fast GLM-4.6 hosting.

Cline stabilizes GLM-4.6 agents with 57% prompt cut and :exacto provider routing

Cline reports a production hardening of open models by shrinking GLM-4.6's system prompt from 56,499 to 24,111 characters (−57%), which sped responses, lowered cost, and reduced tool-call failures; they also now auto-select OpenRouter's ":exacto" endpoints to avoid silently degraded hosts that broke tool calls. See details and the before/after instruction tuning in Cline blog, a side-by-side run where glm-4.6:exacto succeeds while a standard endpoint fails by emitting calls in thinking tags in provider demo, and OpenRouter's confirmation that Cline's quality jump came from :exacto in OpenRouter note.

SGLang Model Gateway v0.2 adds cache-aware multi-model routing and production-grade reliability

LMSYS rebuilt SGL-Router into the SGLang Model Gateway: a Rust gRPC, OpenAI-compatible front door that runs fleets of models under one gateway with policy-based routing, prefill/decode disaggregation, cached tokenization, retries, circuit breakers, rate limiting, and Prometheus metrics/tracing. It targets agent backends where endpoint quality varies and failover, observability, and tool/MCP integration are mandatory gateway release, with a feature list of reliability/observability upgrades for production workloads reliability brief.

Baseten lights up GLM-4.6 hosting with usage-billed API and fastest third-party claim

Baseten announced GLM-4.6 availability via its managed inference with API pricing for teams that prefer usage billing, and reiterated it's the fastest third-party host for this model per recent bake-offs. For teams standardizing on open models across providers, this adds a turnkey endpoint option alongside self-hosted stacks hosting note.

Factory CLI's mixed-model plan→execute keeps 93% quality at lower cost

Factory advocates splitting agent work across models: use a strong, pricier model (e.g., Sonnet) to plan and a cheaper open model (e.g., GLM) to execute, claiming you keep ~93% of performance while "only paying premium for thinking." This is a practical pattern for taming provider variance and stabilizing tool calls without locking into a single endpoint claims thread, with broader mixed-model support landing in the Factory CLI mixed models note.
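The pattern itself is a few lines of orchestration; a toy sketch with invented stubs in place of the planner and executor model calls:

```python
def plan_then_execute(task, planner, executor):
    """Mixed-model pattern (sketch): a premium model produces a step plan,
    a cheaper model executes each step. planner/executor are stand-ins for
    LLM calls (e.g., a Sonnet-class planner and a GLM-class executor)."""
    steps = planner(task)                       # expensive call, made once
    return [executor(step) for step in steps]   # cheap calls, one per step

# Hypothetical stubs standing in for real model calls.
plan = lambda task: [f"{task}: step {i}" for i in (1, 2)]
run = lambda step: f"done({step})"
results = plan_then_execute("refactor auth", plan, run)
```

Since the expensive model is invoked once per task while the cheap model handles the per-step volume, spend concentrates where the reasoning actually happens.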


๐Ÿ›ก๏ธ Trust & uptime: data deletion policy and outage recap

Operational signals: OpenAI confirms return to 30โ€‘day deletion for ChatGPT/API after litigation hold ended; separate brief outage caused โ€˜Too many concurrent requestsโ€™ with status updates to recovery.

OpenAI reinstates 30โ€‘day deletion for ChatGPT and API after litigation hold ends

OpenAI says deleted and temporary ChatGPT chats will again be autoโ€‘deleted within 30 days, and API data will also be deleted after 30 days, following the end of a litigation hold on September 26, 2025 policy screenshot.

Teams should verify retention assumptions in privacy notices, DSR workflows, and logging/backup pipelines; OpenAI notes it will keep a tightlyโ€‘accessโ€‘controlled slice of historical user data from Aprilโ€“September 2025 for legal/security reasons only policy screenshot. Community commentary stresses this mirrors prior standard practice and that the earlier hold stemmed from external litigation constraints, not a product policy change context thread.

ChatGPT outage triggers โ€˜Too many concurrent requestsโ€™; status page shows sameโ€‘day recovery

ChatGPT briefly returned โ€œToo many concurrent requestsโ€ errors; OpenAIโ€™s status page tracked investigation, mitigation, and full recovery within the same afternoon error screenshot, OpenAI status.

According to the incident log, errors began midโ€‘afternoon, a mitigation was applied within about an hour, and all impacted services recovered shortly thereafter OpenAI status. Users and thirdโ€‘party monitors reported elevated error rates during the window, aligning with OpenAIโ€™s outage acknowledgment and remediation updates outage report.


๐Ÿ•น๏ธ World models in the browser: Genie 3 experiment

Googleโ€™s Genie 3 public experiment appears imminent: UI for sketchโ€‘yourโ€‘world and character prompts surfaces, with reporting that users will generate and explore interactive worlds. Separate from LTXโ€‘2 video engine.

Genie 3 public experiment UI surfaces; โ€˜create worldโ€™ flow suggests launch soon

Googleโ€™s Genie 3 appears ready for a public browser experiment: a โ€œCreate worldโ€ interface with Environment and Character prompts, plus a Firstโ€‘person toggle, has been spotted alongside reports that users will generate and then explore interactive worlds. Multiple screenshots and writeโ€‘ups point to an imminent rollout rather than a labโ€‘only demo documented scoop, and community observers are now calling the release all but confirmed confirmation post.

The new UI invites text descriptions of the world and avatar and hints at sketchโ€‘toโ€‘world creation, aligning with Googleโ€™s earlier โ€œworld modelโ€ framing. For analysts and engineers, this signals handsโ€‘on data about userโ€‘steered simulation, control inputs, and firstโ€‘person interaction loopsโ€”key to agent training and evaluation in browserโ€‘safe sandboxes ui preview. Full details and artifact references are compiled in TestingCatalogโ€™s coverage TestingCatalog article, with additional UI capture corroborating the same flow ui screenshot.


📊 Agent evals & observability: multi-turn and automated insights

Evals tooling advanced: LangSmith adds an Insights Agent and multi-turn evals for goal completion; Letta ships stateful agent evals using Agent File snapshots to replicate full state and environment. Practical, production-oriented.

LangSmith adds Insights Agent and multi-turn conversation evals

LangChain rolled out two eval features in LangSmith: an Insights Agent that automatically categorizes agent behavior patterns, and Multi-turn Evals that score entire conversations against user goals rather than single turns feature brief. This closes a common gap in production agent QA by shifting from turn-level rubric checks to trajectory-level success measurement across tasks like planning, tool use, and error recovery.
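The core idea behind trajectory-level evals can be sketched in a few lines. This is not LangSmith's actual API; it is a generic illustration where `Turn`, `goal_completion_score`, and `toy_judge` are hypothetical names, and the judge is any callable (typically an LLM-as-judge) that scores a full transcript against the user's goal.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str

def goal_completion_score(conversation: list[Turn],
                          goal: str,
                          judge: Callable[[str, str], float]) -> float:
    """Score the whole trajectory against the user's goal, not single turns.

    `judge` maps (goal, transcript) to a score in [0, 1]; in practice this
    would wrap an LLM-as-judge call rather than a string check.
    """
    transcript = "\n".join(f"{t.role}: {t.content}" for t in conversation)
    return judge(goal, transcript)

# Toy judge for illustration: did the assistant ever confirm the booking?
toy_judge = lambda goal, transcript: 1.0 if "confirmed" in transcript.lower() else 0.0

convo = [Turn("user", "Book me a table for two tonight."),
         Turn("assistant", "Done, your reservation is confirmed for 7pm.")]
print(goal_completion_score(convo, "book a dinner table", toy_judge))  # 1.0
```

The design point is that the scoring unit is the conversation, so multi-step behaviors like recovering from a failed tool call still earn credit if the goal is ultimately met.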

ReasonIF finds LRMs ignore reasoning-time instructions >75% of the time

Together AI's ReasonIF study shows frontier large reasoning models often fail to follow instructions during the chain-of-thought itself, with over 75% non-compliance across multilingual reasoning, formatting, and length control, even when they can solve the underlying tasks paper summary. The authors release a benchmark plus code and data; simple interventions like multi-turn prompting and instruction-aware finetuning partially improve adherence resources bundle, ArXiv paper, and GitHub repo.

For evaluators, this clarifies why output-only checks miss latent failures: process-level audits and instruction-fidelity metrics belong alongside accuracy.

Letta Evals debuts stateful agent testing via Agent File snapshots

Letta introduced an eval suite purpose-built for long-lived agents, snapshotting full agent state and environment into an Agent File (.af) so tests can deterministically replay behavior, compare changes, and evaluate upgrades apples-to-apples product note, launch claim. Teams can evaluate an entire agent (not just prompts) and even target existing agents as eval fixtures, addressing the core challenge of drift in memoryful, tool-rich systems.
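The snapshot-and-replay pattern is easy to illustrate. The sketch below uses a plain JSON file as a stand-in for an Agent File checkpoint; it is not Letta's actual serialization format, and `snapshot_agent`/`restore_agent` are hypothetical helpers. The point is that freezing memory, tool config, and environment together makes eval runs start from identical state.

```python
import json
import tempfile
from pathlib import Path

def snapshot_agent(state: dict, path: Path) -> None:
    # Illustrative JSON stand-in for an Agent File (.af) checkpoint.
    path.write_text(json.dumps(state, sort_keys=True, indent=2))

def restore_agent(path: Path) -> dict:
    return json.loads(path.read_text())

state = {"memory": ["user prefers metric units"],
         "tools": ["search", "calculator"],
         "env": {"model": "example-model", "seed": 7}}

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "agent_checkpoint.af.json"
    snapshot_agent(state, checkpoint)
    restored = restore_agent(checkpoint)

# Every eval run restored from the snapshot begins from identical state,
# so behavior diffs are attributable to the change under test, not drift.
assert restored == state
```

This is what makes apples-to-apples upgrade comparisons possible for memoryful agents: the fixture is the whole agent, not a prompt.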

New PROBE benchmark stresses proactive agents; top models ~40% end-to-end

A new dataset, PROBE (Proactive Resolution of Bottlenecks), evaluates agent workflows that must search long, noisy corpora, identify a single true blocker, and execute one precise action with the right parameters. Leading models manage roughly 40% end-to-end success, with most failures in root-cause identification and incomplete action arguments paper thread.

This style of eval mirrors real knowledge work: find the right evidence, disambiguate ownership and deadlines, and act once. That makes it useful for assessing enterprise agent readiness beyond chat quality.
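An end-to-end scorer in this style only awards credit when all stages succeed. The sketch below is based on the paper's description, not its released code; `probe_score` and the record fields are hypothetical, and exact-match on arguments is a simplification that treats partial argument sets as failures, matching the failure mode the benchmark highlights.

```python
def probe_score(pred: dict, gold: dict) -> dict:
    """All-or-nothing end-to-end credit: right blocker, right action,
    complete and correct arguments (illustrative scorer)."""
    blocker_ok = pred.get("blocker_id") == gold["blocker_id"]
    action_ok = pred.get("action") == gold["action"]
    args_ok = pred.get("args") == gold["args"]  # partial args count as failure
    return {"blocker": blocker_ok, "action": action_ok, "args": args_ok,
            "end_to_end": blocker_ok and action_ok and args_ok}

gold = {"blocker_id": "DOC-17", "action": "reassign_ticket",
        "args": {"ticket": "T-42", "owner": "dana"}}
pred = {"blocker_id": "DOC-17", "action": "reassign_ticket",
        "args": {"ticket": "T-42"}}   # missing 'owner' argument

print(probe_score(pred, gold)["end_to_end"])  # False: incomplete args, no credit
```

Breaking the score into per-stage flags is what lets the authors attribute most failures to root-cause identification and argument completeness rather than action selection.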

Multi-model 'consortium voting' cuts hallucinations and adds calibrated uncertainty

A paper from Cambridge Consultants and collaborators proposes teaming multiple LLMs, grouping semantically equivalent answers, and majority-voting to both reduce hallucinations and expose confidence via consortium entropy, often beating single-model self-consistency at lower cost paper details. In the context of certified majority-vote methods with error guarantees reported yesterday error guarantees, this offers a pragmatic, black-box route to production risk flags without retraining.

The approach also provides a cheap abstain signal for eval pipelines: throttle or escalate when answer clusters disperse.

On this page

Executive Summary
Feature Spotlight: Anthropic × Google secure ~1M TPUs, >1 GW by 2026
⚡ Feature: Anthropic × Google secure ~1M TPUs, >1 GW by 2026
Anthropic locks up ~1M Google TPUs and >1 GW for 2026 in a deal worth tens of billions
Anthropic demand picture: 300k+ business customers, large accounts up ~7× YoY, revenue near $7B
Ironwood, Google's 7th-gen TPU for high-throughput inference, is central to Anthropic's plan
Despite the TPU megadeal, Anthropic reiterates a multi-cloud, multi-silicon strategy
🖥️ OpenAI buys Sky: screen-aware Mac actions
OpenAI buys Sky to add screen-aware Mac actions to ChatGPT
Signal in the noise: Sky shows OpenAI's platform push into OS-level agents
Workflow/Shortcuts alumni behind Sky bring deep macOS automation chops to OpenAI
OpenAI positions Sky as a shift from chat to action, and discloses Altman-linked passive investment
What a screen-aware Mac agent unlocks for developers and IT
🎬 Cinematic AI video goes open: LTX-2 arrives
LTX-2 debuts with native 4K, up to 50 fps, and synced audio; open weights coming later this year
fal ships day-0 LTX-2 APIs (Fast/Pro) for text→video and image→video up to 4K with per-second pricing
Replicate lists LTX-2 Fast and Pro with prompt guidelines and example workflows
Practitioners call LTX-2 a new bar; native 4K motion and texture beat upscaled outputs
🧭 Agentic browsers: Edge Copilot Mode and fall updates
Edge adds Copilot Mode with Actions, autonomy levels, and opt-in Page Context
Copilot Sessions Fall update brings Groups, Mico/Clippy, and cross-app memory
✏️ Ship faster in AI Studio: Annotate & vibe coding
Google AI Studio adds Annotate Mode for point-and-edit coding
Vibe coding in AI Studio: NL intents to runnable apps with Search grounding
AI Studio traffic jumps 64% in September, topping ~160M monthly visits
🚀 Agent infra for builders: Vercel Agent, WDK, Marketplace
Vercel Agent launches in public beta with AI code review and incident investigations
Workflow Development Kit makes reliability "just code" with durable, resumable steps
Vercel Marketplace debuts with agent apps and AI infrastructure services, unified billing
AI SDK 6 (beta) unifies agent abstraction with human-in-the-loop tool approvals and image editing
Zero-config backends on Vercel AI Cloud bring framework-defined infra and unified observability
🧩 Enterprise collaboration & context: projects, knowledge, memory
Company Knowledge arrives for Business, Enterprise, and Edu with GPT-5 search across Slack/SharePoint/Drive/GitHub and citations
OpenAI rolls out Shared Projects to Free, Plus, and Pro with tier caps and project-only memory
Anthropic ships project-scoped Memory to Max and starts Pro rollout with incognito chats and safety guardrails
📄 Document AI momentum: LightOnOCR-1B and tooling
LightOnOCR-1B debuts: fast, end-to-end OCR with SOTA-class quality; training data release teased
Baseten explains DeepSeek-OCR's "optical compression" and ships a 10-minute deploy path
Hugging Face updates open OCR model comparison with Chandra, OlmOCR-2, Qwen3-VL and averaged scores
vLLM flags surge of small, fast OCR models landing for production serving
Hugging Face promotes few-click deployment for the latest OCR models
🧠 Research: agent routing, proactive problem-solving, trace fidelity
ReasonIF finds frontier LRMs violate reasoning-time instructions >75% of the time; finetuning helps modestly
Lookahead routing predicts model outputs to choose the best LLM, averaging +7.7% over SOTA routers
PROBE benchmark shows proactive agent limits: only ~40% end-to-end success on real-work scenarios
DiSRouter: Distributed self-routing across small→large LLMs with sub-5% overhead
SmartSwitch curbs 'underthinking' by blocking premature strategy switches; QwQ-32B hits 100% on AMC23
Ensembling multiple LLMs via 'consortium voting' reduces hallucinations and boosts uncertainty signals
Letta Evals snapshots agents for stateful, reproducible evaluation via 'Agent File (.af)' checkpoints
🧪 Serving quality: provider exactness and open-model stabilization
Cline stabilizes GLM-4.6 agents with 57% prompt cut and :exacto provider routing
SGLang Model Gateway v0.2 adds cache-aware multi-model routing and production-grade reliability
Baseten lights up GLM-4.6 hosting with usage-billed API and fastest third-party claim
Factory CLI's mixed-model plan→execute keeps 93% quality at lower cost
🛡️ Trust & uptime: data deletion policy and outage recap
OpenAI reinstates 30-day deletion for ChatGPT and API after litigation hold ends
ChatGPT outage triggers 'Too many concurrent requests'; status page shows same-day recovery
🕹️ World models in the browser: Genie 3 experiment
Genie 3 public experiment UI surfaces; 'create world' flow suggests launch soon
📊 Agent evals & observability: multi-turn and automated insights
LangSmith adds Insights Agent and multi-turn conversation evals
ReasonIF finds LRMs ignore reasoning-time instructions >75% of the time
Letta Evals debuts stateful agent testing via Agent File snapshots
New PROBE benchmark stresses proactive agents; top models ~40% end-to-end
Multi-model 'consortium voting' cuts hallucinations and adds calibrated uncertainty