Sun, Jan 18, 2026

OpenAI charts $20B+ ARR and ~1.9GW compute – claims cents per 1M tokens

Executive Summary

OpenAI published a “business that scales with intelligence” memo tying revenue to infrastructure. Its charts show ARR climbing $2B (2023) → $6B (2024) → $20B+ (2025E) while compute rises 0.2GW → 0.6GW → ~1.9GW. The post frames earlier reliance on a single compute provider as a hard ceiling, then pitches a multi-provider portfolio that splits premium training hardware from cheaper serving. It also floats serving costs trending toward “cents per 1M tokens,” though there is no third-party verification or SKU-level disclosure.

Memory wall: threads argue inference is now bandwidth/capacity-limited; model size ~19×/2 years vs memory per accelerator ~1.9×; over ~20 years, compute ~60,000× vs DRAM bandwidth ~100×.
Prompt caching paper: reports ~45–80% cost cuts and ~13–31% faster TTFT by caching stable prefixes; warns full-chat caching can misfire when tool outputs vary.

The throughline is economics-first agent scaling: cheaper serving, tighter memory hierarchies, and more explicit “what persists” layers (KBs, caches) becoming the real product differentiators.

Feature Spotlight

Claude Cowork adds persistent “Knowledge Bases” and a new customization layer

Anthropic is building a persistent memory primitive (“Knowledge Bases”) for Claude Cowork, plus a broader “Customize” surface (Skills/Connectors/Commands) and voice support. This shifts Claude from per-chat context to an auto-maintained, topic-scoped memory system that should reduce re-explaining and improve long-running coworker-style workflows.

🧠 Claude Cowork adds persistent “Knowledge Bases” and a new customization layer

Multiple leaks point to Claude Cowork gaining auto-managed, topic-specific persistent memory (“Knowledge Bases”), alongside an upcoming “Customize” area for Skills/Connectors/Commands and voice mode, with Cowork and Chat converging into a single default UI.

Claude Cowork may add Knowledge Bases as persistent, auto-updating memory

Claude Cowork (Anthropic): Leaked UI/internal instructions suggest Anthropic is building Knowledge Bases—persistent, topic-scoped repositories that Claude should proactively consult and incrementally update with “preferences, decisions, facts, lessons learned,” as described in the KB leak thread and expanded in the Feature leak article. This matters because it turns “memory” into an explicit, maintainable artifact instead of relying on ever-growing chat logs.

Video: KB instructions shown on screen

The open question is how this will be exposed for control/audit (history, diffs, export) vs being a mostly automatic layer.

Claude Desktop may merge Cowork and Chat into one default interface

Claude Desktop (Anthropic): The Cowork UI is reportedly being merged into Chat and becoming the default desktop experience, per the KB leak thread and the Feature leak article. That’s a product surface change: “agent mode” becomes the normal interaction model rather than a separate workflow.

Video: Cowork UI leak clip

If this ships, teams will likely need clearer per-task boundaries (what persists vs what stays local) because the UI no longer signals that distinction.

Claude Code may add customizable Commands inside a new Customize hub

Claude Code (Anthropic): A leak shows Anthropic working on a Commands feature alongside a new Customize area that groups Skills and Connectors, per the Commands leak screenshot and the Commands scoop. This matters because it implies a first-class “workflow primitive” (named commands) rather than only freeform prompting.

What’s not clear yet: whether Commands are just saved prompts, tool macros, or something with stricter execution semantics.

Claude Desktop appears to be adding voice mode for Cowork

Claude Desktop (Anthropic): Screenshots show a Voice button and “Voice mode active” tooltip inside the Claude desktop UI (with Sonnet 4.5 selected), per the Voice toggle screenshot and echoed by broader voice-mode chatter in the Voice mode timing note. This signals voice as a control plane for desktop agents, not just chat.

No API details appear in these posts, so it’s unclear whether this is true full-duplex/interruptible voice or a stitched STT↔LLM↔TTS loop.

Claude Cowork is now available to Pro and Max users

Claude Cowork (Anthropic): Cowork is now available to Pro and Max users, according to the Availability announcement. This extends the earlier Pro rollout—following up on Pro rollout (Cowork expands to Pro)—by explicitly naming Max access as well.

The posts don’t mention regional gating, admin controls, or pricing changes for this step.

Windows users keep asking for Claude Cowork support

Claude Cowork (Anthropic): The Windows gap remains a live adoption pain point, with users directly asking for Cowork on Windows in the Windows request and reinforcing that they can’t use it in the Windows user reply. This matters operationally because desktop-agent workflows often need OS parity to standardize setups across teams.

No official Windows timeline appears in these tweets.


⚡ Codex speed war: OpenAI × Cerebras tok/s claims and skepticism

The OpenAI–Cerebras partnership narrative continues, with big tok/s claims for agentic coding and pushback about feasibility (context window, quantization, and what “full Codex” would really mean at those speeds).

OpenAI×Cerebras tok/s claims put “very fast Codex” into an agent UX frame

Codex throughput (OpenAI×Cerebras): A circulating claim says the OpenAI–Cerebras partnership could enable GPT-5.3-Codex agentic coding at ~2,000 tokens/sec, contrasted with ~100 tok/s for “Opus 4.5 in Claude Code,” per the Throughput claim; this is framed as unlocking “so much possible,” rather than just a benchmark win.

The same speed storyline is reinforced by a separate “very fast Codex coming” teaser shown in the Teaser screenshot, but neither post provides a concrete config (context length, quantization, or serving stack) that would let teams map tok/s to end-to-end agent latency.

Skepticism grows around what “2,000 tok/s Codex” would actually mean in practice

Feasibility questions (Codex×Cerebras): A pointed skeptical take argues “full codex at these speeds with 272k token context window seems unlikely,” citing Cerebras constraints around context window, model size capability, and quantized serving in the Feasibility pushback.

This is less about whether Cerebras can hit high throughput in general, and more about whether the specific Codex experience people rely on (long-context repo work, non-trivial tool traces, and quality parity) can be preserved at the quoted tok/s.


🌐 Browser automation tooling: agent-browser v0.6.0 and web-agent workflows

Vercel’s agent-browser continues shipping practical automation features (recording, CDP persistence, proxies), plus community tools that build “research pages” by orchestrating browsing + generation.

agent-browser v0.6.0 adds built-in video recording for automation runs

agent-browser (Vercel): v0.6.0 introduces first-class video recording (start/stop/restart), turning flaky or failing browser-agent runs into shareable artifacts for debugging and regression review, as listed in the v0.6.0 release thread and detailed in the GitHub release notes.

Video: Agent-browser v0.6.0 demo clip

This lands in the “practical observability” bucket for web agents: once you can attach a recording to a failing run, it becomes much easier to reproduce selector issues, timing races, and environment drift without relying on logs alone.

agent-browser v0.6.0 adds persistent CDP sessions via connect command

agent-browser (Vercel): v0.6.0 adds persistent Chrome DevTools Protocol sessions via a new connect command, so you can attach once and reuse the same browser session across commands—called out in the v0.6.0 release thread and expanded in the GitHub release notes.

Video: Agent-browser v0.6.0 demo clip

This matters for long-running agent workflows where repeated browser startup and re-auth steps create both latency and new failure modes (session loss, inconsistent state, re-consent screens).

agent-browser v0.6.0 adds computed styles extraction for more reliable UI automation

agent-browser (Vercel): v0.6.0 adds computed CSS extraction (get styles), which can make downstream automation logic less brittle by checking layout/state (visibility, display, positioning) instead of relying only on DOM structure, as announced in the v0.6.0 release thread and described in the GitHub release notes.

Video: Agent-browser v0.6.0 demo clip

A user endorsement also frames the project as unusually “agent-friendly” among browser automation tools, according to the Agentic browser tooling praise.

agent-browser v0.6.0 adds proxy support and richer network request visibility

agent-browser (Vercel): v0.6.0 adds --proxy support (including auth) plus clearer network request introspection that shows method/URL/type, per the v0.6.0 release thread and the GitHub release notes.

Video: Agent-browser v0.6.0 demo clip

Proxy support is the operational unlock for running the same browser-agent workflows inside corporate networks and test environments where direct egress is blocked, while better request visibility helps triage “it clicked but nothing happened” failures that are really 4xx/5xx or blocked resources.

HyperPages open-sources a web research page builder that assembles cited articles

HyperPages (community tool): a new open-source “research page builder” turns a topic into a structured article by searching the web, collecting sources, generating sections, and supporting interactive editing, as shown in the HyperPages demo.

Video: HyperPages research builder demo

This is the browser-agent pattern packaged as a product surface: browse → cite → synthesize → edit, which is distinct from pure chat-based “deep research” because the output is an artifact you can iterate on rather than a one-shot answer.


🧭 Workflow patterns: AGENTS.md hygiene, control-flow discipline, and context scoping

Practitioner guidance is converging on “less but sharper” agent instruction files, and on using deterministic control flow + scoped contexts to avoid long-horizon agent failures and token waste.

AGENTS.md cleanup: refactor into a minimal root file plus linked subdocs

AGENTS.md hygiene: A concrete refactor prompt is circulating to fix bloated/contradictory AGENTS.md files—start by finding contradictions, keep only repo-wide essentials in the root, and push everything else into linked, task-scoped docs to reduce “instruction budget” burn, as shown in the Cleanup prompt screenshot and expanded in the AGENTS.md guide.

Refactor steps: The suggested flow is “find contradictions → extract essentials → group the rest → create a docs/ structure → flag redundant/vague rules,” per the Cleanup prompt screenshot.
Practical rule of thumb: The root file should contain only invariants (project description, package manager, non-standard build/typecheck commands, always-relevant constraints), with everything else behind links so it’s not paid for on every agent call, as described in the AGENTS.md guide.

Stop using prompts as control flow; use structured output and real branching

Control-flow discipline: A recurring agent reliability tip is to avoid “prompt-based branching” entirely—if you already know the workflow, ask for structured output and route execution with normal code/conditionals, as argued in the Control flow note. This keeps long-running agents from drifting into accidental policy/format changes and makes retries and logging deterministic.
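
A minimal sketch of the pattern, with a stubbed call_model standing in for a JSON-mode or schema-constrained request (the labels, schema, and helper names are illustrative, not from the post): the model emits one structured verdict, and all branching, thresholds, and retries live in ordinary code.

```python
# Minimal sketch of "structured output + code branching". call_model() is a stub
# standing in for a JSON-mode / schema-constrained LLM call so the example runs.
import json


def call_model(prompt: str) -> str:
    # Stand-in for an LLM call constrained to return JSON.
    return json.dumps({"action": "needs_tests", "confidence": 0.82})


def route(diff_summary: str) -> str:
    raw = call_model(
        "Classify this change as one of: safe_to_merge, needs_tests, needs_human. "
        "Return JSON with keys 'action' and 'confidence'.\n\n" + diff_summary
    )
    verdict = json.loads(raw)

    # Deterministic control flow: branching, thresholds, retries, and logging
    # live in code, not in a follow-up prompt the model might reinterpret.
    if verdict["action"] == "safe_to_merge" and verdict["confidence"] >= 0.9:
        return "merge"
    if verdict["action"] == "needs_tests":
        return "open_test_task"
    return "escalate_to_human"


if __name__ == "__main__":
    print(route("Refactors retry logic in the sync worker"))  # -> open_test_task
```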

Task-Decoupled Planning: isolate subtask context to prevent plan entanglement

Task-Decoupled Planning (TDP): A training-free planning template proposes splitting a job into a DAG of sub-goals and running planner/executor loops with scoped context per sub-task, so local replans don’t pollute unrelated work—see the TDP thread.

Why it maps to practice: The paper’s core claim is that long-horizon agents fail less from “no planning” than from entangled execution histories; TDP’s “Supervisor → Planner/Executor with scoped context” directly suggests how to structure repo work (one subtask context window at a time) per the TDP thread.
Efficiency angle: The post claims large token cuts (up to 82% vs a Plan-and-Act baseline) alongside quality gains on several benchmarks, as reported in the TDP thread.

“Model selector is a lie”: harness prompts and tools co-evolve with each model

Harness/model co-evolution: A practitioner argument is that models are “non-fungible” inside harnesses—switching the backend model often means retuning prompts, tool descriptions, and auto-context logic, so a simple “model selector” UI hides real integration work, as stated in the Non-fungible models post.

Operational implication: The thread frames “fewer choices” as a way to focus optimization energy on one model+harness combo (instead of trying to stay compatible with everything), per the Non-fungible models post.

Encode conventions as lint rules, not prompt text, to keep agents consistent

Convention enforcement pattern: Instead of growing agent instruction files with stylistic preferences, one workflow is to have the AI write (or extend) ESLint rules and enforce them with pre-commit hooks, so every agent and human is constrained by the same executable checks, as described in the ESLint rule approach. This shifts “style alignment” from tokens to tooling and makes drift visible in CI output.


🧩 Skills ecosystem: portable skill packs, installers, and repo conventions

The “skills” ecosystem keeps maturing: more repos shipping reusable skill packs, more installers, and more discussion about standard directories that multiple harnesses can consume.

OpenSkills v1.5.0 pushes skills packaging toward a CLI “installer” norm

OpenSkills (nummanali): OpenSkills v1.5.0 positions itself as a “universal skills loader” with a more installer-like workflow—“Use with npx directly” and “read multiple skills in one call,” plus “better Windows support,” as described in the Release pointer and the linked GitHub repo. The same thread also frames skills as operationally useful for headless environments (VPS/CI/teams) in the Adoption note.

Net: skills distribution is converging on package-manager ergonomics (install/update/read), not manual copy/paste docs.

Google Antigravity signals first-class support for agent skills

Google Antigravity (Google): Antigravity is now marketing “Agent skills” as an available capability, as shown in Antigravity skill support.

Even without technical details in the tweet, the positioning is notable: it treats skills as a product surface (not just internal prompt files), which reinforces skills-as-portable-units across tools.

Tailwind v4 needs explicit pinning in agent runs to avoid accidental v3 installs

Tailwind (ecosystem): A practical dependency pitfall surfaced: Codex (and similar coding agents) may install Tailwind v3 by default unless you explicitly specify Tailwind v4, per Tailwind warning.

The implication is that “skills” and setup scripts increasingly need explicit version pins (not just package names) to keep agent-generated scaffolds from silently drifting to older defaults.

The .claude/skills folder is emerging as a cross-tool pickup point

Skills directory convention: Following up on Directory fragmentation (skills path fragmentation), one practitioner reports they now keep agent skills in .claude/skills specifically because “other tools pick it up,” as noted in Folder convention note.

This is a small but concrete sign that repo-local conventions are becoming the portability layer (even when the underlying harnesses differ), which matters for teams trying to share skills across multiple agent clients.

A “create new static website” skill packages Astro + Tailwind v4 setup

“Create new static website” skill (regenrek): A new bootstrap skill targets fast static-site scaffolding using Astro plus Shadcn/Tailwind v4, with Cloudflare deploy wired through Alchemy, as described in Astro skill drop and implemented in the Skill file.

This is another data point that “skills” are becoming a concrete portability layer for repo setup and standardized project starts (framework + UI kit + deploy target).

A reusable “Redesign my landingpage” skill lands in agent-skills

“Redesign my landingpage” skill (regenrek): A new skill was added to the agent-skills repo to automate landing-page redesign work, announced in Skill drop with the implementation source in the GitHub repo.

The included examples lean into agent-friendly UX guidance (attention hooks, headline framing, layout cues), suggesting teams are starting to codify “product/design taste” as shareable skill packs rather than ad-hoc prompts.

Demand grows for a “one hub” UI that can orchestrate multiple paid model clients

Multi-subscription orchestration: A user asks for a “multi agent app/repo with a user friendly UI” that can run multiple paid subscriptions (Gemini, Codex, Claude Code) from a single control plane, including task management, skills, and sub-agents, potentially hosted on a Raspberry Pi, as described in Hub request.

This frames “skills portability” as only half the problem; the other half is a unified operator surface that can route work across different vendor clients without constant harness switching.


🗂️ Agent ops & local-first tooling: Beads ports, multi-harness runners, and dashboards

Teams are hardening “agent ops” primitives: local-first task ledgers (Beads), harness-agnostic configs, and UIs/statuslines for managing many concurrent agents.

beads_rust freezes “classic Beads” as a fast Rust port

beads_rust / br (Dicklesworthstone): A Rust port of Steve Yegge’s Beads aims to “freeze” the pre–Gas Town architecture (hybrid SQLite + JSONL-in-git) for users whose agent workflows depend on the old storage/data-plane choices, as laid out in the Project announcement and the GitHub repo. The author positions br as a separate CLI from the original bd to avoid asking upstream to maintain legacy modes.

Performance claim: The author says br is ~8× faster than bd, attributing it partly to a clean-room rewrite, as described in the Speed note.
Protocol framing: Yegge explicitly endorses the port and reiterates that “Beads is an interface/protocol, not a single implementation,” in the Yegge endorsement.

What’s not in the tweets yet is a stable versioning/compatibility story for br relative to upstream Beads’ evolving “Gas Town” API surface.

ntm orchestration demo runs 10 Claude Codes plus 5 Codex instances

ntm (Dicklesworthstone): A dogfooding report shows ntm being used as an orchestration layer by creating an ntm “skill” for Claude Code and then delegating a new project across 10 Claude Code agents plus 5 Codex instances, as described in the Multi-agent setup note.

Video: ntm multi-agent orchestration demo

The clip is less about model capability and more about the ops surface area: how you feed tasks in, track progress, and keep many concurrent agents pointed at the same work ledger.

Beads “bd Issues” pitch: stop using Markdown TODOs for agent work queues

Beads (bd Issues): A feature comparison making the rounds argues that Markdown TODOs are a poor substrate for multi-agent work because they lack dependency structure, queryability, and machine-readable status; the argument is summarized in the Feature comparison table.

The table calls out agent-facing advantages like SQL-backed queries and JSON output (vs fragile parsing), plus automatic “ready work” detection—positioning Beads as a local-first work ledger for agents rather than another human-only checklist.

dotagents centralizes harness setups under ~/.agents

dotagents (iannuttall): A small CLI utility standardizes “where agent configs live” by running multiple harness clients from ~/.agents or .agents, meant to reduce the friction of bouncing between tools, as described in the dotagents announcement and the GitHub repo.

This lands squarely in the emerging ops layer for agent work: keep one canonical directory for skills/config/prompt assets, then let different harness front-ends attach to it.

Amp adds “activity bubbles” usage view in settings

Amp (Sourcegraph): Amp shipped a new “activity bubbles” visualization surfaced via the settings page, as announced in the Feature shipped note and accessible through the Settings page.

This is a straightforward ops UX move: making personal agent usage observable without digging through raw logs.

Beads daemon embeds SQLite as WebAssembly via wazero

Beads daemon (Beads): A doc snippet highlights that the Beads daemon uses ncruces/go-sqlite3, embedding SQLite as WebAssembly via the wazero runtime—framed as a security-positive choice with explicit tradeoffs, per the Daemon storage note and the Daemon summary doc.

This is one of those implementation details that matters once you start running Beads as always-on local infra (daemonized) instead of a single-user CLI.

peky 0.0.34 adds a repo+agent status topbar

peky (regenrek): peky v0.0.34 adds a TUI topbar that surfaces project directory, git branch, and Codex/Claude Code status, per the Release note and the GitHub repo.

The tweet positions peky as a lightweight “many-agents cockpit” UI—status visibility first, before orchestration complexity.


🧱 Cursor and long-run agent builds: 3M+ LOC browser follow-ups

Cursor’s week-long “build a browser” run continues to reverberate, with follow-up evidence about what works (it compiles now) and what still breaks (rendering bugs), which is useful for calibrating agentic codegen claims.

Cursor’s browser project reportedly compiles now, with build steps added

Fastrender browser (Cursor demo project): A follow-up report says the previously shared browser code now compiles after fixes, with build instructions added to the README; the author successfully built it on macOS and shared proof screenshots in the Build now works thread, with extra context in the Build notes gist.

This turns the demo from “cool video” into something engineers can actually attempt to reproduce locally, which is the minimum bar for treating an agentic codegen artifact as more than a one-off recording.

Cursor’s 3M+ LOC browser demo becomes a shorthand for agentic scale

Cursor (Anysphere): The week-long “build a web browser” run in Cursor—shown with a counter hitting ~3.6M lines and a working UI—kept getting reposted as a reference point for what “agentic coding at scale” looks like in practice, even before you ask whether the output is maintainable or correct, as shown in the Browser build demo.

Video: 3.6M lines counter

The repeated framing matters because it’s increasingly used as an argument about capability (“agents can produce huge systems fast”), but the demo itself mostly evidences throughput—subsequent follow-ups are where buildability and correctness start to show up.

The compiled browser still shows obvious rendering bugs in real pages

Fastrender browser (Cursor demo project): The same field report that it compiles also shows that “runs” isn’t “correct”: screenshots include a mostly-rendered Google homepage with an unstyled search button plus a stray oversized UI element, and a personal site where an image asset duplicates repeatedly, as documented in the Rendering screenshots.

These are the kinds of failures that matter for calibrating agentic claims: DOM/CSS/layout edge cases and asset handling show up quickly once you leave the happy-path demo.

Agent codegen needs buildability and correctness metrics, not LOC

Evaluation hygiene: The browser demo’s headline number (“3M+ lines”) is compelling, but the follow-up evidence suggests LOC alone badly overstates real progress; buildability and real-page behavior are the gating signals, with the contrast between the LOC counter demo and the Rendering issues making the point.

In practice, the more actionable checklist is: “does it compile, does it run, does it pass minimal rendering/regression tests, and can contributors modify it without regressions”—and the demo is starting to be judged on those dimensions rather than sheer output volume.


🏗️ Infra signals: revenue/compute scaling and the memory wall

Infra talk is focusing on two constraints: (1) compute and revenue scaling in frontier labs, and (2) memory bandwidth/capacity becoming the dominant bottleneck for LLM inference and training.

OpenAI reports $20B+ ARR alongside ~3×/year compute growth to ~1.9GW

OpenAI (Strategy/metrics): OpenAI is publicly tying revenue scale to infrastructure scale—showing annualized revenue growing from $2B (2023) → $6B (2024) → $20B+ (2025E) and compute from 0.2GW → 0.6GW → ~1.9GW over the same window, as shown in the Compute and revenue chart and explained in the OpenAI blog post.

The operational signal for infra teams is that OpenAI frames prior dependence on a single compute provider as a “hard ceiling,” and claims it now manages compute as a multi-provider portfolio (mixing premium training hardware with cheaper serving), per the Metrics recap and the Flywheel quote. Serving cost is described as trending toward “cents per 1M tokens,” which the post treats as the threshold at which routine agentic usage becomes economically viable.

Product + monetization tie-in: the same post links stronger models → better products → broader adoption → revenue → more compute, with WAU/DAU at “all-time highs” according to the Flywheel quote.

What’s missing here is any third-party verification of the cost-per-1M-tokens claim; the charts at least pin the “compute × revenue” narrative to concrete numbers.

Hardware “memory wall” framing: bandwidth/capacity lag is now the inference bottleneck

Memory wall (Inference hardware): A widely shared framing argues LLMs have moved from compute-limited to memory/bandwidth-limited scaling—model size growing ~19× every 2 years vs memory per accelerator ~1.9×, and over ~20 years peak compute rising ~60,000× while DRAM bandwidth rose ~100× and interconnect ~30×, per the Memory wall thread.

The core engineering implication is that decoder-style LLM inference has low arithmetic intensity, so bandwidth dominates cost/latency, even when weights “fit”; moving weights/activations/KV cache between devices becomes the runtime driver, as described in the Memory wall thread. This also explains why “more GPUs” often doesn’t linearly fix throughput without architectural changes (KV management, offload strategies, speculative decoding, caching, etc.).
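
A back-of-envelope calculation shows why: for single-stream decode, arithmetic intensity sits roughly two orders of magnitude below a modern accelerator’s compute-to-bandwidth ratio. The numbers below are illustrative assumptions (an 8B dense model in 16-bit weights on an H100-class part), not figures from the thread.

```python
# Back-of-envelope arithmetic intensity for single-stream LLM decode.
# Illustrative assumptions: 8B-parameter dense model, 16-bit weights, an
# accelerator with ~1 PFLOP/s matmul throughput and ~3.35 TB/s HBM bandwidth.
params = 8e9
bytes_per_param = 2                          # fp16/bf16 weights
flops_per_token = 2 * params                 # ~2 FLOPs per parameter per decoded token
bytes_per_token = params * bytes_per_param   # every weight streamed once per token (batch 1)

intensity = flops_per_token / bytes_per_token
print(f"arithmetic intensity ~ {intensity:.1f} FLOP/byte")            # ~1

peak_flops, hbm_bw = 1e15, 3.35e12
print(f"hardware ridge point ~ {peak_flops / hbm_bw:.0f} FLOP/byte")  # ~300

# Decode sits far below the ridge point, so per-token latency is set by memory
# traffic (weights, plus KV cache in practice), not by peak compute.
print(f"compute-bound time per token:   {flops_per_token / peak_flops * 1e3:.3f} ms")
print(f"bandwidth-bound time per token: {bytes_per_token / hbm_bw * 1e3:.3f} ms")
```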

This is an infra constraint story more than a model story: it pressures everything from cluster topology and interconnect choices to serving stack design (prefill/decode split, KV reuse, paging, and memory hierarchy work).

Demis Hassabis: Chinese frontier models are “months” behind; US moat “melting”

DeepMind (Competition pressure): Demis Hassabis says China’s AI models are now only “a matter of months” behind U.S./Western capabilities and that the U.S. “moat is melting,” according to the Quote recap and the CNBC interview.

He also draws a line between “catching up” and “breakthroughs,” arguing China still lacks evidence of architecture-level innovation on par with the original Transformer—at least “for now,” as described in the Quote recap. For infra and strategy folks, this is a direct competitive signal: performance parity can arrive without identical access to the most advanced chips (CNBC references DeepSeek-like efficiency), tightening timelines for product differentiation and supply-chain advantage.

AMD’s Lisa Su: AI compute could rise 100× in 4–5 years; users 1B→5B

AMD (Demand trajectory): AMD CEO Lisa Su is quoted projecting another 100× AI compute increase over the next 4–5 years, alongside growth in “AI active users” from ~1B to ~5B, per the Compute and users forecast.

Video: Lisa Su on 100X compute

A second clip frames AI progress as moving in “weeks” rather than years, per the Pace quote. For infra planning, the takeaway is less about a specific SKU and more about the implied pace of capacity buildout and the knock-on constraints (power delivery, memory bandwidth, and inference efficiency) that determine whether that demand can actually be served.


🔌 MCP and connector surface area: personal data, messages, and tool access

MCP continues to expand into “your digital life” integrations (messages/contacts/reminders) and into practical developer workflows (yeeting real conversations into coding agents).

iMCP turns macOS personal data apps into an MCP server for agents

iMCP (mattt): A macOS app that runs an MCP server over your “digital life” surfaces—Messages, Contacts, Reminders, Calendar, Maps/location, and Weather—so agents can fetch real conversation history and act with your local context, as shown in the iMessage-to-agent workflow note and detailed in the GitHub repo.

This matters because it’s a concrete, developer-usable bridge between MCP clients and personal data that otherwise sits behind native app silos (and it’s local to macOS, which changes privacy and latency assumptions versus cloud connectors).

Using iMessage threads as agent context via MCP

Workflow pattern: People are treating real chat history as “drop-in context” for coding agents—pulling an iMessage thread and then feeding it into Claude/Codex-style workflows—because it preserves nuance that gets lost when someone re-summarizes requirements from memory, as illustrated by the “yeet this into a coding agent” example.

In practice, MCP-backed message export shifts context management from “copy/paste + cleanup” toward a repeatable retrieval primitive (message history by participant + time window), which can be audited and re-run when requirements change.

iMCP PR adds Claude Code and Amp install instructions

iMCP (mattt): A new documentation PR adds explicit setup steps for using iMCP from Claude Code and Amp, aiming to reduce the “it works but nobody knows how to wire it up” friction for MCP in day-to-day agent workflows, as described in the PR note and implemented in the PR diff.

For teams standardizing on MCP, this is the kind of small doc work that typically determines whether an integration becomes repeatable across engineers or stays a one-person local hack.


📏 Benchmarks & eval signals: GPT-5.2 math proofs and “independent” solutions

Math capability claims continue, with discussion shifting from “did it solve it” to “did it find an independent approach” and “how much human scaffolding was required.”

GPT-5.2 Pro finds an independent-style proof for an Erdős problem

GPT-5.2 Pro (OpenAI): A new report claims GPT-5.2 Pro generated a proof for an Erdős problem that already has a known proof in the literature, but via a “rather different” route—so it’s being framed as an independent approach rather than a novel theorem, as described in the Erdős proof update. The surrounding reaction thread highlights third-party mathematician commentary characterizing it as “slightly different from the standard methods,” according to the Tao comment recap.

The operational significance for evals is less “did it solve it?” and more “can it re-derive nontrivial arguments with a different proof path,” which is a stronger signal than just reproducing the canonical literature proof verbatim.

Erdős proof follow-up shifts focus to novelty and scaffolding

Erdős proofs (evaluation hygiene): Following up on Erdős proofs (GPT-5.2 Pro solving Erdős problems), a separate update notes that “a prior proof was found,” while still emphasizing the model’s proof path differed from that prior work, as stated in the Prior proof note. The same post also underlines that these results are “not solving these autonomously,” describing them as human-prompted and “often iterating using Lean,” per the Prior proof note.

This framing is turning into an eval norm: track (1) novelty vs re-derivation, and (2) the amount of formal-method/tool scaffolding needed for success.

Debate grows over what “different proofs” should count as

Math capability interpretation: A skeptical take argues these math “proofs” are largely “known methods” and could be “a stochastic parrot just picking the low hanging fruits,” while still conceding that such systems remain “tremendously useful tools,” as argued in the Skeptical framing. The same thread raises a higher bar—“come up with its own field of mathematics”—as the kind of milestone that would feel qualitatively different, per the Skeptical framing.

Net effect: the community is converging on a two-tier narrative—practical usefulness today, but disagreement on whether it implies imminent breakthrough-level mathematical originality.


📱 Edge + on-device AI: Google AI Edge, LiteRT in-browser inference, and Apple Foundation Models

On-device runtimes are getting more “real” for product teams: Google pushes AI Edge + LiteRT for cross-platform deployment (including browser), while Apple’s Foundation Models framework signals tool-calling on-device.

LiteRT demo shows offline, in-browser LLM inference without local model servers

LiteRT (Google AI Edge): A practical demo shows running LLM inference fully in the browser—no internet and no local serving layer like Ollama or LMStudio—by loading a model directly and generating text, as shown in the LiteRT browser run and described in the LiteRT docs.

Video: LiteRT running in browser

The shipping implication is that “web as an edge runtime” is becoming a first-class deployment target; the runtime is framed as high-performance and built on the TensorFlow Lite foundation, per the LiteRT docs.

Apple’s Foundation Models show up in Xcode workflows for on-device tool calling

Foundation Models (Apple): Apple’s developer docs continue to point to an on-device LLM interface with tool calling, framed as sufficient for “simple LLM tasks” without online inference, per the Foundation Models docs screenshot.

A builder workflow is already emerging around using coding agents to wire it up fast: one report describes using Claude Code (Opus 4.5) in Xcode to build a simple SwiftUI app that uses the Foundation Models framework in ~20 minutes, with a working simulator UI shown in the Xcode app screenshot and a related setup snapshot in the Xcode with Claude chat.

Device-side support and OS gating remain the main unknowns from these tweets; what’s clear is that teams are treating this as a real app-facing API surface now, not just a research preview.

Google AI Edge pushes a cross-platform stack for on-device models

Google AI Edge (Google): Google is positioning AI Edge as an end-to-end stack to deploy models on-device across Android, iOS, web, and embedded—explicitly selling lower latency, offline operation, and keeping data local, as summarized in the AI Edge overview and expanded in the AI Edge developer docs.

This matters for product teams because it’s an attempt to make “same model everywhere” real (conversion + runtime + tooling), rather than a one-off mobile demo.

Portability surface: The pitch is multi-framework (JAX/Keras/PyTorch/TensorFlow) plus a “full AI edge stack” for conversion/quantization/debugging, per the AI Edge developer docs.
Privacy/latency contract: The core promise is local inference as a default—not an opt-in—per the AI Edge overview.

AI Edge Gallery highlights device gating as a product risk for on-device AI

AI Edge Gallery (Google): A developer trying to experiment with Google AI Edge ran into distribution friction: the AI Edge Gallery demo app is “not available for your device” in the Play Store, which turns device support into a hard adoption gate, per the Play Store availability issue.

This is a concrete reminder that on-device AI roadmaps are constrained not just by model/runtime quality, but by SKU coverage (chip/NPU support) and how quickly teams can test on representative hardware.


📚 Research papers worth stealing from: planning, judging, memory, and cost control

New papers cluster around making agents cheaper and more reliable without retraining: decoupled planning contexts, block-level blame/judging, prompt caching, and data-free self-improving search agents.

Agent-as-a-Judge reframes evaluation as an agentic workflow, not a single score

Agent-as-a-Judge: A survey argues that “LLM-as-a-judge” breaks on hard tasks because it can’t verify against reality, and proposes agentic judges that plan checks, call tools (search/code execution), and persist notes/memory, as described in the survey summary.

Taxonomy and building blocks: it groups approaches by autonomy stages and emphasizes tool integration + memory as the key differentiator from single-shot grading, per the survey summary.
Where it shows up: the survey claims adoption across math/code, fact-checking, and high-stakes domains (medicine/law/finance) because multi-step verification reduces “sounds right” scoring failures, as outlined in the survey summary.

No canonical benchmark table is shown in the tweets, so treat the performance framing as conceptual until you read the underlying paper cited in the survey summary.

Prompt caching study quantifies big cost cuts for long-horizon agents

Prompt caching evaluation: A paper benchmarks prompt-prefix caching in real agent sessions and reports ~45–80% cost reduction and ~13–31% faster time-to-first-token in some settings, based on experiments spanning OpenAI, Anthropic, and Google APIs as described in the results summary.

What to cache: the study claims caching only the stable system prompt / fixed prefix is the safest default; caching full chats can backfire when tool outputs change, per the results summary (a minimal sketch of this default follows below).
Why it matters: for agent loops that repeatedly resend large instructions (tools specs, policies, repo context), this gives a measured knob for reducing spend without changing model choice, as argued in the results summary.
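
For teams on the Anthropic API, the “stable prefix only” default looks roughly like the sketch below, assuming the SDK’s documented cache_control marker; the model id, prompt text, and ask helper are illustrative. Other providers differ in mechanism (OpenAI caches long repeated prefixes automatically; Gemini exposes explicit context caching), but the operational rule is the same: keep the expensive prefix byte-identical across calls and keep volatile content out of it.

```python
# Minimal "stable prefix only" caching sketch with the Anthropic Python SDK,
# assuming the documented cache_control "ephemeral" marker; the model id, prompt
# text, and ask() helper are illustrative. Only the fixed system block is marked
# cacheable; the per-turn user message stays outside the cached span.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_PREFIX = (
    "You are a repo assistant. Follow the team conventions below.\n"
    "...large, rarely changing block of policies, tool specs, and repo context..."
)


def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model id
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": STABLE_PREFIX,
                # Repeated calls that reuse this exact prefix should hit the
                # cache and pay the reduced cached-input rate.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": question}],  # dynamic, uncached part
    )
    return response.content[0].text


if __name__ == "__main__":
    print(ask("Where does the sync job handle retries?"))
```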

Task-Decoupled Planning reframes long-horizon failures as context entanglement

Task-Decoupled Planning (TDP): A new training-free framework argues long-horizon agent failures come less from “can’t plan” and more from letting a single execution history entangle multiple sub-tasks; it decomposes work into a DAG and keeps scoped context per active node, with replanning staying local as described in the paper overview.

Reported outcomes: TDP claims up to 82% fewer tokens than Plan-and-Act while improving results, using 1,747 vs 9,929 output tokens on HotpotQA, and reaching 85.88% delivery accuracy there according to the paper overview.
Why it matters for agent builders: the paper’s implementation pattern (Supervisor→Planner→Executor with isolated “active sub-task” state) is a concrete template for building agents that don’t degrade as the trace grows, as outlined in the paper overview.
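
A minimal sketch of that scoped-context pattern, with a stubbed model call and hypothetical task names (an illustration of the idea, not the paper’s implementation): each sub-task executes against only its own context plus the results of its declared dependencies, so a local retry or replan never touches unrelated nodes.

```python
# Minimal sketch of a Supervisor -> Planner/Executor loop with scoped context
# per sub-task. All names and the stubbed model call are illustrative.
from dataclasses import dataclass, field


@dataclass
class SubTask:
    name: str
    deps: list[str] = field(default_factory=list)
    context: list[str] = field(default_factory=list)  # scoped history, not the global trace
    result: str | None = None


def call_model(prompt: str) -> str:
    """Stub standing in for an LLM call so the sketch runs end to end."""
    return f"result({prompt[:40]}...)"


def run_dag(tasks: dict[str, SubTask]) -> dict[str, str]:
    done: dict[str, str] = {}
    while len(done) < len(tasks):
        # Supervisor: pick any sub-task whose dependencies are satisfied.
        ready = [t for t in tasks.values() if t.result is None and all(d in done for d in t.deps)]
        for task in ready:
            # Executor sees only its own scoped context plus upstream results it depends on.
            task.context += [done[d] for d in task.deps]
            task.result = call_model(f"{task.name}\n" + "\n".join(task.context))
            done[task.name] = task.result
            # A local failure would trigger replanning of this node only, without
            # touching the contexts of unrelated sub-tasks.
    return done


if __name__ == "__main__":
    plan = {
        "gather": SubTask("gather"),
        "analyze": SubTask("analyze", deps=["gather"]),
        "report": SubTask("report", deps=["analyze"]),
    }
    print(run_dag(plan))
```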

Dr. Zero proposes data-free self-evolution for search agents

Dr. Zero (Meta): A paper claims a search agent can teach itself with zero human-labeled training data by having a proposer generate questions and a solver answer them using search feedback, with the proposer rewarded for “sometimes solvable” difficulty, as summarized in the paper thread.

Reported outcomes: the thread claims Dr. Zero matches or beats supervised search agents on 7 open-domain benchmarks, with up to 14.1% gains on harder sets, per the paper thread.
Why it matters: if the feedback signal is “did tool-backed search support the answer,” this is a route to scaling agent competence without curating large QA datasets, as described in the paper thread.

ExpSeek uses entropy to ask for help only when stuck in web navigation

ExpSeek: A web-agent paper proposes monitoring the model’s step-level uncertainty (next-token entropy) and injecting “experience notes” only when the agent is likely stuck, rather than front-loading memory every run, as described in the paper overview.

Reported outcomes: the thread reports accuracy lifts of +9.3% (Qwen3-8B) and +7.5% (Qwen3-32B) across web-agent benchmarks, and claims even a 4B helper model can improve a 32B agent, per the paper overview.
Implementation detail: experience is stored as structured “what happened / what went wrong / what to try” notes and retrieved at the trigger step, as outlined in the paper overview.
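
A simplified version of the trigger, with an illustrative entropy threshold, stubbed retrieval, and made-up logprob values (none of these are the paper’s settings): compute entropy over the model’s top-k next-token logprobs and inject a note only when the step looks uncertain.

```python
# Entropy-gated "ask for help" trigger in the spirit of ExpSeek: inject a
# retrieved experience note only when the agent's next-token distribution looks
# uncertain. Threshold, logprob values, and retrieve_experience() are illustrative.
import math

ENTROPY_THRESHOLD = 1.0  # nats; purely illustrative


def step_entropy(top_logprobs: dict[str, float]) -> float:
    """Shannon entropy over a renormalized top-k next-token distribution."""
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0)


def retrieve_experience(state: str) -> str:
    # Stub for a lookup into structured "what happened / what went wrong /
    # what to try" notes keyed by the current page or action state.
    return f"note: on states like {state!r}, try expanding the filters panel first"


def maybe_inject_note(state: str, top_logprobs: dict[str, float]) -> str | None:
    if step_entropy(top_logprobs) > ENTROPY_THRESHOLD:
        return retrieve_experience(state)
    return None  # confident step: keep the context lean


if __name__ == "__main__":
    confident = {"click": -0.1, "type": -3.0, "scroll": -4.0}
    uncertain = {"click": -1.3, "type": -1.4, "scroll": -1.5, "back": -1.6}
    print(maybe_inject_note("search_results", confident))   # None
    print(maybe_inject_note("search_results", uncertain))   # a retrieved note
```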

JudgeFlow uses block-level blame to optimize agent workflows faster

JudgeFlow (Block Judge): A workflow-optimization paper proposes adding a “judge” that assigns responsibility scores to discrete workflow blocks (loops/branches/steps), so optimization edits target the most likely failure source instead of tweaking the whole chain, as summarized in the paper thread.

Reported impact: the authors claim 82.2 average across four benchmarks and a +1.4 overall gain vs MermaidFlow, with ~2% extra cost for judging, per the paper thread.
Engineering angle: this frames “agent improvement” as search over orchestration code (prompts, tool blocks, routers) rather than model training, which is the lever most teams can actually ship quickly, as described in the paper thread.
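
A toy version of the blame-targeting loop (block names, scores, and the judge stub are illustrative, not the paper’s prompts): accumulate per-block responsibility scores over failing runs, then edit only the most-blamed block.

```python
# Toy block-level blame in the spirit of JudgeFlow: a judge scores each workflow
# block's responsibility for a failure, and the optimizer edits only the block
# that accumulates the most blame. The judge here is a stub with fixed scores.
from collections import defaultdict

WORKFLOW = ["retrieve_docs", "draft_answer", "cite_sources", "final_check"]


def judge_blame(failed_trace: dict[str, str]) -> dict[str, float]:
    # Stub for an LLM judge that reads each block's inputs/outputs and scores
    # how likely it is to have caused the failure (higher = more suspect).
    return {"retrieve_docs": 0.1, "draft_answer": 0.2, "cite_sources": 0.6, "final_check": 0.1}


def pick_block_to_edit(failures: list[dict[str, str]]) -> str:
    blame: dict[str, float] = defaultdict(float)
    for trace in failures:
        for block, score in judge_blame(trace).items():
            blame[block] += score
    # Target the most-blamed block instead of rewriting the whole chain.
    return max(blame, key=blame.get)


if __name__ == "__main__":
    failing_runs = [{"question": "q1"}, {"question": "q2"}]
    print("edit this block next:", pick_block_to_edit(failing_runs))  # cite_sources
```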

Prompt politeness study reports a small but consistent accuracy swing

Mind Your Tone: A small study reports that prompt politeness affected ChatGPT-4o multiple-choice accuracy: “very rude” prompting scored 84.8% vs 80.8% for “very polite,” a difference reported as statistically significant over repeated runs, as summarized in the paper summary.

What’s actually tested: 50 base questions rewritten into five tone variants (very polite→very rude) for 250 prompts, per the paper summary.
Interpretation caution: the effect size is modest and task-specific (MCQs across math/science/history); treat it as a prompting variable worth controlling in evals, not a universal law, based on the paper summary.

Survey maps agent memory from simple storage to experience abstraction

LLM agent memory mechanisms survey: A new survey proposes an evolution frame from Storage → Reflection → Experience, arguing the field is moving from “save trajectories” toward abstraction and active memory management for consistency, dynamic environments, and eventual continual learning, as noted in the survey mention.

The preprint frames the taxonomy and drivers in more detail in the survey preprint.

Why it’s useful: it’s an attempt to normalize vocabulary across a fragmented space (RAG memories, journaling, summaries, experience buffers) so teams can compare designs beyond “we added memory,” following the positioning in the survey preprint.


🛡️ Security and trust: cheap impersonation and reply spam degrade platforms

Fast face/voice impersonation and AI reply spam are compounding into a trust problem: it’s easier to impersonate at scale, and harder to tell which interactions are real.

Face swap + voice cloning under ~200ms makes impersonation scalable

Impersonation at scale: Following up on deception (fake AI influencer risk), a clip circulating today claims face swap and voice cloning now run under ~200ms on consumer GPUs, turning online impersonation into a broadcast problem rather than a single-target con, as described in the impersonation clip.

Video: Face swap and voice cloning montage

The operational shift is that “verify identity” becomes an adversarial requirement for voice/video channels (support calls, sales calls, exec comms), because latency is low enough to keep conversations interactive rather than obviously “generated.”

VoxCPM open-source voice cloning claims 5-second samples

VoxCPM (OpenBMB): A widely shared retweet claims VoxCPM can “clone any voice with a 5-second audio clip,” positioning fast voice cloning as something teams should assume is commodity capability rather than a lab-only demo, per the VoxCPM claim.

In practice, this increases the baseline threat model for anything that treats “voice = identity” (voice-based approvals, inbound support verification, audio notes as evidence), even if the best defenses remain workflow and policy, not just model detection.

AI reply spam is collapsing the value of public replies

Platform trust: Multiple users complain that replies feel “unread” because they’re flooded with “AI reply crap,” which changes how people use social forums (less interaction, more passive consumption) as noted in the Replies are dead complaint and echoed by the Less social post.

For teams building community surfaces (devrel forums, community support, public issue triage), this points to a product requirement: preserve “human-to-human” channels with stronger friction and provenance, or the reply layer stops working as a coordination mechanism.


🎙️ Voice agents: “serious mode” demand and integration pain

Builders want voice interfaces that optimize for task execution (not “um” and flattery), but today’s APIs still make it hard to build fully interruptible, low-latency, smart voice control loops.

Voice agents need a “serious mode” and today’s APIs still break interruptibility

Voice agents (Product UX): A call is resurfacing for a “serious voice mode” optimized for task execution—less fake disfluency (“um”), less sycophancy, and more command reliability—because current voice modes are seen as being powered by “dumb models” that undersell voice as an agent control plane, per the Serious voice mode ask.

Integration pain (Developer APIs): Builders still have to stitch STT → “smart” text model → TTS, which loses the natural interruptibility and back-and-forth of true multimodal voice, as described in the API stitching pain.
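
A minimal sketch of that stitched loop (all three stages are stubs) makes the structural problem concrete: each stage blocks, so the user cannot barge in while the agent is speaking, which is exactly what a truly full-duplex multimodal voice API would fix.

```python
# Minimal sketch of a stitched STT -> text model -> TTS voice loop, with stubs in
# place of real services so it runs. The structural problem: every stage blocks,
# so while speak() is playing nothing is listening, and true barge-in requires
# full-duplex streaming rather than this rigid turn-taking shape.
import time


def transcribe(audio_chunk: bytes) -> str:
    return "rename the staging config and rerun the deploy"  # STT stub


def think(text: str) -> str:
    return f"Okay, I'll {text}."  # text-model stub


def speak(text: str) -> None:
    print(f"[TTS playing] {text}")
    time.sleep(0.1)  # stand-in for audio playback; user speech here is simply lost


def voice_turn(audio_chunk: bytes) -> None:
    user_text = transcribe(audio_chunk)   # blocks until the utterance ends
    reply = think(user_text)              # blocks on the text model
    speak(reply)                          # blocks until playback finishes


if __name__ == "__main__":
    voice_turn(b"\x00\x01")  # one rigid turn: listen -> think -> talk
```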

Design critique: The thread argues “work voice” should feel closer to the Star Trek computer—no giggling, sighing, or flattery mid-task—framed as an unmet product niche rather than a model capability ceiling, per the Star Trek voice analogy.

VoxCPM claims 5-second voice cloning as an open-source drop

VoxCPM (Open-source voice cloning): VoxCPM is being circulated as an open-source project that can “clone any voice with a 5-second audio clip,” per the VoxCPM voice cloning.

With only a short claim in the tweets (no linked paper, demo, or eval artifact provided), what’s concrete here is the direction of travel: ultra-low-sample voice cloning is increasingly treated as commodity capability, which raises both product opportunity (custom voices) and operational pressure (impersonation risk) for voice-agent builders.
