OpenAI GPT‑5.2 beats 74% of experts on GDPval – 11× cheaper work

Thu, Dec 11, 2025

Executive Summary

OpenAI finally answered the Devstral and Gemini noise with a workhorse: GPT‑5.2 is live across ChatGPT and the API, and on GDPval it now beats or ties human experts on roughly 71–74% of real knowledge‑work tasks. Those tasks usually take 4–8 hours and $150–$200 of billable time; OpenAI claims 5.2 does them over 11× faster at under 1% of the expert cost. This is the first OpenAI release that feels explicitly aimed at replacing mid‑career desk work, not toy prompts.
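To make those ratios concrete, here is a rough back-of-envelope sketch. It takes the quoted figures at face value (4–8 hours, $150–$200, "over 11×" faster, "under 1%" of cost) and turns them into upper bounds per task. The function and constant names are illustrative, not anything OpenAI publishes, and the output is only as good as the claims it starts from.

```python
# Back-of-envelope bounds from the GDPval claims quoted above.
# SPEEDUP and COST_FRACTION are the claimed thresholds ("over 11x", "under 1%"),
# so the results are upper bounds, not measurements.
SPEEDUP = 11.0
COST_FRACTION = 0.01


def model_minutes(expert_hours: float) -> float:
    """Upper bound on model time, in minutes, for a task taking expert_hours."""
    return expert_hours * 60 / SPEEDUP


def model_cost_usd(expert_cost_usd: float) -> float:
    """Upper bound on model cost for a task billed at expert_cost_usd."""
    return expert_cost_usd * COST_FRACTION


if __name__ == "__main__":
    for hours in (4, 8):
        print(f"{hours} h expert task -> under {model_minutes(hours):.0f} min")
    for dollars in (150, 200):
        print(f"${dollars} expert task -> under ${model_cost_usd(dollars):.2f}")
```

On these assumptions, a 4–8 hour task lands at roughly 22 to 44 minutes of model time and $1.50 to $2.00 of spend at most.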

Product‑wise, you get three tiers: Instant for everyday chat, Thinking for serious reasoning, and Pro as the slow, heavy hitter. Standard 5.2 runs at $1.75 / $14 per 1M input / output tokens with a 400k context window; Pro jumps to $21 / $168 and lives only behind the Responses API for multi‑minute, high‑effort traces. Benchmarks back the positioning: 55.6% on SWE‑Bench Pro, 92.4% on GPQA Diamond, 100% on AIME 2025, and ARC Prize‑verified 90.5% on ARC‑AGI‑1 with a 390× cost‑efficiency gain over last year’s o3 preview.
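For a feel of what the price gap means per request, here is a minimal cost sketch using only the per‑1M‑token prices quoted above. The tier labels `"standard"` and `"pro"` and the 100k‑in / 5k‑out workload are illustrative assumptions, not official identifiers or typical usage, and the sketch ignores any caching or batch discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """USD per 1M tokens, as quoted in the summary above."""
    input_per_million: float
    output_per_million: float


# Tier labels are hypothetical shorthand, not API model names.
PRICES = {
    "standard": TierPrice(1.75, 14.00),
    "pro": TierPrice(21.00, 168.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    price = PRICES[tier]
    total = (
        input_tokens * price.input_per_million
        + output_tokens * price.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 100k-token prompt with a 5k-token answer.
    for tier in PRICES:
        print(f"{tier}: ${request_cost(tier, 100_000, 5_000):.3f}")
```

At those list prices the same request costs about $0.245 on standard 5.2 and $2.94 on Pro, a flat 12× multiplier, since both the input and output prices scale by 12.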

The System Card is the other big story: hallucinations drop about 30–40%, deceptive tool use falls from 7.7% to 1.6% (roughly a 79% relative reduction), and long‑context retrieval stays near‑perfect out past 128k tokens. It’s clearly a better brain for agents and coders, but not magic: you still need routing, evals, and human checks where mistakes are expensive.

Feature Spotlight

Feature: OpenAI ships GPT‑5.2 for work and agents

OpenAI releases GPT‑5.2 across ChatGPT and the API with expert‑level GDPval results, major long‑context and coding gains, and clear pricing tiers, including a costly Pro tier. The release sets the competitive bar for enterprise tasks and agent workflows.

The launch dominates posts across accounts today: GPT‑5.2 (Instant, Thinking, Pro) lands in ChatGPT and the API with big gains on real‑world knowledge work, coding, and long‑context tasks, with pricing and System Card details included. Coverage here is mostly model and eval posts; the other sections exclude this launch.

On this page

Executive Summary
Feature Spotlight: Feature: OpenAI ships GPT‑5.2 for work and agents
🧄 Feature: OpenAI ships GPT‑5.2 for work and agents
GPT-5.2 hits human-expert level on GDPval knowledge work benchmark
OpenAI launches GPT-5.2 Instant, Thinking, and Pro for ChatGPT and API
ARC Prize verifies GPT-5.2 Pro as new ARC-AGI SOTA with 390× efficiency gain
GPT-5.2 pricing, context window, and Pro tier economics
GPT-5.2 Thinking tops most OpenAI, Anthropic, and Google benchmarks
Builders report big gains from GPT-5.2 for coding and agents, with caveats
GPT-5.2 sharply improves long-context retrieval on MRCRv2
System Card: GPT-5.2 cuts hallucinations and deceptive tool use
🕸️ Google’s Interactions API and Deep Research agent
Gemini Deep Research hits SOTA on HLE, DeepSearchQA and BrowseComp
Google ships Interactions API with Gemini Deep Research and MCP tools
Builders begin adopting Interactions API for long-horizon Gemini agents
Google tests Disco, a Gemini agent that turns your tabs into task apps
📈 Frontier eval race: third‑party verifications and ladders
Context Arena’s MRCR shows GPT‑5.2’s long‑context wins come from heavy reasoning effort
Vals Index crowns GPT‑5.2 #1 but notes ~4× higher query cost
GPT‑5.2‑high debuts #2 on Code Arena WebDev leaderboard
LisanBench and community evals show GPT‑5.2 Thinking better but not best at reasoning
Misc community benches (VPCT, LiveBench, SWE‑Bench Verified) show mixed GPT‑5.2 picture
Opus 4.5 holds Terminal‑Bench lead as Warp pushes GPT‑5.2 to 61.1%
Extended NYT Connections benchmark shows clear GPT‑5.2 gains, Gemini still ahead
🛠️ Coding agents and IDEs: design‑in‑code and 5.2 wiring
CopilotKit v1.50 introduces `useAgent()` hook and LangGraph/Mastra adapters
Factory’s Droid adopts GPT‑5.2 for architecture, data and sysadmin tasks
Kilo Code tunes its agents around GPT‑5.2 for UI and bug‑fix work
RepoPrompt 5.2 adds GPT‑5.2 and background jobs for Pro‑length tasks
Zed editor turns on GPT‑5.2 for Pro and BYOK users
Conductor wires GPT‑5.2 into its agentic orchestration for coding flows
LlamaIndex ships `ask` CLI and LlamaSheets for table‑centric agents
MagicPathAI ships GPT‑5.2 Thinking for UI layout and data‑viz design
Rork switches to GPT‑5.2 for long‑context UI and frontend work
Julius AI exposes GPT‑5.2 for spreadsheet‑heavy data analysis
💼 Enterprise & deals: Disney x OpenAI and platform adoption
Disney puts $1B into OpenAI and signs three‑year Sora content deal
Box AI’s internal evals push GPT‑5.2 into production
Disney’s cease‑and‑desist to Google underscores shifting AI content alliances
OpenAI teams with Rappi to push ChatGPT Go across Latin America
Windsurf and Devin move core workloads to GPT‑5.2
GPT‑5.2 Pro debuts near the top of OpenRouter’s price table
Notion’s ‘olive‑oil‑cake’ hook hints at GPT‑5.2 under the hood
Perplexity turns GPT‑5.2 into a first‑class Pro/Max model option
Research and coding SaaS tools race to wire in GPT‑5.2
🛡️ Safety, robustness and policy moves
GPT-5.2 System Card shows big drops in deception and hallucinations
GPT‑5.2 jailbreaks resurface, and OpenAI starts emailing enforcement warnings
OpenAI plans ChatGPT “adult mode” with age‑prediction‑based gating in 2026
Nvidia prepares on‑device location verification to curb AI GPU smuggling
US executive order aims for a single national AI framework instead of 50 state laws
🎬 Generative media and vision: video tools and pipelines
Runway Gen‑4.5 goes live as a physics‑aware “world engine” for video
Fal hosts Creatify Aurora for single‑image talking avatars with rich motion
Fal ships Wan Vision Enhancer and Flux upscaler for smarter video and image upgrades
Gemini 3 Pro outperforms GPT‑5.2 on detailed motherboard understanding
Google Labs’ Pomelli adds Animate, using Veo 3.1 to turn static designs into motion
OmniPSD generates fully layered PSDs directly from prompts with a diffusion transformer
StereoWorld turns monocular videos into geometry‑aware stereo for 3D viewing
Invideo debuts AI film tool that stylizes footage while preserving actor performance
🏗️ AI infra & networking economics
Broadcom CEO reveals $73B AI data‑center networking backlog
Nvidia plans GPU location‑verification to curb export‑control evasion
📚 Fresh research: agent scaling laws, diffusion LLMs, code‑from‑papers
Scaling laws for multi‑agent systems show small average gains, big variance
d3LLM uses new AUP metric to push diffusion LLMs up to 10× faster
DeepCode agent rebuilds code from papers and edges out PhD baselines
Co‑evolution of swarm algorithms and prompts boosts LLM‑designed solvers
PathHD shows single‑call QA over knowledge graphs with hypervectors
🗣️ Voice ecosystems: platform reach and cost calculus
ElevenLabs audio now powers Instagram, Horizon and more
Gemini Live two‑way voice runs around 1–2 cents per minute
Gemini speech models show flexible singing and playful audio