.png&w=3840&q=75)
OpenAI GPT‑5.2 beats 74% of experts on GDPval – 11× cheaper work
Stay in the loop
Free daily newsletter & Telegram daily report
Executive Summary
OpenAI finally answered the Devstral and Gemini noise with a workhorse: GPT‑5.2 is live across ChatGPT and the API, and on GDPval it now beats or ties human experts on roughly 71–74% of real knowledge‑work tasks. Those tasks usually take 4–8 hours and $150–$200 of billable time; OpenAI claims 5.2 does them over 11× faster at under 1% of the expert cost. This is the first OpenAI release that feels explicitly aimed at replacing mid‑career desk work, not toy prompts.
Product‑wise, you get three tiers: Instant for everyday chat, Thinking for serious reasoning, and Pro as the slow, heavy hitter. Standard 5.2 runs at $1.75 / $14 per 1M input / output tokens with a 400k context window; Pro jumps to $21 / $168 and lives only behind the Responses API for multi‑minute, high‑effort traces. Benchmarks back the positioning: 55.6% on SWE‑Bench Pro, 92.4% on GPQA Diamond, 100% on AIME 2025, and ARC Prize‑verified 90.5% on ARC‑AGI‑1 with a 390× cost‑efficiency gain over last year’s o3 preview.
The System Card is the other big story: hallucinations drop about 30–40%, deceptive tool use falls from 7.7% to 1.6%, and long‑context retrieval stays near‑perfect out past 128k tokens. It’s clearly a better brain for agents and coders, but not magic—you still need routing, evals, and human checks where mistakes are expensive.
Top links today
- OpenAI GPT-5.2 introduction and docs
- OpenAI GPT-5.2 benchmark results
- GPT-5.2 safety and system card
- GDPval knowledge work benchmark description
- ARC Prize leaderboard with GPT-5.2 scores
- Gemini Interactions API developer overview
- Gemini Deep Research agent documentation
- Scaling laws for multi-agent AI systems
- AutoGLM mobile agent GitHub repo
- Ultra-FineWeb-en-v1.4 2.2T token dataset
- SentenceTransformers v5.2.0 release notes
- Runway Gen-4.5 and GWM-1 world model
- DeepCode open agentic coding paper
- AUP study on diffusion language models
- DeepSearchQA web research benchmark release
Feature Spotlight
Feature: OpenAI ships GPT‑5.2 for work and agents
OpenAI releases GPT‑5.2 across ChatGPT and API with expert‑level GDPval, major long‑context and coding gains, and clear pricing tiers—including a costly Pro. Sets the competitive bar for enterprise tasks and agent workflows.
Cross‑account launch dominates today: GPT‑5.2 (Instant, Thinking, Pro) lands in ChatGPT and API with big gains on real‑world knowledge work, coding, and long‑context; pricing and system card details included. Mostly model/eval posts; separate sections exclude this launch.
Jump to Feature: OpenAI ships GPT‑5.2 for work and agents topics