
GPT‑5.2‑Codex hits 56.4% SWE‑Bench Pro – gated cyber access
Executive Summary
OpenAI’s week is about agents, not chat: GPT‑5.2‑Codex is now the default coding brain in Codex and paid ChatGPT, tuned for long‑horizon refactors, tighter Windows/tool use, and context compaction so multi‑hour sessions don’t blow your token budget. On OpenAI’s own numbers it edges GPT‑5.2 and 5.1‑Codex‑Max on real‑world suites, hitting 56.4% on SWE‑Bench Pro and 64.0% on Terminal‑Bench 2.0, which map much closer to “did this actually fix the repo?” than to toy LeetCode.
The twist is cyber. 5.2‑Codex sits at the top of their internal professional CTF evals with pass@12 clustered around 85–90%, so OpenAI is turning it on inside Codex but slow‑rolling API access and spinning up an invite‑only “trusted access” track for defensive teams. That follows a recent CVE‑2025‑55183 React exploit that a researcher co‑developed with 5.1‑Codex‑Max, and it’s clear they don’t want that workflow in every random pentest bot.
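For readers unfamiliar with the metric: pass@12 asks whether at least one of 12 sampled attempts solves a challenge. The source does not say how OpenAI computes it for these CTF evals, so the sketch below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n attempts were sampled and c of them succeeded. The function name and the example counts are illustrative, not taken from the system card.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts succeeds.

    n: total attempts sampled for one task
    c: how many of those attempts succeeded
    k: the attempt budget being scored (12 for pass@12)

    Assumption: this is the standard unbiased estimator, not a formula
    confirmed by the source for OpenAI's internal CTF evals.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failures, every size-k subset contains a success.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative counts only: 24 attempts on one task, 3 successes.
    print(f"pass@12 = {pass_at_k(24, 3, 12):.3f}")
```

The practical point is that a high pass@12 can coexist with a much lower single-shot success rate, which is worth keeping in mind when comparing these cyber numbers with pass@1-style coding benchmarks.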
Codex CLI 0.74 makes 5.2‑Codex the default with per‑run reasoning effort knobs and a configurable sandbox; devs report “medium” effort covering ~85% of work and “x‑high” effort rescuing gnarly bugs Opus 4.5 couldn’t crack in an hour. But 5.2‑Codex underperforms 5.1‑Codex‑Max on MLE‑Bench‑30, and repo hints of a “caribou” flagship suggest a 5.2‑Codex‑Max tier is already brewing. Treat this as your new baseline model, then keep workload‑specific evals and routing in the loop.
Top links today
- OpenAI GPT-5.2-Codex launch and system card
- Agent Skills open standard overview
- Agent Skills open standard GitHub repository
- Gemini 3 Flash in Google AI Studio
- Mistral OCR 3 technical blog and API
- LlamaParse v2 document parsing release blog
- Audio MultiChallenge benchmark paper for speech agents
- Audio MultiChallenge spoken dialogue leaderboard
- Arena-Rank paired-comparison ranking library repo
- SGLang Ollama-compatible API quickstart guide
- OpenRouter JSON response healing feature blog
- vLLM wide expert-parallel MoE benchmark
- Braintrust Brainstore AI observability database
- Exa People Search evals GitHub repository
- Google Agent Development Kit for TypeScript agents
Feature Spotlight
Feature: GPT‑5.2‑Codex lands for agentic coding and cyber
OpenAI ships GPT‑5.2‑Codex in Codex: SOTA agentic coding with 56.4% SWE‑Bench Pro and 64.0% Terminal‑Bench 2.0, context compaction for long runs, stronger Windows/vision, and an invite‑only cyber program.
Cross‑account launch with docs, CLIs and early dev reports. Focus is long‑horizon coding (context compaction), better Windows/tool use, vision‑aided code reading, and a new trusted‑access path for defensive cybersecurity.