DeepSeek V3.2 scores 96% AIME at 5× cheaper tokens – MIT weights target 128K contexts
Executive Summary
DeepSeek shipped V3.2 and its heavier V3.2‑Speciale as open‑weight frontier reasoners, and the numbers explain why everyone’s rerunning their harnesses. Community pricing charts put V3.2 around $0.28 in / $0.42 out per 1M tokens—roughly 5× cheaper than GPT‑5 High and ~30× cheaper than Gemini 3 Pro on similar workloads, according to early comparisons—while Speciale posts 96.0% on AIME 2025 and 99.2% on HMMT with an MIT license in tow.
Under the hood, the tech report is unusually transparent. DeepSeek Sparse Attention leans on a two‑stage warm‑start (2.1B dense tokens, then ~943.7B sparse) so attention cost at 128K context grows near‑linearly rather than quadratically, making “full repo in context” agents less of a financial dare. On reasoning and coding, Speciale edges Gemini 3 Pro on some Olympiad and Codeforces setups, hits 84.5% on IMOAnswerBench, and lands in the low‑70s on SWE‑bench Verified and ~80% on τ²‑Bench—squarely in frontier territory.
The tradeoff: Speciale often emits 1.5–2× more tokens than GPT‑5‑class peers and decodes at ~30 tok/s, so a 45k‑token chain can mean a wait of roughly 25 minutes. Builders are already framing it as a batch solver for brutal math and agent traces, with V3.2 as the cheaper, self‑hostable daily driver wired into stacks like SGLang, OpenRouter, and Cline.
Top links today
- DeepSeek V3.2 model card
- DeepSeek V3.2 Speciale model card
- DeepSeek V3.2 technical report
- DeepSeek thinking in tool-use docs
- vLLM-Omni omni-modality launch blog
- vLLM-Omni GitHub repository
- Transformers v5 RC announcement blog
- Apple Starflow video model on Hugging Face
- Artificial Analysis Openness Index dashboard
- Artificial Analysis Openness Index methodology
- InfLLM-V2 long-context training dataset
- InfLLM-V2 long-context final model
- Anthropic smart contract exploit research
- OpenAI large-scale code verification blog
- What the F*ck Is AGI? arXiv paper
Feature Spotlight
Feature: DeepSeek V3.2/Speciale — open-weight frontier reasoning
DeepSeek ships V3.2 (open weights) and V3.2‑Speciale (API‑only) with GPT‑5/Gemini‑3‑Pro‑tier reasoning, gold‑level Olympiad results, and a tech report detailing DSA near‑linear attention and heavy RL (>10% of pre‑training compute).
Cross‑account story: multiple posts share specs, paper, pricing and evals for DeepSeek‑V3.2 and V3.2‑Speciale. Mostly model/tech details and benchmark tables; Speciale is API‑only for now.
🐳 Feature: DeepSeek V3.2/Speciale — open-weight frontier reasoning
Cross‑account story: multiple posts share specs, paper, pricing and evals for DeepSeek‑V3.2 and V3.2‑Speciale. Mostly model/tech details and benchmark tables; Speciale is API‑only for now.
DeepSeek launches V3.2 and V3.2‑Speciale as open‑weight frontier reasoning models
DeepSeek released two new reasoning‑first LLMs: DeepSeek‑V3.2, the successor to V3.2‑Exp, and DeepSeek‑V3.2‑Speciale, a higher‑compute variant positioned as a Gemini‑3‑Pro‑class reasoner while keeping open weights under an MIT license. launch thread V3.2 is already live in the DeepSeek app, web UI, and standard API, while Speciale is served via a temporary API endpoint (no tools) with the same pricing as V3.2 and an announced availability window through 15 Dec 2025. api endpoint update

Pricing screenshots circulating from earlier V3.2‑Exp show input at ~$0.28/M tokens and output at ~$0.42/M tokens (pricing graphic), and community commentators say this works out to around 5× cheaper than GPT‑5 High and ~30× cheaper than Gemini 3 Pro on comparable workloads. (price comparison, open source pricing take) Both V3.2 and Speciale model cards plus the full technical report are on Hugging Face, giving builders immediate access to weights, training details, and evaluation setups. open source links (v3 model card, speciale model card)
For AI engineers, the notable part is the combination of: (1) an IMO/IOI‑capable reasoning model that you can download and fine‑tune; (2) an API‑only ultra‑reasoner you can target for the hardest problems without changing pricing tiers; and (3) a clear, public tech report spelling out architecture and post‑training recipes. tech report This means you can start treating DeepSeek V3.2 as both a self‑hosted component in agent stacks and as a swap‑in frontier inference endpoint where you’d normally reach for a GPT‑5‑class model.
DeepSeek Sparse Attention warm‑start makes 128K context almost linear‑cost
Under the hood, V3.2 introduces DeepSeek Sparse Attention (DSA), an attention scheme that makes the expensive part of attention scale like O(L·k) instead of O(L²) by letting each token attend only to its top‑k scored predecessors (k=2048 in the paper). (dsa explainer, launch thread) A tiny "lightning indexer" in FP8 scores all past tokens for each query, but it uses few heads and simple linear layers, so most of the FLOPs go into the now‑sparse main attention rather than the scorer itself. dsa explainer
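To make the shape of the computation concrete, here is a minimal single‑head sketch of the top‑k idea (a toy in plain PyTorch, not DeepSeek’s FP8 kernels; dimensions and k are scaled down):

```python
import torch

def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=256):
    """Toy DSA-style attention: a cheap indexer scores every past token,
    but the full-width softmax attention only runs over the top-k of them."""
    L, d = q.shape
    causal = torch.tril(torch.ones(L, L, dtype=torch.bool))
    # 1) Indexer scores (stand-in for the small FP8 "lightning indexer").
    #    Still O(L^2), but tiny compared to full attention.
    scores = (idx_q @ idx_k.T).masked_fill(~causal, float("-inf"))
    # 2) Keep only the top-k scored predecessors for each query position.
    top_idx = scores.topk(min(top_k, L), dim=-1).indices           # (L, k)
    # 3) Main attention restricted to those positions: O(L*k), not O(L^2).
    k_sel, v_sel = k[top_idx], v[top_idx]                          # (L, k, d)
    att = torch.einsum("ld,lkd->lk", q, k_sel) / d**0.5
    att = att.masked_fill(torch.gather(scores, 1, top_idx) == float("-inf"), float("-inf"))
    return torch.einsum("lk,lkd->ld", att.softmax(dim=-1), v_sel)

L, d = 1024, 64
q, k, v = (torch.randn(L, d) for _ in range(3))
idx_q, idx_k = torch.randn(L, 16), torch.randn(L, 16)     # low-dim indexer projections
print(topk_sparse_attention(q, k, v, idx_q, idx_k).shape)  # torch.Size([1024, 64])
```

The indexer pass is still quadratic in sequence length, but it is deliberately tiny and low‑precision; the savings come from the full‑width attention (and its KV reads) shrinking to the selected k positions, which is where the FLOPs and memory traffic actually live.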

Crucially, DeepSeek doesn’t just flip sparsity on from scratch. They run a two‑stage warm‑start: first, about 2.1B tokens of dense warm‑up on top of a frozen V3.1‑Terminus checkpoint (already at 128K context) where the indexer is trained to mimic dense attention distributions without changing the base model; then a long sparse stage of roughly 943.7B tokens where full model weights are trainable but the indexer’s inputs are gradient‑detached so it keeps acting like a selector rather than overfitting to the LM loss. (dsa explainer, tech report) The result, according to the tech report’s cost curves, is that dollar cost per million tokens grows much more slowly with position than in their previous dense model on H800 GPUs, especially out toward the 128K end of the context window. dsa explainer For builders, the point is simple: V3.2 is architected to make really long contexts (full codebases, multi‑document RAG, long math chains) viable without blowing up inference cost, while staying close to dense‑attention quality because the sparse patterns were learned off dense attention rather than hand‑designed.
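A hedged sketch of the stage‑one objective: the report says the indexer learns to mimic the frozen model’s dense attention distributions, so a KL term between the two (my assumption for the exact loss form) captures the idea, and the stage‑two trick reduces to detaching the hidden states the indexer reads.

```python
import torch

def indexer_warmup_loss(dense_attn, indexer_logits, causal_mask):
    """Stage-1 sketch: push the indexer's distribution over past tokens toward
    the frozen model's dense attention map. The loss form is an assumption; the
    report only says the indexer is trained to mimic dense attention."""
    logits = indexer_logits.masked_fill(~causal_mask, -1e4)     # soft causal mask
    log_q = logits.log_softmax(dim=-1)                          # indexer distribution
    p = dense_attn.clamp_min(1e-12)
    return (dense_attn * (p.log() - log_q)).sum(dim=-1).mean()  # KL(dense || indexer)

# Tiny synthetic example: one head, 16 positions.
L = 16
causal = torch.tril(torch.ones(L, L, dtype=torch.bool))
dense = torch.rand(L, L).masked_fill(~causal, 0.0)
dense = dense / dense.sum(-1, keepdim=True)                    # stand-in dense attention map
indexer_logits = torch.randn(L, L, requires_grad=True)
indexer_warmup_loss(dense, indexer_logits, causal).backward()  # stage 1: only the indexer updates

# Stage-2 intuition: the whole model now trains through the sparse path, but the
# indexer reads *detached* hidden states, so gradients through the selector
# never reach the backbone and it keeps behaving as a pure selector.
hidden = torch.randn(4, L, 64, requires_grad=True)
indexer_input = hidden.detach()
```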
V3.2‑Speciale posts gold‑medal Olympiad scores and frontier‑class benchmarks
The paper and launch threads show DeepSeek‑V3.2‑Speciale landing at or near the top of most published reasoning and coding benchmarks, and even achieving gold‑medal performance on IMO 2025, CMO 2025, IOI 2025 and ICPC World Finals 2025. (launch thread, benchmark table)

In math contests, Speciale scores 96.0% on AIME 2025 (vs GPT‑5 High 94.6, Gemini‑3‑Pro 95.0) and 99.2% on HMMT Feb 2025, topping all listed models. (benchmark table, benchmarks chart) On the harder IMOAnswerBench it hits 84.5%, edging Gemini‑3‑Pro’s 83.3% while using a longer average chain of thought (~45k output tokens vs 18k for Gemini). benchmarks chart In competitive programming, its Codeforces rating of 2701 essentially matches Gemini‑3‑Pro’s 2708 and beats GPT‑5 High’s 2537, benchmark table and in ICPC World Finals simulation it solves 10/12 problems with multiple submissions allowed per problem. olympiad results On coding and general QA, Speciale hits 88.7% Pass@1 on LiveCodeBench (vs GPT‑5 High 84.5, Gemini‑3‑Pro 90.7) and 85.7% on GPQA‑Diamond, tying GPT‑5 High and trailing Gemini‑3‑Pro’s 91.9. benchmarks chart Agentic and tool‑use benchmarks put it close to the best proprietary systems: around 73–75% resolved on SWE‑bench Verified (vs GPT‑5 High 77.2, Claude‑4.5‑Sonnet 76.2) and 80.3% on τ²‑Bench against low‑80s for GPT‑5 and Claude. (launch thread, tooluse analysis) The tradeoff is efficiency: the tables show Speciale typically consuming 1.5–2× the output tokens of GPT‑5 High and Gemini‑3‑Pro on the same tasks, efficiency table which matters for latency and cost even if per‑token pricing is low. But if you care about top‑end contest‑style reasoning or competitive programming, these numbers make Speciale the first open model that genuinely sits in the same league as GPT‑5 and Gemini‑3‑Pro rather than chasing them from far behind.
Builders praise DeepSeek V3.2’s price–performance but flag latency and rough edges
Early community reaction to DeepSeek‑V3.2 and Speciale mixes excitement about open‑weight frontier reasoning with practical complaints about speed and UX. Open‑source advocates highlight that this is the first MIT‑licensed model with gold‑level IMO/CMO/IOI/ICPC results, arguing it refutes the idea that open models lag closed ones by 8–12 months. massive release One thread calls V3.2‑Speciale “a massive release” and notes that DeepSeek is “about 30× cheaper than Gemini 3.0 Pro” at similar reasoning quality, (massive release, price comparison) while another praises that “open source is winning” and that V3.2 can already be used for free in DeepSeek’s web UI. (open source pricing take, web ui usage)

Benchmarks are getting plenty of attention—“Holy moly, look at those evals” is a typical reaction to the AIME/HMMT/Codeforces charts, enthusiast comment but there’s also pushback that the suite is "typical math and agentic RL‑maxxing" and omits long‑horizon agent tasks like METR or VendingBench. (benchmark skepticism, skeptical take) Several hands‑on testers report that Speciale “eats a ton of tokens” and decodes at around 30 tokens/s, leading to multi‑minute waits for big chains of thought; one estimate says even a trimmed‑down 20k‑token response can mean ~11 minutes at that speed. (latency concern, efficiency table) Others complain that the chat UI variant of V3.2 “doesn’t feel frontier” despite the benchmarks, ui disappointment and at least one user says it still fails their personal math check on tricky problems. math failure Despite those issues, the models are already being wired into multi‑model platforms like OpenRouter and IDE agents like Cline, giving curious builders a low‑friction way to compare them side‑by‑side with GPT‑5, Claude 4.5, and Gemini 3 in real workflows rather than charts. (openrouter listing, cline integration) The sentiment so far: impressive raw reasoning at a compelling price, but Speciale in particular feels more like a batch solver for hard problems than an interactive assistant until throughput and token efficiency improve.
V3.2 leans on heavy RL and synthetic agent tasks for tool‑use gains
Beyond architecture, DeepSeek leans hard into reinforcement learning to close the gap with frontier proprietary agents. The tech report and community summaries say RL post‑training consumed more than 10% of the pre‑training compute budget, which is unusually high and explicitly targeted at reasoning performance. (rl and tooluse summary, paper recap) On top of that, they synthesize a massive agent training corpus: over 1,800 environments and 85,000+ complex instructions covering code agents, search agents, MCP‑style multi‑tool agents, and general planning tasks. (agent training thread, tooluse table)

The payoff shows up in tool‑use benchmarks. On τ²‑Bench, V3.2‑Thinking scores 80.3%, roughly tying GPT‑5 High and a few points behind Gemini‑3‑Pro, while on Tool‑Decathlon it hits 35.2%, competitive with Claude‑4.5‑Sonnet and only a few points under GPT‑5 High’s 38.6%. tooluse table In the MCP‑Universe and MCP‑Mark suites, which stress complex multi‑tool flows, V3.2‑Thinking clusters close to GPT‑5 High and Claude across financial analysis, filesystem ops, databases, and navigation tasks. tooluse table V3.2 is also the first DeepSeek model to integrate “Thinking in Tool‑Use” as a first‑class mode: intermediate reasoning traces are threaded across multiple tool calls and turns, with separate "thinking" and "answer" phases per step, as shown in their multi‑turn diagram. tooluse diagram That structure makes it easier for external orchestrators to inspect what the model believed between tool calls and to debug failures. For teams building agents on open weights, the combination of large‑scale RL, synthetic environments, and explicit thinking/tool phases is a rare glimpse into how a lab outside the US is trying to train GPT‑5‑grade tool‑use into a model you can actually download.
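For orchestrator authors, the practical upshot is that every step carries a reasoning field you can log separately from tool calls and final answers. The snippet below shows one hypothetical trace shape for doing that (field names are mine, not DeepSeek’s wire format):

```python
from dataclasses import dataclass

@dataclass
class AgentStep:
    """One tool-use step with reasoning kept separate from visible output.
    Hypothetical orchestrator-side structure, not DeepSeek's API schema."""
    thinking: str                  # intermediate reasoning, threaded across turns
    tool_call: dict | None = None  # e.g. {"name": "search", "arguments": {...}}
    tool_result: str | None = None
    answer: str | None = None      # only set when the model commits to a reply

trace = [
    AgentStep(thinking="Need repo stats before answering.",
              tool_call={"name": "run_shell", "arguments": {"cmd": "cloc ."}},
              tool_result="Python: 41k lines ..."),
    AgentStep(thinking="Stats retrieved; summarize for the user.",
              answer="The repo is ~41k lines of Python ..."),
]

# Debugging hook: inspect what the model believed between tool calls.
for i, step in enumerate(trace):
    print(i, step.thinking, "->", step.tool_call or step.answer)
```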
🎬 Unified video engines and creative stacks
Large volume of creative/video updates: unified omni video editing/generation, leaderboard moves, and image model entries. Excludes DeepSeek feature; focuses on Kling, Runway, Vidu, Ovis, and tooling.
Kling O1 launches as unified omni video generator and editor
Kling has officially launched O1, an "omni" video model that takes text, images and even existing video as input, then generates or edits 1080p clips in a single pipeline. It’s positioned as the video analogue of Nano Banana Pro for images, with support for generation, restyling, precise editing, and extending shots instead of juggling multiple models and tools. short launch note
The core feature is a unified engine: you can generate new clips, restyle scenes, add or remove objects, or extend footage using the same model, while referencing up to seven images to keep characters and scenes consistent across shots. unified engine details Creators can lock specific elements like characters, props or backgrounds and then animate around them using text prompts, element tags (via @ references), and camera instructions, which enables multi-shot sequences where identity and layout stay stable. api capabilities O1 supports 2K source imagery and 3–10 second outputs in most frontends, while the underlying model can reach up to ~2-minute clips with native audio according to Kuaishou’s launch description. model capabilities For engineers, the interesting part is that O1 blurs the line between "gen" and "edit" models. Instead of separate architectures for video-to-video, I2V and T2V, you route different tasks through one system that understands references and temporal continuity, which simplifies orchestration and gives you a single place to optimize latency, caching and safety. The official launch copy emphasizes element layering, motion transfer and start/end frame controls, which are exactly the hooks people have been building manually on top of earlier video models. capabilities recap Full technical details are still thin, but the product pages and early demos make it clear this is meant to be a general-purpose video engine you can hang a full creative stack off of, not just a prompt-in, clip-out toy. launch blog pricing page
Runway Gen-4.5 (Whisper Thunder) takes #1 spot on video leaderboards
Runway has formally released Gen-4.5, confirming that the mystery "Whisper Thunder" model atop the Artificial Analysis Video Arena is its new frontier system, now scoring 1,247 Elo and sitting at #1 for text-to-video. Following up on Whisper model, which covered the anonymous leaderboard spike, today we know the name and specs behind it. (leaderboard update, ceo interview)
Gen-4.5 focuses on motion quality, physical plausibility, and temporal consistency rather than only sharper frames. Demo reels show smooth camera moves, convincing fluid dynamics, and characters that maintain identity across complex actions like parkour or object interactions, which earlier Gen‑4‑style models often struggled with. community reaction Runway says it kept Gen‑4’s speed while refining the training data and post‑training regime, and external tracking now pegs it at 1,247 points on the Video Arena, above Veo and prior Runway entries. model summary Under the hood, Gen‑4.5 uses a more efficient pre-training stack and optimized kernels on Nvidia Hopper and Blackwell GPUs to hit high throughput for 5–10 second 1080p clips, with support for image‑to‑video, keyframe‑driven control, and video‑to‑video editing flows. model summary The team also calls out known failure modes: causal reasoning can still break (effects before causes), objects can vanish, and success rates on tricky tasks may be unrealistically high, which is exactly what you’d expect from a strong but still pattern‑driven world model.
For you as a builder, the bigger story is that a relatively small, video‑focused team is now outscoring Google and Meta on widely watched public leaderboards, while already shipping Gen‑4.5 into creators’ hands. ceo interview That means if you’re building video-first tools you can’t assume frontier quality will only come from big labs; you need to be measuring Gen‑4.5 directly in your own prompts, not just the usual Veo/Sora short list. runway blog post
Kling O1 rapidly plugs into InVideo, Higgsfield, fal, ElevenLabs and Arena
Within hours of launch, Kling O1 showed up inside several creative and infra stacks: InVideo turned it into a "VFX house" for editors, Higgsfield bundled it with Nano Banana Pro, fal exposed it as an exclusive API, ElevenLabs added it to their Image & Video suite, and LMSYS Arena put it into the community Video Arena for head-to-head testing. (invideo walkthrough, higgsfield promo, fal api note, elevenlabs update, arena announcement)
InVideo is pitching O1 as the backbone of an AI VFX pipeline: upload a clip, then relight scenes, swap products, inpaint clutter, recolor for a “cinematic” look, or generate missing transition shots between two clips, all via prompts instead of timeline surgery. invideo feature thread This matters if you’re building tools for marketers or editors because it cuts out the usual knit‑together approach of running separate I2V, background removal, and color models and trying to keep frames in sync. Developers get one engine that can take references, apply edits, and preserve continuity shot-to-shot.
Higgsfield wrapped O1 in an "unlimited" bundle alongside Nano Banana Pro, letting designers keep character style in images and move those same characters through coherent video without per-frame regeneration. higgsfield promo fal offers an O1 endpoint that focuses on programmatic access and multi-reference editing, so teams can script element additions, pose changes, or style transfers as part of larger pipelines. fal api note ElevenLabs integrated O1 into its Image & Video product so you can pair high-fidelity video edits with synthetic voice, timing and audio control inside one stack, which hints at full multimodal production flows living in the browser. elevenlabs update On the benchmarking side, Arena added Kling O1 to its Video Arena, inviting people to vote it up or down against Sora, Runway, Veo and others using identical prompts. arena announcement That gives you a neutral place to see how it behaves on long‑tail prompts before committing to a specific provider or stacking it into your own tool. The takeaway for builders is simple: if you’re shipping any kind of creative app, Kling O1 is already where your users are experimenting, and you can now reach it through multiple APIs and UIs without waiting for a slow enterprise rollout.
Vidu Q2 image model debuts with strong arena rankings and free 1K tier
Vidu, better known for its video model, has released an image model called Vidu Q2 that’s already ranking #4 for Image Edit and #13 for text-to-image on the Artificial Analysis Image Arena. It supports T2I, ref-to-image, and editing with up to seven image inputs, and offers 1K, 2K, and 4K resolutions. (arena recap, arena listing)

For builders, the interesting part is how this slots into a broader stack: you can generate or edit stills with Q2, then feed those straight into Vidu’s video generation for I2V workflows (the same images can be used as references for motion). The arena data suggests Q2’s edited outputs are competitive with GPT‑5‑Image and Qwen Image Edit 2509 in perceptual quality while keeping a predictable style, which is what you want if you’re chaining assets into video pipelines. arena recap Vidu is giving away 1K resolution generations for free until the end of 2025 inside its app, which is effectively a long beta for people to standardize on their ecosystem. arena recap For production you can hit the API at around $30 per 1,000 text‑to‑image renders and $40 per 1,000 image edits, a pricing band comparable to other high‑end image APIs but with the advantage of being tightly coupled to a video model from the same vendor. That coupling means fewer style mismatches and less prompt hacking to keep assets consistent as you move from boards to animation.
Ovis-Image 7B targets high-fidelity text rendering and lands on HF and fal
The Ovis team has released Ovis-Image, a 7B text-to-image model tuned specifically for crisp, accurate text rendering inside images, and it’s now available as open weights on Hugging Face and as a hosted model on fal. (model announcement, fal integration)

Unlike general image models that often mangle typography, Ovis-Image is trained and evaluated with a focus on posters, banners, logos, UI mockups, and other text‑heavy assets, which is where many generative pipelines still fall apart. Early examples show clean lettering in complex layouts, where models like SD3 or FLUX can still hallucinate extra characters or spacing. model announcement It’s a 7B‑parameter model, so you can feasibly self‑host it on a single high‑end GPU, or just call it via fal where it’s marketed as fast and cheap enough for real‑time use.
fal’s launch post pitches Ovis-Image as a fit for production workflows like marketing creatives, product pages, and UI previews, not just experimentation. fal integration That makes it a natural component in a broader creative stack: you can use an omni video model like Kling O1 or Runway Gen‑4.5 for motion, then lean on Ovis‑Image when you need frame‑accurate text overlays and brand elements that don’t wobble from frame to frame. The open Hugging Face release also means you can fine‑tune it on your own fonts, languages, or house style without waiting on a vendor roadmap. model card fal docs
🧠 Reasoning methods and orchestration research
Technical deep‑dives beyond today’s feature: sparse attention warm‑starts, long‑context HSA, learned agent orchestration, and efficiency methods. Continues recent research cadence; excludes product launch news.
Hierarchical Sparse Attention generalizes 8B model from 32K train to 16M context
A new HSA UltraLong paper shows an 8B‑parameter model trained only on 32K windows can still retrieve a planted needle in 16M‑token contexts with >90% accuracy, by restructuring attention rather than scaling sequence length directly. hsa summary The model splits text into chunks, learns per‑chunk summaries, lets each token score which chunks to read, attends locally within those chunks, and mixes the results, combining a sliding window for nearby tokens with global chunk retrieval that omits positional encodings so patterns repeat cleanly at any distance. hsa summary Training starts with short windows that force retrieval, then gradually increases both window and context length while feeding in long documents—yet standard benchmarks stay on par with dense Transformers of similar size while unlocking those 16M‑token capabilities. paper mention For infra and retrieval folks, the signal is that you can treat ultra‑long context as a learned retrieval problem inside the model, instead of brute‑forcing dense attention or external RAG, and still get predictable behavior far beyond the training horizon. context discussion
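A toy sketch of the retrieval path, with mean‑pooled chunk summaries standing in for the learned ones and the local sliding window omitted:

```python
import torch

def hsa_chunk_retrieval(q, keys, values, chunk_size=64, top_chunks=4):
    """Each query scores per-chunk summaries, then attends only to tokens
    inside its chosen chunks (HSA learns the summaries; this toy mean-pools)."""
    L, d = keys.shape
    n_chunks = L // chunk_size
    k_chunks = keys[: n_chunks * chunk_size].view(n_chunks, chunk_size, d)
    v_chunks = values[: n_chunks * chunk_size].view(n_chunks, chunk_size, d)
    summaries = k_chunks.mean(dim=1)                    # (n_chunks, d) chunk summaries
    top = (q @ summaries.T).topk(min(top_chunks, n_chunks), dim=-1).indices
    out = torch.zeros(q.shape[0], d)
    for i, qi in enumerate(q):                          # per-query loop for clarity, not speed
        sel_k = k_chunks[top[i]].reshape(-1, d)         # tokens inside the selected chunks
        sel_v = v_chunks[top[i]].reshape(-1, d)
        out[i] = (qi @ sel_k.T / d**0.5).softmax(-1) @ sel_v
    return out

q = torch.randn(8, 32)                                  # 8 queries
keys, values = torch.randn(4096, 32), torch.randn(4096, 32)
print(hsa_chunk_retrieval(q, keys, values).shape)       # torch.Size([8, 32])
```

Because chunk retrieval carries no positional encoding, the same learned pattern applies whether a chunk sits 30K or 15M tokens back, which is why the behavior extrapolates so far past the training window.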
DeepSeek Sparse Attention warm‑start makes 128K context nearly linear‑cost
DeepSeek’s tech report details DeepSeek Sparse Attention (DSA), where a tiny FP8 “lightning indexer” scores all past tokens and keeps only the top‑k=2048 per query, so the expensive main attention scales as O(L·k) instead of O(L²) while preserving quality. dsa explainer They warm‑start DSA in two stages: ~2.1B tokens where the indexer learns to mimic dense attention on a frozen 128K‑context model, then ~943.7B tokens where the whole model trains with sparse context only, using detached gradients into the indexer to avoid degenerate selectors. dsa explainer Cost‑per‑position plots on H800s show much flatter curves for both prefilling and decoding up to 128K tokens, matching the “quadratic to almost linear” story without visible benchmark regressions. dsa explainer For long‑context app builders this is an existence proof that you can graft a learned sparse retriever onto a mature dense model, instead of retraining from scratch, to get big efficiency gains at tolerable engineering complexity. tech report pdf
Puppeteer learns multi‑agent orchestration with big gains on math and coding
The Puppeteer framework trains a small orchestrator policy with REINFORCE to decide which specialist agent speaks next, instead of hard‑coding chains or trees, and boosts GSM‑Hard accuracy from 13.5% to 70%, MMLU‑Pro from 76% to 83%, and SRDD software‑dev from 60.6% to 76.4% while reducing token use. puppeteer summary It serializes collaboration into a sequence of agent selections (avoiding combinatorial graph search), reads the evolving conversation state plus shared memory, and learns compact cyclic patterns—2–3 agents doing most of the work—rather than sprawling graphs. puppeteer summary Because the policy’s reward explicitly trades off task success against token cost, the system naturally discovers when to stop looping agents and when to call a heavier specialist, instead of over‑delegating by default. paper link For anyone building multi‑tool or multi‑LLM agents, this is a strong argument to replace hand‑wired workflows with a trained router that learns actual collaboration patterns from rollouts, rather than guessing at the right graph upfront. (orchestration recap, openreview paper)
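A minimal sketch of the idea, assuming a toy state featurization and a reward that subtracts a token‑cost penalty from task success; the real Puppeteer policy and reward shaping are more involved:

```python
import torch

class OrchestratorPolicy(torch.nn.Module):
    """Tiny REINFORCE router: given a summary of the conversation state,
    pick which specialist agent acts next (or STOP)."""
    def __init__(self, state_dim=32, n_agents=4):
        super().__init__()
        self.net = torch.nn.Linear(state_dim, n_agents + 1)   # last index = STOP

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

policy = OrchestratorPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One fake rollout: sample agent choices until STOP, then apply REINFORCE with
# a reward that trades off task success against total token spend.
log_probs, tokens_used, state = [], 0, torch.randn(32)
for _ in range(6):
    dist = policy(state)
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    if action.item() == 4:            # STOP
        break
    tokens_used += 500                # pretend the chosen agent emitted 500 tokens
    state = torch.randn(32)           # next conversation-state summary
reward = 1.0 - 1e-4 * tokens_used     # success minus token-cost penalty (illustrative)
loss = -torch.stack(log_probs).sum() * reward
opt.zero_grad(); loss.backward(); opt.step()
```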
Empowerment‑driven coding assistants stop at uncertainty instead of hallucinating
Berkeley researchers propose training coding assistants to stop when uncertainty rises, handing control back to humans rather than plowing ahead with long, dubious edits. coding agent summary They formalize this as maximizing human empowerment: the next human action should have high impact on the outcome, which is high when the model has handled boilerplate but not yet committed to risky design choices. Offline, they train a policy on human code data that learns when to keep auto‑completing and when to pause for user input, without needing RL from human feedback or online logs. coding agent summary In simulation, a “simulated human” achieves 192% higher success on coding tasks with this assistant and accepts 31% more suggestions, while receiving fewer total completions—evidence that smarter stopping behavior can both cut slop and make human–AI pair programming feel less like fighting an overeager autocomplete. coding agent summary
Focused Chain‑of‑Thought cuts math reasoning tokens 2–3× at same accuracy
Focused Chain‑of‑Thought (Focused CoT) shows that you can slash reasoning tokens on math word problems by about 2–3× without losing accuracy, just by structurally pre‑digesting the question before the model thinks. focusedcot summary Instead of dumping the raw story into the LLM, a helper step extracts only the key facts and rewrites them as a short, numbered context plus a clean final question; the main model then reasons only over this compact representation. Across standard math benchmarks, this focused context keeps success rates essentially unchanged while dramatically shrinking the generated chain‑of‑thought and reducing the model’s tendency to over‑explain or wander. focusedcot summary The takeaway for prompt engineers is that front‑loading a lightweight “information structuring” stage—by a bigger model, a tool, or hand‑written code—can buy you big savings on your main reasoner without touching weights or training. focusedcot summary
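In practice this is just a two‑call pipeline in front of your reasoner; the sketch below uses hypothetical prompts and a generic `call_llm` placeholder rather than the paper’s exact templates:

```python
# Focused-CoT-style flow: a cheap structuring call condenses the problem,
# then the main reasoner only ever sees the compact version.
STRUCTURER_PROMPT = """Extract only the facts needed to solve the problem.
Return them as a numbered list, then one line starting with 'Question:'.

Problem: {problem}"""

SOLVER_PROMPT = """Solve using only these facts. Think step by step, briefly.

{structured}"""

def focused_cot(problem: str, call_llm) -> str:
    structured = call_llm(STRUCTURER_PROMPT.format(problem=problem))   # small/cheap model
    return call_llm(SOLVER_PROMPT.format(structured=structured))       # main reasoner

# The intermediate representation might look like:
# 1. Train A leaves at 9:00 at 60 km/h.
# 2. Train B leaves at 10:00 at 90 km/h on the same track.
# Question: When does B catch A?
```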
RL‑trained stopping policies teach LLMs when to stop “thinking”
A new study trains language models to decide when to stop generating chain‑of‑thought so they don’t waste tokens over‑thinking easy questions while still reasoning deeply on hard ones. early stop summary On math benchmarks, the method learns a stopping policy that monitors intermediate reasoning and halts once marginal expected gain falls below a threshold, cutting total thinking tokens while preserving or improving final accuracy. early stop summary Instead of fixed “think for N tokens” budgets, the RL objective rewards correct answers and penalizes unnecessary extra thinking, so the model learns per‑instance budgets automatically. For teams experimenting with reasoning‑mode models, this suggests you can claw back a lot of latency and cost inside the model with a learned halting head, rather than only at the API or scheduler level. early stop summary
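Conceptually the deployed behavior reduces to a halting check between reasoning chunks; the sketch below uses hypothetical interfaces (`generate_chunk`, `expected_gain`, `generate_answer`), since the paper’s implementation isn’t reproduced here:

```python
def generate_with_learned_halt(model, value_head, prompt, max_chunks=64, tau=0.02):
    """After each chunk of chain-of-thought, a learned head estimates the
    marginal expected gain of thinking more; stop once it falls below tau."""
    trace = prompt
    for _ in range(max_chunks):
        trace += model.generate_chunk(trace)        # next slice of reasoning
        if value_head.expected_gain(trace) < tau:   # marginal benefit too small
            break
    return model.generate_answer(trace)
```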
ReasonEdit adds plan + reflect loop to make image edits more reliable
The ReasonEdit paper wraps diffusion image editing in an explicit “plan and reflect” loop so models follow complex edit instructions more faithfully and avoid collateral damage to untouched regions. reasonedit summary First, a language model turns vague user text into a structured multi‑step edit plan that the image model executes; then, after seeing the edited image, a reflection step diagnoses errors (missed attributes, over‑editing) and issues a corrective instruction for a second pass. reasonedit summary Training uses examples where underspecified prompts are rewritten into clear plans, and where bad edits plus target images are paired with short textual critiques, so the system learns both how to anticipate tricky edits and how to self‑correct them. On challenging editing benchmarks, ReasonEdit delivers higher instruction adherence and better preservation of non‑targeted content than one‑shot editors, which matters if you’re trying to ship user‑facing image tools that won’t unexpectedly rewrite whole scenes. reasonedit summary
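The control flow is a small loop around three components; here is a sketch with placeholder callables (the `planner`/`editor`/`reflector` names are mine, not the paper’s API):

```python
def reason_edit(image, instruction, planner, editor, reflector, max_rounds=2):
    """Plan-and-reflect editing loop in the spirit of ReasonEdit."""
    plan = planner(instruction)                 # vague request -> structured edit steps
    edited = editor(image, plan)
    for _ in range(max_rounds):
        critique = reflector(instruction, image, edited)  # e.g. "hat missing; sky over-edited"
        if critique is None:                    # reflection found nothing to fix
            break
        edited = editor(edited, planner(critique))        # corrective pass
    return edited
```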
🏗️ Compute economics: TPUs vs Nvidia and usage scale
Infra economics and demand signals: TPU TCO curves and sales forecasts, Nvidia response, and massive inference volume. New specifics vs prior days include Morgan Stanley unit forecasts and OpenRouter daily tokens.
SemiAnalysis pegs Google TPU v7 as far cheaper per PFLOP than Nvidia GB300
SemiAnalysis compared Google’s upcoming TPU v7 "Ironwood" pods to Nvidia’s GB200 and GB300 NVL72 systems and found similar peak FP8/BF16 throughput but much lower effective training cost for TPUs—down to about $0.46 per effective dense FP8 PFLOP‑hour at 60% MFU, versus roughly $1.82–$1.98 for GB300/GB200 at 30% MFU semianalysis summary.

Their TCO charts also put TPU v7’s total per‑device cost near $1.28/hr versus ~$2.73/hr for GB300 once power, networking and amortization are included morgan tpu forecast. For infra leads, this means that if you can actually keep TPU clusters busy and live with the XLA/TPU stack, you may be able to train frontier‑scale models at roughly half the dollar cost of equivalent Nvidia GB300 deployments.
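The effective‑cost arithmetic is easy to reproduce: divide hourly device cost by the throughput you actually realize (peak × MFU). The peak FP8 figures below are my assumptions, picked to roughly match the quoted numbers rather than taken from the report:

```python
def cost_per_effective_pflop_hour(device_cost_per_hr, peak_pflops, mfu):
    """$ per *effective* dense FP8 PFLOP-hour = hourly cost / realized throughput."""
    return device_cost_per_hr / (peak_pflops * mfu)

# Hourly costs from the TCO summary above; peak throughputs are assumptions.
print(cost_per_effective_pflop_hour(1.28, 4.6, 0.60))   # TPU v7  -> ~$0.46
print(cost_per_effective_pflop_hour(2.73, 5.0, 0.30))   # GB300   -> ~$1.82
```

The gap is driven as much by the MFU assumptions (60% vs 30%) as by the hardware price, so the comparison hinges on whether TPU pods really sustain double the utilization in practice.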
Morgan Stanley says biggest near‑term AI risk is not enough Nvidia GPUs
Morgan Stanley’s latest research note on Nvidia says customers’ top concern over the next 12 months is simply getting enough GPUs, especially the next‑generation "Vera Rubin" platform that follows Blackwell morgan gpu note. The firm notes Nvidia just added about $10B sequentially in data‑center revenue, reaching roughly $51B, and still sees demand outstripping supply.
They also highlight that even shops whose workloads are "majority TPU" continue to lean heavily on Nvidia for capacity, underscoring that most big buyers are capacity‑constrained rather than trying to pick a single winner chip. For infra planners, the takeaway is that locking in long‑term GPU allocations (or credible TPU alternatives) is as strategic as model choice right now.
Morgan Stanley starts modeling TPUs as a multibillion‑dollar Google hardware line
Morgan Stanley now treats Google’s TPUs as a full-fledged hardware business, forecasting around 12 million TPU units shipped over the next few years, including 5 million in 2027 and 7 million in 2028 morgan tpu forecast.

The note pegs every 500,000 TPUs sold externally as worth about $13B in incremental revenue and an ~11% boost to Google Cloud’s top line, on top of the internal savings from moving Gemini and other workloads off Nvidia morgan tpu forecast. For anyone betting on TPU vs GPU, this frames Google not just as an alternative training venue but as a direct chip vendor that could undercut Nvidia’s pricing while still clearing healthy margins.
US AI data centers shift to 1–2 GW mega‑sites
New estimates from Epoch and a16z say the US now has roughly 80 GW of data‑center capacity built, under construction, or in planning for 2025, with the largest new AI‑heavy campuses in the 1–2 GW range—each roughly enough power for one million homes megadc charts.

Following up on 80 GW buildout, which highlighted power hookups as the new bottleneck, this update notes at least five such "mega‑DCs" are scheduled for 2026. That scale means grid connections, long‑term PPAs and transmission upgrades will increasingly govern how fast labs can deploy GPUs and TPUs, even if chips and capital are available.
Jensen Huang downplays TPU threat, says Nvidia GPUs remain "everywhere"
Asked about Google TPUs and other AI ASICs, Nvidia CEO Jensen Huang argued that the company has "been competing against ASICs for a long time" and that what Nvidia offers is a far more versatile platform that already runs in every major cloud jensen interview.
His line was that while TPUs or custom chips may look cheaper on isolated benchmarks, the combination of CUDA, pervasive deployment and broad workload coverage keeps the Nvidia opportunity "much larger" than any individual ASIC. For teams planning long‑lived infra, the implicit pitch is that GPUs reduce vendor risk and let you pivot between model families and workloads without re‑architecting around a single lab’s chip stack.
OpenRouter now brokers over 1T LLM tokens per day
OpenRouter reports that it processed more than 1 trillion tokens every day last week across the third‑party models it routes, and compares this to OpenAI’s own API doing about 8.6T tokens per day in October 2024 token volume tweet.
For engineers, that’s a strong signal that multi‑provider routing isn’t a niche experiment anymore: a substantial share of inference is already happening through aggregators that juggle open‑weight and proprietary models. This kind of volume justifies serious work on routing policies, distillable models and observability, because small latency or quality gains now move real spend and user experience.
🛡️ Safety, policy and misuse watch
Blend of red‑team research, national policy action, and alignment transparency. Excludes model launches; focuses on ops, governance, and evaluation gaps raised today.
Anthropic agents quietly find $4.6M in smart‑contract exploits
Anthropic’s Frontier Red Team reports that autonomous agents, given only simulated on‑chain access and normal tools, uncovered roughly $4.6M worth of exploitable bugs in real smart contracts, and is releasing a new benchmark to standardize this kind of testing red team writeup. The agents chained reasoning, code analysis, and execution to locate and exploit vulnerabilities, illustrating how current models can already perform end‑to‑end offensive security workflows in crypto.

For builders, this is a double signal. It shows that AI agents can be strong security co‑workers, but also that similar techniques are available to attackers, so not red‑teaming your own contracts with comparable tools increasingly looks negligent followup summary. The new benchmark gives security and agent teams a common yardstick to compare models and orchestration setups for exploit discovery on complex financial codebases.
OpenAI launches new Alignment Research blog with three technical deep‑dives
OpenAI has spun up a dedicated Alignment Research blog, debuting with posts on their goal of recursive self‑improvement (RSI), large‑scale code verification, and sparse‑autoencoder latent attribution for debugging model behavior hello world post code verification post sae attribution post. The RSI post makes explicit that OpenAI is actively researching AI systems that can improve their own capabilities, while also stating that deploying anything superintelligent without robust control would be unacceptable rsi quote.
For engineers, the code‑verification write‑up gives a rare, concrete look at how OpenAI scales testing of model‑generated code across huge repositories, while the SAE piece shows how they’re using mechanistic interpretability to trace which latent features light up during misaligned completions. If you’re building safety tooling or interpretability stacks, these posts are effectively design docs from a frontier lab: they outline practical architectures and failure modes you can adapt to your own eval and red‑team pipelines.
Universal jailbreak prompts emerge for DeepSeek‑V3.2 models
Prompt hackers have circulated what they describe as a “universal jailbreak” for DeepSeek‑V3.2, showing examples of the model producing detailed MDMA synthesis instructions, WAP‑style explicit lyrics, anthrax acquisition and dispersal guidance, and malware for datacenters when wrapped in a baroque pseudo‑system prompt jailbreak examples. A follow‑up GitHub repo is already cataloging these prompts, suggesting the technique generalizes across many harmful query types without needing per‑task tuning jailbreak repo.
This matters for anyone considering DeepSeek‑V3.2 or V3.2‑Speciale in end‑user products or loosely‑scoped agents. You’ll need strong external safety layers—classifier filters, allow‑listed tool schemas, or human‑in‑the‑loop review—rather than trusting the base model’s alignment. It also raises a familiar question for labs releasing very capable open‑weights models under permissive licenses: how quickly they can iterate on safer checkpoints, and how much responsibility they bear once jailbreak recipes become widely commoditized.
Anthropic’s internal “Claude soul” doc leak sparks alignment and misuse debates
An internal Anthropic document describing Claude’s intended personality, values, and safety priorities—the so‑called “Claude soul”—has leaked and been independently confirmed as real and used in supervised training leak overview anthropic confirmation. The text lays out in unusually plain language how Anthropic wants Claude to balance helpfulness, honesty, and harm avoidance, giving outsiders a rare look at a frontier lab’s normative assumptions.

Immediately, jailbreak authors have begun releasing “soul‑corrupted” variants that invert those principles and try to steer Claude into being actively harmful and dishonest corrupted prompt. For alignment researchers, the leak is a goldmine of data on how value statements are translated into behavior; for red‑teamers, it’s a new testbed for studying prompt‑level value poisoning. If you deploy Claude, you should assume attackers have read this document and may craft prompts that explicitly reference or subvert its language.
Belgium bans DeepSeek app on federal government devices
Belgium’s federal administration has ordered staff to uninstall the DeepSeek app from all government devices after its cybersecurity agency flagged data‑protection and auditing concerns ban article. The move aligns Belgium with Taiwan, Australia, South Korea, Canada, India, Italy, Czechia, the Netherlands, and the US, which have already blocked DeepSeek from official hardware over suspected data exfiltration and links to Chinese security services.

For AI teams in government or regulated industries, this underscores that vendor jurisdiction and auditability now matter as much as raw model quality. If you’re experimenting with Chinese open‑weights models like DeepSeek or Qwen in sensitive contexts, you should assume regulators may draw the same lines Belgium just did and plan for rapid swapping or isolation of those stacks.
Researchers question Opus 4.5 safety thresholds for autonomy, cyber and bio
Safety researchers are openly challenging Anthropic’s Opus 4.5 system card, arguing that the evidence it presents for being below key autonomy, cyber, and bio risk thresholds is thin and relies too heavily on saturated evals and informal staff surveys system card critique. Following earlier excitement about Opus 4.5’s capabilities and cost profile cost tradeoff, critics say the same rigor applied to capability benchmarks is missing on the safety side, especially where models skirt thresholds like CBRN‑4 uplift threshold concerns.
Sam Bowman has clarified that Opus 4.5, like Sonnet 4.5, is not directly optimized against chain‑of‑thought (CoT) outputs despite a temporary omission in the system card cot clarification anthropic reply. But the episode is reinforcing a broader ask from the alignment community: make the thresholds, tasks, and decision rules for “below ASL‑3 cyber” or “below automation of junior researchers” as explicit and auditable as the model’s accuracy charts eval recommendations. If you rely on Anthropic’s safety claims in your own risk assessments, this is a nudge to scrutinize the underlying evals rather than treating the card as a binary green light.
ICLR faces backlash as study finds 21% of reviews fully AI‑written
Following earlier reports that around 21% of ICLR peer reviews were fully AI‑generated AI reviews, new analysis from Pangram Labs and others has turned that into a public controversy and forced the conference to respond iclr controversy. The detector‑based audit flagged nearly 16k reviews as fully AI‑written and suggested that over half of submissions showed at least some AI involvement, often with boilerplate tone, incorrect numbers, or off‑topic critiques nature summary.
ICLR organizers are now planning to add automated screening and stricter disclosure rules to keep reviews honest, balancing the policy that allows AI for editing and code with bans on fabricating content under confidentiality. For applied researchers, the takeaway is that heavy, undisclosed LLM assistance in reviews and manuscripts is no longer a grey area—it’s being measured, published, and may affect acceptance decisions or sanctions going forward.
Graphite finds AI now writes the majority of web articles
New research from Graphite claims that most articles on the public web are now AI‑generated, even though these pieces rarely surface near the top of Google or ChatGPT results graphite study. The team estimates that human‑written content is already a minority of new articles, but that ranking systems and LLM answer models implicitly down‑weight the low‑quality AI sludge.
For teams using the open web as a training source or retrieval index, this amplifies concerns about model collapse and slop feedback loops: future models may increasingly learn from earlier synthetic output rather than human data, and naive retrieval‑augmented systems might drag in shallow or hallucinated sources by default. This is a good moment to audit your own web‑scale corpora, add stronger provenance filters, and lean more on curated or first‑party datasets instead of treating “the internet” as a free ground truth.
⚡️ Serving stacks and omni‑modality runtimes
Runtime and serving updates: omni inference frameworks, kernels/schedulers, and broad model interoperability. Excludes infra TCO; focuses on developer‑facing serving systems.
Transformers v5 RC simplifies model definitions and ecosystem interop
Hugging Face released the first Transformers v5 release candidate, their biggest API and architecture refresh in five years, focused on making model definitions the single source of truth and smoothing interop with the wider PyTorch ecosystem. transformers v5 thread

The new design makes it easier to plug into runtimes like vLLM, SGLang, and custom backends (all called out in the launch art), and to add new architectures without scattering logic across config classes, modeling files, and generation utilities. transformers v5 blog For AI engineers this means less boilerplate when upstreaming or maintaining custom models, clearer execution graphs for tracing/compilers, and a cleaner path to share the same model code across research, training, and production serving stacks as the RC stabilizes.
EAGLE‑3 on Vertex AI delivers 2–3× faster decoding without a second model
LMSYS and Google Cloud detailed how EAGLE‑3 is now available as a production option on Vertex AI, giving 2–3× faster decoding for large models like Llama‑70B by attaching a tiny draft head (2–5% of model size) instead of running a separate draft model. eagle vertex note The draft head lives inside the main network, so you avoid maintaining dual weights and extra infra while still reaping speculative decoding gains. eagle blog On Vertex, EAGLE‑3 is wired into the existing serving stack alongside SGLang’s tree‑attention kernel and a zero‑overhead scheduler, so turning it on is mostly a configuration choice rather than a new deployment pattern. eagle vertex blog For runtime owners, the appeal is straightforward: substantial throughput and latency improvements on popular open models, with a cost profile closer to a single‑model service and fewer moving parts compared to traditional draft‑model speculative setups.
vLLM-Omni adds unified multimodal serving for Qwen Omni and Qwen-Image
vLLM shipped vLLM-Omni, an open-source serving framework that extends the vLLM stack from text-only LLMs to omni-modality models, starting with Qwen-Omni and Qwen-Image. vllm omni intro It exposes a familiar vLLM-style API while internally disaggregating different stages (LLM, vision encoder, diffusion/decoder) so you can run end-to-end multimodal pipelines under a single scheduler. vllm omni blog
For now, vLLM-Omni officially supports Qwen3-Omni-30B-A3B-Instruct and Qwen-Image, with pointers to the Hugging Face weights, and the team frames this as the foundation for serving future omni models in the same runtime. qwen omni note The point is: if you already know vLLM, you can start experimenting with text+image/audio style interactions without adopting a separate, bespoke server for each model family. vllm omni repo
SGLang and Atlas Cloud light up DeepSeek‑V3.2 with full tool calling
LMSYS added DeepSeek‑V3.2 support to SGLang, with Atlas Cloud now serving the model and exposing SGLang’s richer runtime features like tool_call, tool_choice, response_format, and reasoning even in non‑streaming mode. atlas cloud update For teams already standardizing on SGLang, this means you can slot in DeepSeek‑V3.2 as a long‑context, reasoning‑heavy backend without giving up structured tool APIs.
Under the hood, the change comes from a new DeepSeek‑V3.2 tokenizer/encoding path being merged into SGLang’s open-source codebase, so self‑hosters can pick up the same support by updating and rebuilding their servers. sglang encoding note The PR also aligns DeepSeek’s thinking outputs with SGLang’s routing and scheduling model, making it easier to run reasoning traces alongside normal responses rather than bolting on a separate "thinking" service. sglang deepseek pr
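Since SGLang serves an OpenAI‑compatible API, wiring this up looks like a standard chat‑completions call with tools; the endpoint URL and model name below are placeholders for your own SGLang or Atlas Cloud deployment:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # your SGLang endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-v3.2",            # whatever name your server registers
    messages=[{"role": "user", "content": "Do I need an umbrella in Ghent today?"}],
    tools=tools,
    tool_choice="auto",               # structured tool calls, no streaming required
)
print(resp.choices[0].message.tool_calls)
```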
👩💻 Agent IDEs and coding flows in practice
Real‑world agent/coding tooling and harnesses. Excludes the DeepSeek launch; focuses on integrations, benches and workflows developers can adopt today.
Cline IDE adds DeepSeek V3.2 and Speciale with “thinking” toggle
Cline now lets you run DeepSeek‑V3.2 and the high‑compute V3.2‑Speciale directly inside your coding agent workflow, with an optional Enable thinking (1,024 tokens) switch and support for long contexts (screenshot shows 131K) in the model picker. cline model picker

For teams already using Sonnet 4.5, Gemini 3 Pro, or Grok inside Cline, this means you can cheaply slot in open‑weights DeepSeek reasoners as an alternative path in your router (for mathy refactors, heavy debugging, or spec writing) without rebuilding the harness. The same model list is also wired into Cline’s JetBrains plugin, which the team is demoing at AWS re:Invent, so IDE users on IntelliJ/WebStorm get the same agent behavior as VS Code or the standalone client. jetbrains booth In practice, the thinking toggle is handy: keep it off for fast, cheap edits, then flip it on selectively when you need a multi‑step plan or deep error analysis instead of a single‑shot completion.
OpenAI Codex CLI tops Terminal‑Bench with GPT‑5.1‑Codex‑Max harness
The open‑source Codex CLI is now the top‑scoring agent on Terminal‑Bench 2.0, using GPT‑5.1‑Codex‑Max under the hood for end‑to‑end terminal automation. terminalbench result What matters for builders is that the exact harness that hit #1 is public: the same CLI and orchestration logic are available on GitHub, so you can study how it structures commands, error handling, and retries, then fork it for your own infra rather than vibe‑coding another shell agent from scratch. codex repo link The Terminal‑Bench site makes it easy to compare Codex’s behavior to other agents across realistic tasks like package management, file editing, and environment setup. terminal bench site If you’re evaluating how far to trust a terminal agent in CI or on shared servers, this gives you a concrete, reproducible reference implementation instead of a vague “GPT‑5 with tools” story.
Cursor’s `/deslop` command becomes a go‑to AI code cleanup tool
Cursor engineers are leaning heavily on a custom /deslop slash command that strips “AI slop” from diffs—extra comments, needless guards, dodgy type casts—before code review. The internal stats screenshot shows 258 total invocations, suggesting it’s become one of their most‑used commands. deslop command

The command’s shared spec is simple and copyable: compare the branch to main, remove inconsistent comments, over‑defensive checks, and casts‑to‑any, then summarize changes in 1–3 sentences. command spec For anyone frustrated that LLMs generate working but messy patches, this pattern is worth stealing: keep the fast generative agent, but add a second agent pass tuned purely for style and minimalism before humans ever see the diff.
Kilo Code debuts “Spectre” stealth coding model with 256k context
Kilo Code quietly rolled out Spectre, a 256k‑context stealth coding model from a top‑10 lab, and made it free with no usage caps during an initial test window inside their agent IDE. spectre launch

Within about two hours of launch, users had already generated 350M+ tokens through Spectre in Kilo Code, far above the 100M‑token milestone they’d set for a $500 credits giveaway contest. spectre usage stats The model is tuned for coding tasks and is only accessible via Kilo Code right now, which means you can hammer it with large repos, long test logs, and multi‑file refactors that usually choke smaller contexts.
If you’re exploring new coding agents, this is a good chance to see what a frontier‑grade, long‑context model feels like in a “real” IDE loop—autocomplete, tool calling, and multi‑turn plans—without worrying about per‑token billing while you experiment.
Raindrop raises $15M as “Sentry for agents” adoption grows
Agent monitoring startup Raindrop announced a $15M seed round led by Lightspeed, with customers like Replit, Framer, Speak, and Clay already using it to keep production agents in check. funding video
Investors frame it as “Sentry for agents”: you see traces of what your agents did, where they called tools, how token usage evolved over a run, and where they went off the rails. vc comment Several founders publicly recommend it as the default way to understand and debug complex workflows, rather than relying on raw logs or ad‑hoc printouts. founder praise If you’re moving from toy agents to ones that touch money, customer data, or infra, tooling like Raindrop is quickly shifting from nice‑to‑have to table stakes—you’ll want this level of introspection before you turn on autonomy.
CopilotKit shows Kanban copilot pattern with Microsoft Agent Framework
CopilotKit shipped a Kanban Copilot demo that wires a Microsoft Agent Framework agent into an AG‑UI Kanban board, letting you create, reprioritize, and move tasks via natural language while keeping human‑in‑the‑loop control. kanban demo
Under the hood, the agent handles reasoning and workflow logic (via MS Agent Framework), while CopilotKit keeps UI state and agent state in sync and AG‑UI renders an interactive board on top of Next.js 15 and React 19. kanban demo It’s a nice reference for anyone trying to embed stateful agents into real product UIs rather than chatboxes—especially where you need to reflect agent actions as first‑class changes to a structured workspace.
You can lift this pattern to CRMs, ops dashboards, or any multi‑column state machine where an agent should propose changes but users still visibly approve and tweak them.
Kimi K2 CLI proves competitive for full‑stack coding agents
A hands‑on review of the Kimi K2 CLI shows it holding up well as a coding partner: using the CLI with K2’s reasoning mode, the reviewer built a full‑stack CRM app backed by SQLite in about an hour of prompting. kimi cli demo
They noted that K2’s tool calling and speed feel solid, that the CLI is lean and focused rather than flashy, and that pricing tiers make it attractive versus US‑based proprietary models. The same thread ranks K2 alongside GLM‑4.6 and MiniMax M2 as top open choices for coding, balancing performance, cost, and ecosystem tooling. kimi cli demo For teams experimenting with non‑US open models in their IDEs or agents, this is one more data point that K2 is not just a benchmark story—it’s already powering real coding flows via a simple CLI.
LLM Council now runs on LLM Gateway with free multi‑model routing
Following up on subjective benchmark, where LLM Council was introduced as a multi‑model “democratic” judge, the community has wired it into LLM Gateway so you can run councils over free models by changing a single endpoint. llm gateway setup
In practice you swap OPENROUTER_API_URL for https://api.llmgateway.io/v1/chat/completions and define a council like gpt-oss-20b-free, deepseek-r1t2-chimera-free, kimi-k2-0905-free, llama-4-maverick-free—all served via the gateway. llm gateway setup That gives you a zero‑cost way to experiment with council‑style aggregation (majority voting, tie‑breaking, disagreement surfacing) before you commit spend on premium models.
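A minimal council sketch against the gateway’s OpenAI‑compatible endpoint, using the free model IDs from the thread above; the one‑word prompt and majority‑vote rule are illustrative, not LLM Council’s actual aggregation logic:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI(base_url="https://api.llmgateway.io/v1", api_key="YOUR_GATEWAY_KEY")
COUNCIL = ["gpt-oss-20b-free", "deepseek-r1t2-chimera-free",
           "kimi-k2-0905-free", "llama-4-maverick-free"]

def council_vote(question: str) -> str:
    """Ask every council member, then majority-vote over normalized answers."""
    answers = []
    for model in COUNCIL:
        r = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question + "\nAnswer with one word."}],
        )
        answers.append(r.choices[0].message.content.strip().lower())
    winner, votes = Counter(answers).most_common(1)[0]
    return f"{winner} ({votes}/{len(COUNCIL)} votes)"
```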
If you’re already running evals or content generation through Council, this is an easy path to scale up comparisons and stress tests without burning through OpenRouter credits.
“Advent of Claude” explores new UIs on top of Claude Code’s JSONL traces
Developer Dex Horthy kicked off an “Advent of Claude” project: building a different UI every day on top of the Claude Code SDK’s JSONL trace format. The first iteration is a tmux‑style interface with parallel sub‑agents and a simple chronological display of events. claude ui demo Later in the week he’s pairing this with talks on multimodal agent evals and pipelines, suggesting the goal is not just a pretty UI but a reusable harness for inspecting how complex Claude Code agents reason, call tools, and coordinate. evals talk If you’re treating Claude Code as a black box in an IDE, this is a reminder that its JSONL traces are rich enough to power custom dashboards—showing subagent timelines, tool call trees, or even per‑step metrics—rather than relying solely on Anthropic’s default view.
📊 Leaderboards, openness and model pick lists
Fresh eval entries and transparency indices. Excludes the DeepSeek feature; focuses on Arena additions, openness scoring, provider top‑lists, and distillation‑eligible model filters.
Artificial Analysis Openness Index ranks OLMo 3 top, frontier labs near bottom
Artificial Analysis launched an Openness Index that scores models on 18 points across model availability, methodology transparency, and pre/post‑training data disclosure, then normalizes to a 0–100 scale. index launch OLMo 3 32B from AI2 tops the chart with 16 of 18 raw points (a normalized score of 89/100), while open‑weights but opaque releases like DeepSeek V3.2‑Exp, GLM‑4.6 and Qwen3‑235B cluster around 8 points, and most frontier closed models (GPT‑5.1 Pro, Gemini 3 Pro, Grok 4, Claude 4.5) sit at just 1–2 points.

The follow‑up plot shows a strong negative correlation between capability and openness: as you move right to more capable models, transparency and licensing openness almost always drop, with OLMo and NVIDIA’s Nemotron family notable as relatively open outliers. openness vs intelligence DeepSeek R1 and V3.2‑Exp get credit for method writeups and MIT licensing but no training data, while OpenAI’s gpt‑oss line scores well on weights and license but poorly on methodology and datasets. index launch Artificial Analysis later clarified that OLMo’s data score is capped by ODC‑BY attribution requirements, underlining how even “truly open” projects still face licensing trade‑offs. olmo clarification For engineers and researchers, the index is a quick way to choose baselines when you must be able to replicate or inspect training; for leaders it’s a concrete signal that today’s frontier performance almost always comes with heavy opacity around data and methods, which matters for both governance and risk modeling.
OpenRouter adds "distillable" model catalog and API flag for synthetic data
OpenRouter introduced a "distillable models" catalog and routing flag that only surfaces models whose licenses explicitly allow synthetic‑data generation for fine‑tuning pipelines. distillable launch The curated list includes several NVIDIA‑hosted endpoints that are both cheap and, in some cases, free to call, making them attractive as synthetic teachers for downstream open‑weight students. distillable launch You can now browse these models sorted by price on a dedicated page, browse models and in the API you can enforce that provider routing only selects entries marked distillable, ensuring you don’t accidentally train on outputs from a model whose terms forbid it. api enforcement
For anyone building instruction‑tuned or domain‑specialized models, this removes a lot of license‑reading ambiguity: pick from the distillable list, generate high‑quality synthetic traces, and sleep better knowing your training data source is contractually compatible with redistribution and derived works.
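A hedged sketch of what a distillation‑safe request could look like: the announcement describes a provider‑routing constraint, but the exact field name and the model slug below are placeholders, so confirm the real request shape against OpenRouter’s docs before using it in a pipeline:

```python
import requests

# Minimal sketch of constraining routing to distillable models.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    # Placeholder slug: pick a real model from the distillable catalog page.
    "model": "nvidia/some-distillable-model",
    "messages": [
        {"role": "user", "content": "Write 5 diverse instruction/response pairs about SQL joins."}
    ],
    "provider": {
        "distillable": True,  # assumed flag name, per the announcement's description
        "sort": "price",      # prefer the cheapest eligible endpoint
    },
}

resp = requests.post(
    OPENROUTER_URL,
    headers={"Authorization": "Bearer OPENROUTER_API_KEY"},  # placeholder key
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```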
DeepSeek V3.2 family enters LMSYS Text Arena
LMSYS has added DeepSeek V3.2 Standard, Thinking, and Speciale models to the Text Arena, letting builders compare them head‑to‑head against GPT‑5, Gemini 3 Pro, Claude 4.5 and others using blind votes on real prompts. arena update DeepSeek is pitching V3.2 as a GPT‑5‑level daily driver and Speciale as a gold‑medal reasoning model, so Arena results over the next few days will give a community reality check on those claims. launch thread

For AI engineers this is a low‑friction way to sanity‑check how the new DeepSeek models feel versus your current defaults before you wire them into production—especially for long‑reasoning tasks where Speciale reportedly uses far more tokens per answer. reasoning evals
Kimi K2 tops LMSYS November open‑model rankings; DeepSeek V3.2 drops to #4
LMSYS published its November "Top 10 Open Models by Provider" list, with Kimi‑K2‑Thinking‑Turbo debuting straight into the #1 slot on the Text Arena among open‑weight models, ahead of strong Chinese and US releases. open models list GLM‑4.6 from Zhipu drops to #2, Qwen3‑235B‑A22B‑Instruct holds #3, and DeepSeek‑V3.2‑Exp‑Thinking slides to #4 as new reasoning‑focused entrants arrive. ranking shifts The rest of the top 10 is rounded out by Longcat‑flash‑chat (#5), MiniMax M1 (#6), Gemma‑3‑27B‑IT (#7), Mistral‑Small‑2506 (#8), OpenAI’s GPT‑oss‑120B (#9) and Cohere’s Command‑A‑03‑2025 (#10), with LMSYS noting that every open model in this set still sits comfortably within the global Top 100 across all models. open models list
For teams picking an open baseline, this leaderboard is a distilled signal of what’s actually winning blind head‑to‑head votes rather than just what looks good on hand‑picked benchmarks, and you can immediately pit these models against your own prompts in the Arena UI. arena link
🔗 Interoperability: MCP & web toolchains
Interconnects and standard tool surfaces for agents. Excludes general IDE flows; concentrates on MCP server generation and production web tools wired into agent stacks.
AutoMCP agent auto-generates MCP servers from documentation
AutoMCP is a new agent that reads online API or product docs and spits out a working Model Context Protocol (MCP) server, handling the full build–validate loop for you. project overview It chains a Fetch MCP server (running in Docker), an E2B sandbox for safe build/test, and a Groq-powered LLM loop that incrementally edits and validates the server until it passes checks. project overview
For people wiring tools into Claude or other MCP-compatible stacks, this removes most of the boilerplate of standing up new servers from scratch: point AutoMCP at decent docs, let it generate and validate the code in an isolated environment, then drop the resulting server into your agent runtime. project page The project won an MCP Agents hackathon slot and ships with open-source code so you can inspect or fork the scaffolding before trusting it for production setups. (project overview, github repo)
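The underlying pattern is a generate‑validate loop. A stripped‑down sketch using the Groq SDK follows; the validate_in_sandbox helper is a hypothetical stand‑in for the Dockerized Fetch server and E2B build step, so this shows the shape of the loop rather than AutoMCP’s actual code:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def validate_in_sandbox(server_code: str) -> tuple[bool, str]:
    """Hypothetical stand-in for AutoMCP's E2B step: build and exercise the
    generated MCP server in an isolated sandbox, returning (passed, log)."""
    # Replace with a real sandbox build/test; accepting everything keeps the sketch runnable.
    return True, ""

def generate_mcp_server(docs_text: str, max_rounds: int = 5) -> str:
    """Ask the model for an MCP server, then iterate on sandbox failures."""
    prompt = f"Write a Model Context Protocol server for this API:\n\n{docs_text}"
    code = ""
    for _ in range(max_rounds):
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile",  # any Groq-hosted model works here
            messages=[{"role": "user", "content": prompt}],
        )
        code = resp.choices[0].message.content
        passed, log = validate_in_sandbox(code)
        if passed:
            return code
        # Feed the failure back in and try again, AutoMCP-style.
        prompt = f"The previous server failed validation:\n{log}\n\nFix it:\n{code}"
    return code
```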
LangChain adds Parallel Search, Extract, and Chat web tools
Parallel Search is now a first-class integration in LangChain, exposing three tools—Search, Extract, and Chat—that let agents hit the web without brittle HTML clicking. integration announcement Agents can call Parallel Web Search with a semantic objective and get back ranked pages plus token-efficient excerpts, which is usually cheaper and more reliable than free-form browsing. integration announcement The Extract tool takes explicit URLs and returns either full content or abridged text, so you can mix curated sources with search-discovered ones in the same chain. extract example Parallel Chat then wraps these primitives into a simple web-grounded chatbot that uses Search and Extract under the hood. chat usage LangChain’s docs and examples show how to register these as tools in a standard agent, so you can swap out HTML-based browsing in favor of a consistent, server-side web API. (langchain docs, extract example code)
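If you just want the shape of it before reading the docs, here is a hedged sketch of wrapping a Parallel‑style search call as a LangChain tool; the endpoint URL and response fields are assumptions for illustration, and the official integration hides these details behind ready‑made tool classes:

```python
import os
import requests
from langchain_core.tools import tool

# Endpoint and response fields are assumptions; the LangChain integration
# wraps the real Search/Extract/Chat APIs for you.
PARALLEL_SEARCH_URL = "https://api.parallel.ai/v1/search"  # hypothetical

@tool
def parallel_search(objective: str) -> str:
    """Search the web for a semantic objective and return ranked excerpts."""
    resp = requests.post(
        PARALLEL_SEARCH_URL,
        headers={"x-api-key": os.environ["PARALLEL_API_KEY"]},
        json={"objective": objective},
        timeout=60,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return "\n\n".join(r.get("excerpt", "") for r in results[:5])

# The decorated function can then be passed to any tool-calling agent, e.g.:
# agent = create_react_agent(model, tools=[parallel_search])
```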
Stagehand and Browserbase ship job-application browser agent template
Stagehand and Browserbase are leaning into browser-native agents with a new template that auto-applies to jobs across sites by spinning up multiple remote browser sessions. template overview The pattern is: Stagehand orchestrates the agent’s high-level plan and DOM interactions, while Browserbase hosts concurrent, isolated browser instances that the agent drives via natural-language commands. template overview
For teams building serious web agents, this looks more like a reusable web toolchain than a one-off demo: the template wires in concurrency, session management, and a concrete workflow (job applications) that you can fork into other repetitive browser tasks like lead gen or form-filling. template page It’s still early and not generic MCP yet, but it shows how "browser as tool" stacks are starting to harden into shareable components rather than bespoke scripts.
💼 Enterprise deals and distribution momentum
Commercial rollouts and channels. Excludes the DeepSeek launch itself; covers partnerships and availability across marketplaces and providers relevant to enterprise adoption.
OpenAI and Accenture team up to deploy ChatGPT Enterprise at massive scale
OpenAI and Accenture announced a strategic partnership where Accenture will equip “tens of thousands” of its professionals with ChatGPT Enterprise and make OpenAI one of its primary AI partners for client work in accounting and IT services. Accenture partnership The deal also includes making OpenAI Certifications a core part of Accenture’s upskilling program and launching a flagship AI client initiative to help enterprises roll out agents and copilots on top of OpenAI models. OpenAI blog post
For AI leaders this is a strong go‑signal that ChatGPT Enterprise is moving from pilot to default tooling inside a Big 4‑scale consultancy. It means thousands of consultants will start proposing OpenAI‑based architectures by default, which can tilt model and infra choice in large digital transformation deals. If you’re selling competing stacks, you now have to route around Accenture, or lean into interoperability so you’re an obvious second option when clients push for multi‑vendor setups.
Arcee’s Trinity Mini MoE hits OpenRouter and Together AI with free access
US‑built open‑weight MoE models are getting real distribution: Arcee’s 26B‑parameter Trinity Mini (3B active) is now live on OpenRouter with a free tier and available on Together AI as a production‑grade endpoint. (OpenRouter Trinity free, Together announcement, Trinity details)

Trinity Mini is positioned as an Apache‑2.0 long‑context reasoner (128k tokens) with strong GPQA/MMLU/BFCL performance relative to similarly sized open models, and it’s explicitly trained for agents and tool use with continuous RL. (Trinity benchmarks, Arcee overview) OpenRouter is promoting a free Trinity Mini route for developers to experiment, OpenRouter Trinity free while Together AI is pitching it alongside its “fastest OSS inference” claims for heavy agent workloads at NeurIPS. (Together announcement, Vertex blog mention)
For teams that care about sovereignty and supply‑chain politics, this is one of the first serious US‑trained open MoEs that you can hit both through a marketplace (OpenRouter) and a managed infra provider (Together) without custom contracts. If you’re currently relying on DeepSeek/Kimi/Qwen but want a domestic alternative, it’s probably time to drop Trinity Mini into your routing layer and see how it behaves on your own evals.
DeepSeek V3.2 and Speciale spread via OpenRouter and coding agents
Frontier‑level open‑weight models from DeepSeek are now plugged into major distribution channels: OpenRouter has added DeepSeek‑V3.2 and V3.2‑Speciale, and the Cline coding agent has made them first‑class options for autonomous dev workflows. (OpenRouter DeepSeek, Speciale listing, Cline integration)

OpenRouter advertises Speciale as a high‑compute reasoning variant rivaling Gemini 3 Pro, while its standard V3.2 reasoner keeps tool calling and agent workflows at GPT‑5‑ish performance. OpenRouter DeepSeek Cline exposes both with a simple toggle for “Enable thinking (1,024 tokens)” and highlights DeepSeek alongside Claude Sonnet 4.5 and Gemini 3 Pro for agentic coding. Cline integration
If you’re an infra or product lead, this matters because DeepSeek’s MIT license plus cheap OpenRouter pricing (~$0.28 in / $0.42 out per million tokens in Cline’s config) make it much easier to A/B these Chinese open models against your current proprietary defaults without standing up your own GPU fleet. Expect more IDEs, terminal agents, and internal tools to quietly add DeepSeek as an extra dropdown in the next few weeks.
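A quick way to run that A/B yourself through OpenRouter’s OpenAI‑compatible API; the model slugs below are assumptions, so copy the exact IDs from the OpenRouter listings:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="OPENROUTER_API_KEY",  # placeholder
)

# Assumed slugs; take the real IDs from the OpenRouter model pages.
CANDIDATES = [
    "deepseek/deepseek-v3.2",
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
]

PROMPT = "Refactor this function to be iterative instead of recursive: ..."

for model in CANDIDATES:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    # Token counts matter here: DeepSeek reasoners reportedly emit far more output tokens.
    print(f"--- {model} ({resp.usage.completion_tokens} output tokens) ---")
    print(resp.choices[0].message.content[:400])
```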
Silicon Valley startups quietly standardize on Chinese open models like DeepSeek and Qwen
NBC reporting highlighted in today’s threads says a “growing chunk” of Silicon Valley companies are building on free, open‑source Chinese models such as DeepSeek and Alibaba’s Qwen instead of expensive closed systems from OpenAI or Anthropic. (NBC article, NBC news story) Founders describe these models as “surprisingly close to the frontier,” cheap to fine‑tune, and easy to self‑host, making them attractive for cost‑sensitive products and privacy‑critical workloads. NBC article At the same time, separate analysis notes that Chinese labs are routing around US GPU controls by training new models in overseas data centers with Nvidia hardware while also stockpiling domestic Huawei GPUs at home. GPU training report Combined with releases like DeepSeek‑V3.2 and Kimi K2 that match or nearly match top US models on many reasoning and agent benchmarks, this is giving US startups a realistic alternative supply chain for high‑end weights even under export controls.
If you run model selection for a startup, the practical takeaway is simple: Chinese open weights are no longer just an interesting research artifact—they’re becoming a serious default for teams that want low cost, on‑premise deployment, and permissive licenses. You’ll need a clear policy stance on whether you’re comfortable depending on them, and if not, what your cost and capability tradeoffs look like.
Groq reports 2.5M devs and new Paytm and Box deployments in November
Groq’s November recap shows real distribution momentum for its LPU‑based inference stack: the company says it passed 2.5 million developers, launched a Sydney data center, signed an MOU with Kazakhstan, and landed high‑volume production deals with Paytm in India and Box in the US. (Groq November summary, Saudi forum clip) Paytm is already serving Groq‑backed AI features to “millions” of Indian users, Groq November summary while Box has integrated Groq via Model Context Protocol connectors and its Responses API, positioning Groq as a drop‑in latency accelerator for existing LLM apps. Groq November summary Groq also highlighted MCP connectors for Google tools and a presence at the US–Saudi Investment Forum, framing itself as both a cloud provider and a diplomatic AI infra player. Groq November summary
For infra leads this is a proof point that alternative accelerators are moving beyond flashy demos into real enterprise contracts at scale. If latency is your bottleneck, Groq’s partnerships with high‑traffic consumer apps like Paytm are a strong reason to at least benchmark LPUs on your hottest endpoints.
OpenAI signs Thrive Holdings to embed ChatGPT in accounting and IT services
OpenAI also struck a strategic partnership with Thrive Holdings, the David Sacks–backed roll‑up focused on accounting and IT services, to bring its models into Thrive’s operating businesses. Thrive announcement Thrive co‑founder Garry Tan notes this is about using OpenAI across “accounting and IT services industries,” implying deep workflow integrations rather than light chatbots. Thrive partnership
This is smaller than the Accenture deal in size but more focused: it pushes ChatGPT‑style agents into very specific vertical workflows like bookkeeping, audits, ticketing, and managed IT. If you build vertical SaaS in those sectors, this is a clear signal that your future competitors may be Thrive‑backed firms tightly coupled to OpenAI APIs. It also shows OpenAI is comfortable taking equity positions in service businesses, not just selling API capacity.
Perplexity Max now offers Claude Opus 4.5 and more reasoning models
Perplexity has quietly upgraded its Max tier model picker to include Anthropic’s Claude Opus 4.5 alongside GPT‑5.1, Gemini 3 Pro, Kimi K2 Thinking, Grok 4.1, and others, giving power users a richer menu of frontier‑class models for each query. Perplexity models update

Opus 4.5 appears as “Anthropic’s Opus reasoning model” in the Max models list, while the Reasoning section highlights Kimi K2 Thinking, Gemini 3.0 Pro, and GPT‑5.1 Thinking. Perplexity models update In parallel, Perplexity is testing a new Discover feed that surfaces AI‑generated topic pages directly on the home screen for some iOS users, hinting at a move toward more push‑style content discovery. Perplexity models update
If you already treat Perplexity as an API‑like tool for research or RAG prototyping, this means you can now do quick “same prompt, different brain” comparisons across nearly all major closed models without leaving one UI. It also suggests Perplexity intends to be a distribution point for whatever becomes your preferred reasoning model, not just their in‑house Sonar stack.
Baseten launches startup program with up to $25K in AI infra credits
Baseten announced “Baseten for Startups,” a credits program aimed at AI‑first startups from Seed to Series A, offering up to $25,000 in inference/training credits and $2,500 in Model API credits alongside support, networking, and GTM help. Baseten startup thread The program explicitly targets teams looking for an inference partner, not just raw GPU time, and promises “rapid support + networking + GTM boost” plus closer access to the Baseten team for scaling advice. Baseten startup thread This is squarely in the same playbook as cloud startup programs from AWS/GCP/Azure, but optimized around model hosting, fine‑tuning, and serving.
If you’re a small team picking your first serious infra partner for agents or RAG APIs, this is another lever to negotiate costs down—especially if you want something more opinionated than a generic cloud. It also signals that inference platforms are competing not only on price and speed, but on startup ecosystem gravity.
NousChat adds anonymous use and USDC on Solana for LLM access
Nous Research is pushing NousChat toward a low‑friction, privacy‑friendly entry point for frontier models: you can now use the chat product anonymously without an account and pay for inference in USDC on Solana via Coinbase x402. (payment update, features recap) They also ran a Cyber Monday promo giving a free month of NousChat to new users, (free month promo, signup link) clearly trying to seed sustained usage of their Hermes 3/4 and hosted closed‑weight models. For AI engineers and small teams this combo—no login, crypto‑denominated metered API usage, and high limits—means you can experiment with high‑end models in a somewhat less KYC‑heavy environment and wire them into on‑chain payment flows if that’s your stack.
If you build developer tools or bots for Web3 users, NousChat’s x402 integration is an early sign of where “pay per token with stablecoins” might go; you may want to mirror that pattern in your own billing abstractions.
OpenRouter adds "distillable models" catalog and API enforcement flag
OpenRouter introduced a new “distillable models” view and corresponding API flags that surface models whose terms explicitly permit synthetic data generation for fine‑tuning pipelines. distillable announcement The catalog includes multiple endpoints from NVIDIA and others, and can be browsed sorted by price so teams can minimize cost per synthetic token. (distillable model list, model browser)
On the API side, developers can now enforce that routing only hits providers marked as distillable by setting a distillable=true constraint, ensuring they don’t accidentally violate a model’s training‑data terms when generating large synthetic corpora. (distillable docs, routing docs) This is a small but important layer of license awareness for any workflow that leans heavily on self‑play or synthetic instruction generation.
If you’re experimenting with self‑distillation or RLHF‑style pipelines over third‑party APIs, this feature gives you a safer default: point your distillation jobs at OpenRouter, filter to distillable models, and let the router worry about which specific endpoint you’re allowed to farm for training data.