DeepSeekMath‑V2 releases 689GB IMO‑gold weights – 99% on Basic proofs


Executive Summary

DeepSeek dropped DeepSeekMath‑V2, a 689GB math specialist built on DeepSeek‑V3.2‑Exp‑Base and licensed Apache‑2.0, and it might be the first time you can literally download an IMO‑gold brain. The model hits IMO 2025 and CMO 2024 gold levels plus 118/120 on the 2024 Putnam when you scale test‑time compute, putting an open stack in the same league as Google and OpenAI’s previously API‑only Olympiad systems.

The twist is how it reasons. DeepSeek trains a verifier LLM first, then uses it as a reward model to RL‑train the generator, scoring step‑by‑step proof quality instead of only final answers. On ProofBench it clocks 99.0% human‑checked success on Basic problems and 61.9% on Advanced, beating Gemini Deep Think (IMO Gold) on the easy split and staying close on the hard one. That makes it a defensible theorem‑proving backend or math agent brain rather than yet another chatty generalist.

In the wider tooling race we’ve been covering, Claude Opus 4.5 keeps winning human‑judged coding arenas while Amp’s “off‑the‑rails cost” metric shows Gemini burning 7× more wasted spend than Opus. Put together, the center of gravity is shifting: closed labs still dominate broad UX, but the sharpest domain brains—and the eval harnesses to use them well—are rapidly moving into the open.


Feature Spotlight

Feature: DeepSeekMath‑V2 opens IMO‑gold math reasoning

DeepSeekMath‑V2 releases open weights (Apache‑2.0) with a verifier→generator RL recipe, hitting IMO 2025 gold, CMO gold, and 118/120 Putnam—first public, reproducible Olympiad‑level math reasoner.

Cross‑account focus today is DeepSeek’s open‑weights self‑verifying math model. Tweets show Apache‑2.0 weights on HF, verifier→generator RL, and Olympiad‑level results; strong interest from researchers and builders.


Table of Contents

🐳 Feature: DeepSeekMath‑V2 opens IMO‑gold math reasoning

Cross‑account focus today is DeepSeek’s open‑weights self‑verifying math model. Tweets show Apache‑2.0 weights on HF, verifier→generator RL, and Olympiad‑level results; strong interest from researchers and builders.

DeepSeek open‑sources IMO‑gold math model under Apache 2.0

DeepSeek has released DeepSeekMath‑V2 — a 689 GB math‑specialist model built on DeepSeek‑V3.2‑Exp‑Base — as open weights on Hugging Face under an Apache‑2.0 license, giving engineers a downloadable IMO‑gold reasoning model for the first time release recap simonw note. Community voices are framing it as “owning the brain of one of the best mathematicians in the world for free,” emphasizing that you can inspect, fine‑tune, and self‑host it without policy nerfs or recall risk hf ceo comment. For infra and research teams, the size (689 GB) and license mean serious but tractable on‑prem deployments, full reproducibility for math‑reasoning work, and a clear baseline for future open RL‑reasoning stacks.

huggingface model card

Compared to previous Olympiad‑level models from OpenAI and Google, which were only accessible via APIs, this drop shifts the frontier of math reasoning firmly into the open‑source world and sets a new reference point for academic benchmarks, agent tooling, and verifier research benchmarks thread GitHub repo.

Verifier‑driven RL makes DeepSeekMath‑V2 a near‑perfect proof engine

Under the hood, DeepSeekMath‑V2 trains an LLM verifier first, then uses it as a reward model to RL‑train a proof generator that learns to find and fix its own mistakes, scaling verification compute over time to keep the generator and verifier in tension paper summary putnam recap. On the ProofBench suite, this self‑verifying setup reaches 99.0% human‑evaluated success on Basic problems and 61.9% on Advanced, beating Gemini Deep Think (IMO Gold), which scores 89.0% on Basic, and staying close to its 65.7% on the harder Advanced split benchmarks thread. For engineers, the important detail is that rewards are tied to step‑by‑step proof quality, not just final answers, which makes the model far more suitable as a theorem‑proving backend or math agent brain than generic LLMs tuned on answer‑only RL.

proofbench comparison chart

The tech report describes a loop where an LLM‑based verifier is itself improved by labeling harder proofs with more compute, then used to further train the generator — a recipe that can be ported to other domains where reasoning rigor matters (security proofs, protocol verification, scientific derivations) paper pdf. If you’re building math agents today, this is the first open model where relying on its internal checking, rather than re‑running a separate verifier or SMT solver, looks technically defensible.
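
For intuition, here is a minimal, illustrative sketch of that verifier‑as‑reward loop (all functions are stand‑in stubs, not DeepSeek's code): the generator drafts a proof, the verifier scores every step, and an aggregate step‑level score, rather than a final‑answer check, becomes the RL reward.

```python
# Minimal sketch of a verifier-as-reward-model RL loop. The generator, verifier,
# and policy update below are placeholder stubs for illustration only.
import random

def generate_proof(problem: str) -> list[str]:
    """Stand-in generator: returns a proof as a list of steps."""
    return [f"step {i} for {problem}" for i in range(3)]

def verify_step(problem: str, step: str) -> float:
    """Stand-in verifier: scores one step's rigor in [0, 1]."""
    return random.random()

def proof_reward(problem: str, steps: list[str]) -> float:
    """Reward tied to step-by-step quality: one weak step drags the proof down."""
    scores = [verify_step(problem, s) for s in steps]
    return 0.5 * min(scores) + 0.5 * sum(scores) / len(scores)

def rl_update(problem: str, steps: list[str], reward: float) -> None:
    """Stand-in for the policy-gradient update (e.g. a GRPO/PPO step)."""
    pass

for problem in ["IMO-style inequality", "number theory lemma"]:
    steps = generate_proof(problem)
    reward = proof_reward(problem, steps)
    rl_update(problem, steps, reward)   # generator learns to find and fix weak steps
    print(problem, round(reward, 3))
```

The real system also spends extra verification compute to label harder proofs and retrain the verifier, which is the part that keeps the two models in tension.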

Open DeepSeekMath‑V2 matches proprietary IMO and Putnam gold scores

DeepSeekMath‑V2 is reported to hit IMO 2025 gold level, CMO 2024 gold level, and a 118/120 score on the 2024 Putnam exam using scaled test‑time compute, putting an open model in the same league as prior proprietary Olympiad systems from Google and OpenAI contest recap kimmonismus overview. The paper’s contest table shows gold‑level performance across multiple high‑stakes math competitions, with most problems fully solved and a handful receiving only partial credit paper excerpt.

contest results table

Hugging Face’s CEO underscores that, as far as he knows, no other chatbot or API gives access to an IMO‑gold model whose weights you can download, fine‑tune, or run offline — making this the first such brain that’s both frontier‑level and fully under user control hf ceo comment hf ceo note. For researchers and quant teams, that means you can now benchmark closed models against a strong open Olympiad baseline, study failure modes at the trace level, and experiment with domain transfer (e.g., from competition math to formal methods or quantitative finance) without waiting on vendor eval access putnam recap zhihu explainer.


📊 Evals race: ARC‑AGI claims, physics checks, and production metrics

Mostly evaluation updates and new observability signals: ARC‑AGI‑2 claims under verification, a new physics benchmark, Code Arena WebDev standings, and Amp’s cost waste metric. Excludes DeepSeekMath‑V2 (covered as feature).

Poetiq claims superhuman ARC‑AGI‑2 at ~$50/task; ARC Prize reviewing

Poetiq is reporting superhuman performance on ARC‑AGI‑2 by orchestrating GPT‑5.1 and Gemini 3 Pro, with its best "Mix" agent slightly beating the ~60% average human test‑taker line at roughly $50 per task on the new benchmark’s public eval set arcagi overview.

arcagi cost vs score chart

The cost–accuracy plot shows a family of Poetiq runs (Grok‑4‑Fast, three Gemini‑3 configs, then a mixed GPT‑5.1+Gemini setup) moving up and to the right, with the top point edging just above human baseline while staying far cheaper than previous DeepThink‑style runs that spent up to ~$100 per instance arcagi overview. The ARC Prize organizers have explicitly said they are coordinating with Poetiq to verify this score on the Semi‑Private hold‑out set and will only treat that verified result as official, with a public write‑up to follow arc prize statement. For engineers this is a strong signal that with the right multi‑model agent scaffold, today’s foundation models can already hit or exceed human‑level performance on one of the toughest AGI‑style benchmarks, but it also underscores that claims are provisional until the benchmark maintainers confirm them (poetiq blog).

CritPt physics benchmark shows Gemini 3 Pro at 9.1% and everyone else worse

Artificial Analysis launched CritPt, a physics‑focused benchmark of 70 unpublished graduate‑level research challenges, and even the best current model—Gemini 3 Pro Preview—only manages 9.1% full‑problem accuracy critpt summary.

critpt leaderboard

Most other frontier models, including GPT‑5.1 (high), Claude 4.5 Sonnet, Grok 4.1 Fast, and DeepSeek variants, cluster in the 0–5% range, with the average base model around 5.7% accuracy; adding tools and coding only nudges top scores toward ~10% critpt summary. The benchmark decomposes each problem into verifiable checkpoints, revealing that some models can generate millions of tokens of attempted reasoning (Grok 4 outputs ~4.9M tokens total) yet still fail almost all full solutions, which is a clear warning to teams that token‑hungry "think harder" modes do not translate into deep scientific reliability on out‑of‑distribution physics tasks.

Amp introduces “Off‑the‑Rails Cost” to quantify wasted model spend

Sourcegraph’s Amp team introduced a new internal metric called “Off‑the‑Rails Cost” that measures how much of users’ spend is wasted on threads where the model goes haywire—repeating tokens, dumping internal thoughts, or otherwise forcing the user to abandon the chat amp metric thread.

amp evals comparison table

In recent evals across thousands of coding sessions, Gemini 3 Pro had 17.8% of total cost classified as off‑the‑rails, more than twice Claude Sonnet 4.5’s 8.4% and over 7× Opus 4.5’s 2.4%, even though Gemini often looks cheaper on pure per‑token pricing amp metrics table. Amp plans to use this signal to automatically detect wasted threads, compensate users with credits, and feed privacy‑preserving feedback back to model vendors amp metric thread, which is a useful reminder for teams that effective cost per successful task can diverge sharply from nominal token prices when models hallucinate or spiral.
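
As a back‑of‑envelope illustration (field names and the flagging heuristic here are our own stand‑ins, not Amp's definition), the metric is simply the share of spend that landed in abandoned, off‑the‑rails threads:

```python
# Hypothetical sketch of an "Off-the-Rails Cost" style metric: the share of total
# spend that went to threads the user abandoned after the model spiraled.
from dataclasses import dataclass

@dataclass
class Thread:
    cost_usd: float
    abandoned: bool        # user gave up on the thread
    went_off_rails: bool   # e.g. runaway repetition or dumped internal thoughts

def off_the_rails_share(threads: list[Thread]) -> float:
    total = sum(t.cost_usd for t in threads)
    wasted = sum(t.cost_usd for t in threads if t.abandoned and t.went_off_rails)
    return wasted / total if total else 0.0

threads = [
    Thread(cost_usd=1.20, abandoned=False, went_off_rails=False),
    Thread(cost_usd=0.80, abandoned=True,  went_off_rails=True),
    Thread(cost_usd=2.00, abandoned=False, went_off_rails=False),
]
print(f"{off_the_rails_share(threads):.1%}")  # 20.0% of spend wasted in this toy sample
```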

Claude Opus 4.5 extends its lead across LMArena WebDev and expert tasks

LMArena’s latest Code Arena snapshot shows Claude Opus 4.5’s Thinking and non‑Thinking variants still holding #1 and #2 in the WebDev bracket, with Gemini 3 Pro in #3, reinforcing Opus’s status as the de‑facto top web UI coding model in that arena webdev screenshot.

webdev ranks chart

Following up on Opus webdev, where Opus 4.5 first took the WebDev and Expert crowns, new summary graphics now highlight that Claude 4.5 also ranks top‑3 on text leaderboards and leads a range of occupational categories—software, business, medicine, writing, and math—on LMArena’s human‑vote indices multi leaderboard. For engineering leads this means Opus 4.5 is not only strong on synthetic coding benchmarks but is also consistently winning in head‑to‑head, human‑judged comparisons, so it’s a natural default to A/B against Gemini 3 Pro and GPT‑5.1 when choosing a primary coding or expert‑assistant model.


🧑‍💻 Coding agents in practice: planning, PRs, UX loops

Developer‑centric agent workflows dominated by coding: proactive repo scanning, plan‑first strategies, design‑loop iterations, and usage/limits UX. Excludes MCP/interop standards (handled separately).

Data backs what seniors do intuitively: plan-first messages make coding agents better

New analysis of ~100k coding-agent conversations suggests senior engineers really do use agents differently: they send more explicit "plan" messages and far fewer "explain this code" queries than juniors, and their sessions succeed more often as a result usage pattern thread. A linked paper shows that experienced users lean on structured goals and step breakdowns instead of treating the model as a StackOverflow replacement plan-mode paper.

Codex review terminal

This matches anecdotal experience from builders. One dev now routinely describes a bug in detail, tells Claude "I’m going to bed, keep iterating until it works", and wakes up to a working fix overnight bug fix. Another notes that Opus 4.5 will happily charge ahead on a flawed mental model unless you force it into plan mode, while GPT‑5.1 Codex tends to explore options by default and be more "surgical" in large codebases codex vs opus. Anthropic’s new effort knob (low/medium/high) for Opus 4.5 is clearly designed for this world, letting agent frameworks dial down overthinking for simple edits or crank it up for deep refactors effort knob note. The takeaway: if you’re building or using coding agents, invest in explicit planning UX—buttons, system prompts, and commands that say "think before you code"—instead of hoping users discover good prompting habits on their own.

Google’s Jules SWE Agent readies scheduled jobs and Proactive Mode

Google is preparing a major upgrade to its Jules SWE Agent: scheduled jobs plus a Proactive Mode that will let it periodically scan repos, surface issues, and open fix PRs without a developer asking first jules teaser. A leak of internal docs describes cron-like prompts ("run security checks nightly"), and an upcoming Proactive Mode that continuously patrols the codebase for bugs and regressions, then auto-raises changes feature explainer.

For engineering leaders, this shifts Jules from "copilot in the loop" to a background maintenance worker you can configure per repo or team: you define schedules and scopes, it hunts for problems and proposes patches. The point is: if Google ships this with sane guardrails and review flows, teams could offload boring hygiene work (lint, dependency drift, flaky tests) while keeping humans in charge of merges and production risk.

AmpCode adds visibility into free-mode allowance and wasted agent runs

AmpCode quietly shipped a small but meaningful UX feature: users can now see how much of the free mode daily allowance they’ve burned and when it resets allowance update. That sits on top of Amp’s deeper telemetry about which threads went "off the rails" (runaway thinking, repeated tokens) and resulted in abandoned sessions, a metric they’ve been using internally to judge models like Sonnet, Gemini, and Opus wasted thread analysis.

Amp comparison table

For engineers building with coding agents, this kind of instrumentation matters more than raw price-per‑token. One tweet notes that in Amp’s data, Gemini users saw a much higher share of spend go to wasted threads compared to Opus 4.5, which helps explain why Opus can feel cheaper in practice even at a higher sticker rate amp metrics. Combined with Sourcegraph’s recent move to make Opus 4.5 the default smart mode in Amp, the product is evolving into a living benchmark: you get a coding agent plus a dashboard telling you how often it melted down and how much that cost. That’s the sort of feedback loop other agent IDEs will likely need to copy.

Opus 4.5 starts to reliably close the front-end design loop

Builders are now reporting that Claude Opus 4.5 can run the full Generate → Render → Look → Improve loop for front‑end work in a way earlier models couldn’t design loop demo. Following up on frontend plugin, which introduced Claude Code’s frontend-design plugin, new tests show Opus generating UI code, analyzing a live render or screenshot, spotting layout issues, and then shipping targeted fixes rather than rewriting everything.

One developer calls it "the first model that can reliably do the visual loop" and notes that Gemini had been the de‑facto leader for UI understanding before this design loop demo. Another shares prompts for real flows like a calorie tracker and an AI physics simulator, where Opus designs the page, sees misalignments, and iterates in a single session prompt examples. This doesn’t make designers obsolete—it gives engineers a faster critique cycle when polishing internal tools and marketing sites, especially when paired with screenshots from a real browser instead of pure JSX reasoning.

Developers are wiring Claude-like agents into CLIs to control their machines

A small but telling cluster of posts shows engineers treating LLM agents as OS-level automation layers, not just chatbots. One setup connects Claude (nicknamed "Clawd") to a Mac via a WhatsApp–Twilio–Tailscale bridge, where it can run shell commands, hit APIs, and then wake the user up with a synthesized voice and a Spotify playlist when a calendar event approaches personal assistant demo.

Spotify player TUI

For media control, devs recommend terminal tools like spotify-player, which exposes rich playback and album art in a TUI—perfect for agents to drive through simple commands instead of brittle GUI automation spotify cli reco. In another example, the agent detects a background app (gowa) via SSH and sends a voice note to the neighbor at the table ssh voice prank. This is early and hacky, but the pattern is clear: if you give an agent a clean CLI surface with good text output (music players, note apps, task runners), it becomes far easier to build reliable local assistants that do more than edit code.

RepoPrompt shows a practical pattern for LLM second opinions on code

RepoPrompt is emerging as a lightweight harness for getting a second model to critique your primary coding agent’s answer without forcing every user to juggle API keys. In one demo, Claude Opus 4.5 generates a solution, then RepoPrompt forwards that entire answer into GPT‑5.1 High for review and follow‑up questions, all from a single UI dual model demo.

For teams nervous about silent failures, this pattern is appealing: the first model does the bulk of the work, the second is explicitly framed as a skeptic that checks reasoning, edge cases, and style. It also gives you a way to A/B different providers on real tasks ("ask GPT‑5.1 what it would do differently") without wiring full multi-model orchestration into your app. Expect more tools to copy this "critic model on demand" UX so engineers can escalate tricky changes to a stronger or more conservative reviewer context builder mention.

Data Analysis Arena prototypes multi-model agents for real-world analytics tasks

A new "Data Analysis Arena" prototype extends the LMArena idea into data science: instead of rating chat answers, it spins up full multi-step analysis runs from different models on the same dataset and lets humans pick which exploration is better arena ui screenshot. Users can pick curated datasets (F1 racing history, school results, trading card sales, etc.), describe an open-ended task like "What story does this data tell?", and then watch two models fetch, clean, chart, and narrate in parallel arena config view.

analysis storyline chart

In one example, a run over gas consumption data produces a chart titled "The Great Recovery: From Crisis to 7× Growth in 6 Months" with automated annotations and numbers, while the rival model surfaces a quirky birthday pattern in F1 drivers result examples. For developers building data-centric agents, this kind of harness is a handy way to compare not just correctness but taste: which model picks an interesting angle, avoids hallucinated facts, and writes something a human analyst would be proud of. Expect this style of "agent battle over the same dataset" to become a standard evaluation tool for analytics copilots and notebook agents.


🧩 Interoperable agents: MCP surfaces and evented UIs

Cross‑client orchestration progressed: n8n exposes workflows via MCP to Claude/ChatGPT, CopilotKit pushes AG‑UI events, and LangChain ships remote sandboxes. Excludes model/eval news and coding agent specifics.

CopilotKit’s AG‑UI spec pushes agents beyond the single chat box

CopilotKit is formalizing an event‑driven UX model for agents, arguing that "chat box = agent UX" has hit its limits and introducing AG‑UI as a standard way to represent streaming tokens, tool calls, human approvals, and other agent–user events in real products. ag-ui intro

In a Future AGI session, a founding engineer walks through why long‑running, event‑driven agents break classic request–response UIs and how AG‑UI aims to normalize patterns like incremental streaming, surfacing tool outputs in context, pausing for human sign‑off (HITL), and resuming workflows. webinar replay Instead of every app inventing its own bespoke protocol between front end and agent runtime, AG‑UI provides a shared event vocabulary that can sit on top of any backend (Claude, GPT‑5.1, Gemini, etc.). For teams building multi‑step copilots inside existing products, this gives a reference architecture for how the front end should model "agent state" over time instead of shoving everything through a single text area.
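
To make that concrete, here is a hypothetical sketch of what a typed agent event stream looks like to a front end; the event names and fields are illustrative stand‑ins, not the actual AG‑UI spec:

```python
# Hypothetical sketch of an event-driven agent/UI contract in the spirit of AG-UI.
# The point is that the front end consumes a typed event stream instead of one blob.
from dataclasses import dataclass
from typing import Iterator, Union

@dataclass
class TokenDelta:          # incremental streaming text
    text: str

@dataclass
class ToolCall:            # tool activity surfaced in context
    name: str
    args: dict

@dataclass
class ApprovalRequest:     # pause point for human sign-off (HITL)
    summary: str

AgentEvent = Union[TokenDelta, ToolCall, ApprovalRequest]

def render(events: Iterator[AgentEvent]) -> None:
    """A minimal front-end loop over the event stream."""
    for ev in events:
        if isinstance(ev, TokenDelta):
            print(ev.text, end="", flush=True)
        elif isinstance(ev, ToolCall):
            print(f"\n[tool] {ev.name}({ev.args})")
        elif isinstance(ev, ApprovalRequest):
            print(f"\n[waiting for approval] {ev.summary}")   # resume after sign-off

render(iter([
    TokenDelta("Drafting a refund plan"),
    ToolCall("search_docs", {"q": "refund policy"}),
    ApprovalRequest("Send refund email to the customer?"),
]))
```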

LangChain Deep Agents add remote sandboxes for safer long‑running code

LangChain is rolling out Sandboxes for Deep Agents: remote, resettable environments where agents can execute arbitrary code and shell commands without touching a developer’s local machine. sandbox announce The new feature lets you spin up a sandbox, wire it in as a tool, stream code into it, and send results back to the agent, which is useful for evals, parallel experiments, and any untrusted snippets you don’t want near your laptop. LangChain engineers say they’re already "sandbox‑pilled"—using this both for evaluation workflows and as a general pattern for isolating risky or long‑running tasks from the main agent process. (sandbox usage comment, sandbox for evals) Tying back to deep agents, where Deep Agents were framed as the harness layer above runtimes like LangGraph, Sandboxes now fill in the "where does the code actually run?" piece for interoperable, multi‑tool agents that need clean, reproducible environments.

n8n’s MCP connectors move from launch to real agent workflows

n8n’s new MCP surface is already being used to wire Claude and ChatGPT directly into complex automations, with Omar El Gabaly showing an end‑to‑end setup that lets agents search, inspect, and run n8n workflows from chat. Following n8n mcp launch where n8n first exposed an entire instance as an MCP tool surface, today’s demo walks through discovering workflows, triggering them with parameters, and feeding results back into an agent loop for optimization. mcpify walkthrough

For AI engineers, this is a concrete pattern for turning existing n8n automations into reusable tools instead of re‑implementing glue code in every agent harness. The YouTube guide breaks down how to register MCP in n8n, configure Claude/ChatGPT to see those tools, and then orchestrate long chains (like scraping, transforming, and notifying) from a single natural‑language request. YouTube demo That makes n8n effectively a low‑code skill library for any MCP‑aware client rather than "just" a background workflow engine.


🗂️ RAG efficiency: compress, align, and reuse experience

Today’s sample highlights RAG decoding acceleration, unified retriever‑generator training, and reusable agent memory docs. Excludes DeepSeek math feature.

Meta’s REFRAG compresses RAG context for up to 30× faster decoding

Meta’s new REFRAG decoding scheme shows you can aggressively compress RAG contexts into learned chunk embeddings, then selectively expand them, cutting time‑to‑first‑token by up to 30.85× versus a vanilla LLaMA decoder while still improving accuracy on long‑context tasks. refrag overview It handles 4k–16k token contexts even though the model was only trained on ~6k sequences, and beats strong baselines like CEPE while using far fewer tokens. refrag results summary Instead of feeding every retrieved token into the decoder, REFRAG splits the context into chunks of size k, runs a small encoder over each chunk, and projects each down to a single embedding that sits in the decoder’s input alongside the question tokens. refrag overview A lightweight RL‑trained policy decides when to keep chunks compressed vs expand them back to full tokens, which is how the system balances speed vs fidelity. refrag results summary The Arxiv paper suggests this yields up to 3.75× TTFT speedups over CEPE‑style methods and lets you stretch effective context length ~16× without retraining the base model, which is directly relevant if you’re paying heavily for long‑context RAG today. Arxiv paper
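
Here is a toy sketch of the chunk‑compression idea (illustrative code, not Meta's implementation; the sizes, encoder, and expansion policy are made up):

```python
# Sketch of REFRAG-style chunk compression: split retrieved context into chunks of
# k tokens, encode each chunk to a single embedding, and expand only the chunks a
# selection policy flags as important. Toy sizes and modules; not Meta's code.
import torch, torch.nn as nn

d_model, k = 64, 16                       # toy sizes; real systems are far larger
chunk_encoder = nn.GRU(d_model, d_model, batch_first=True)
proj = nn.Linear(d_model, d_model)        # project chunk state into decoder space

def compress_context(token_embs: torch.Tensor) -> torch.Tensor:
    """token_embs: (num_tokens, d_model) -> one embedding per k-token chunk."""
    chunks = token_embs.split(k, dim=0)
    states = [chunk_encoder(c.unsqueeze(0))[1].squeeze() for c in chunks]
    return proj(torch.stack(states))       # (num_chunks, d_model)

def build_decoder_input(question_embs, token_embs, expand_policy):
    chunk_embs = compress_context(token_embs)
    pieces = []
    for i, emb in enumerate(chunk_embs):
        if expand_policy(i):               # e.g. an RL-trained relevance policy
            pieces.append(token_embs[i * k:(i + 1) * k])   # keep full tokens
        else:
            pieces.append(emb.unsqueeze(0))                # one embedding instead of k
    return torch.cat([*pieces, question_embs], dim=0)

ctx = torch.randn(8 * k, d_model)          # 8 retrieved chunks
q = torch.randn(12, d_model)
inp = build_decoder_input(q, ctx, expand_policy=lambda i: i in {0, 3})
print(inp.shape)                           # far fewer rows than the 8*k + 12 a vanilla decoder sees
```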

Apple’s CLaRa jointly trains retriever and generator in latent space

Apple’s CLaRa paper proposes a fully joint way to train RAG systems: it learns a compressed continuous latent space where both the reranker and generator operate, then optimizes them end‑to‑end using a differentiable top‑k estimator so retrieval relevance and answer quality are aligned instead of tuned separately. clara thread To make this work, they introduce SCP, a key‑preserving data synthesis pipeline that generates QA and paraphrase pairs, so embeddings stay semantically rich and retrievable even after heavy compression. clara thread CLaRa replaces text‑based reranking with embedding‑based compression plus unified training, which avoids the classic problem where a retriever is optimized for matching queries but not for what actually helps the generator answer well. github repo On multiple QA benchmarks, the authors report state‑of‑the‑art compression and reranking performance, often beating text‑fine‑tuned baselines while using shorter, denser representations. clara thread For teams hand‑gluing vector stores, rerankers, and LLMs, this is a pointer toward a more principled RAG stack where those pieces are learned together rather than separately.

clara paper cover

BREW turns agent histories into reusable ‘concept docs’ for faster runs

Microsoft’s BREW framework (“Bootstrapping expeRIentially‑learned Environmental knOwledge”) tackles a core RAG‑ish problem for agents: instead of relearning workflows from scratch each time, it converts past task trajectories into concise concept documents that future runs can retrieve and follow. brew summary These docs are organized around reusable ideas like “searching files” or “exporting a document,” and are written by a reflector model using human‑behavior rubrics so they read like high‑quality internal playbooks rather than raw logs. Arxiv paper

An integrator agent then merges and optimizes these docs per concept, using a small tree search over variants: try a doc on sample tasks, score how much it helps, measure how retrievable it is, and keep the best versions. brew summary At run time, the main agent retrieves a few relevant concept docs for a new query and leans on them, which the paper shows cuts steps, tool calls, and failure rates across diverse domains compared to agents that only use ad‑hoc scratchpad reasoning. brew summary For anyone building long‑lived assistants, BREW is a concrete recipe for turning messy agent runs into a structured, queryable knowledge base instead of throwing that experience away.
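
A hypothetical sketch of that selection step, with toy scoring functions standing in for the paper's LLM reflector and real task rollouts:

```python
# Hypothetical sketch of BREW-style doc selection: score each concept-doc variant
# by how much it helps on sample tasks and how retrievable it is, keep the best.
# Scoring functions are stand-ins; in the paper these come from agents and rollouts.
def helpfulness(doc: str, sample_tasks: list[str]) -> float:
    """Stand-in: run sample tasks with and without the doc, return the lift."""
    return len(doc) % 7 / 10  # stub signal

def retrievability(doc: str, queries: list[str]) -> float:
    """Stand-in: fraction of queries for which this doc would rank highly."""
    return sum(q.split()[0] in doc for q in queries) / max(len(queries), 1)

def select_best_docs(variants: dict[str, list[str]], tasks, queries, keep=1):
    best = {}
    for concept, docs in variants.items():
        scored = sorted(
            docs,
            key=lambda d: helpfulness(d, tasks) + retrievability(d, queries),
            reverse=True,
        )
        best[concept] = scored[:keep]      # these become the retrievable playbooks
    return best

variants = {"searching files": ["Use ripgrep first...", "Always list the directory..."]}
print(select_best_docs(variants, tasks=["find config"], queries=["searching for a file"]))
```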

brew paper first page

FreshStack benchmarks RAG with retrieval, nugget, and support metrics

FreshStack, which will be presented at NeurIPS 2025’s datasets and benchmarks track, builds realistic RAG benchmarks for technical documentation and explicitly scores systems along three axes: retrieval quality, factual nuggets, and support (grounding). freshstack session info The idea is to get past “did the model answer the question?” and measure whether it pulled the right documents, extracted all key facts, and properly cited evidence for each claim, all on fresh stacks like modern codebases and docs. freshstack comment The framework auto‑constructs IR testbeds from real Q&A (e.g., community questions on new stacks), then defines nuggets as atomic facts and evaluates both nugget coverage and citation precision/recall for supporting passages. freshstack comment That dovetails with ongoing TREC RAG work on retrieval, nuggets, and support that the authors point to as prior art, and gives RAG builders a way to see whether they’re over‑fitting to answer quality while under‑investing in grounding. neurips poster page If you’re tuning rerankers or answer generators, this is the kind of benchmark you can use to detect regressions like “great‑sounding answers, bad citations” before shipping.

Local‑Web RAG agent beats commercial deep research on materials problems

A new “Hierarchical Deep Research with Local‑Web RAG” paper shows that a locally deployable research agent combining local document retrieval, selective web search, and a tree‑structured control loop can match or beat commercial deep‑research tools on complex materials science questions. deep research summary The system runs a Local‑Web RAG loop that first searches a private corpus (e.g., lab PDFs), then falls back to web search only when local evidence is thin, all wrapped inside a Deep Tree of Research (DToR) controller that grows promising branches and prunes dead ends. Arxiv paper

Across 27 nanomaterials and device topics, using an LLM‑as‑judge rubric plus “dry‑lab” simulations for a subset of tasks, their best DToR+Local‑Web setup often produced reports comparable to or better than closed systems like ChatGPT‑5 Deep Research while running cheaper and on‑prem. deep research summary For RAG engineers, the key pattern is the hierarchy: separate agents for querying, summarizing, and branching, all operating over a local‑first index, rather than a single monolithic chat loop that hammers the web every turn.

local web rag paper

🧠 Reasoning post‑training and new optimizers

Multiple research/engineering advances in RL and optimization: a 106B MoE with async RL, low‑rank ES for hyperscale, latent multi‑agent collab, and tool‑integrated RL for VLMs. Excludes eval/serving items.

INTELLECT‑3 open-sources a 106B MoE trained with large-scale async RL

Prime Intellect’s INTELLECT‑3 is now fully out in the open as a 106B‑parameter Mixture‑of‑Experts model trained with large‑scale asynchronous RL on top of GLM‑4.5‑Air, with weights, RL framework, environments, and evals all released for others to study and extend. Building on API launch that focused on its OpenRouter API exposure, today’s discussion highlights that INTELLECT‑3 reaches ~90.8 on AIME 2024, 88.0 on AIME 2025, 69.3 on LiveCodeBench v6, 74.4 on GPQA‑Diamond, and 81.9 on MMLU‑Pro, generally beating DeepSeek‑R1 and GLM‑4.5‑Air at similar or larger scales for math, code, and reasoning tasks benchmark overview.

Intellect-3 benchmark bars

The team stresses that this is not just a model drop but an end‑to‑end RL stack: they open‑sourced the RL training code, custom environments, and evaluation setup so others can replicate or adapt their recipe for large MoE reasoning models model intro. Ecosystem projects like vLLM have already confirmed support for INTELLECT‑3 post‑training, signaling that the model is compatible with mainstream high‑throughput inference stacks even at 100B+ active parameters vllm support. For practitioners, it’s one of the first chances to dissect a modern, RL‑post‑trained, frontier‑scale MoE in detail rather than treating it as a black box.

EGGROLL scales low-rank evolution strategies to 1B‑parameter models without backprop

The EGGROLL paper introduces a low‑rank evolution strategies (ES) method that makes backprop‑free optimization practical for billion‑parameter networks by replacing full Gaussian noise with factorized perturbations ABᵀ, cutting memory and compute enough to get about 100× higher training throughput than naive ES on their largest runs paper summary.

EGGROLL paper cover

Instead of storing a full m×n noise matrix for each population member, EGGROLL samples skinny matrices A∈ℝ^{m×r} and B∈ℝ^{r×n} with r≪min(m,n), then averages thousands of these low‑rank perturbations so the aggregate update behaves like full‑rank noise but at a fraction of the cost paper summary. The authors show that this scaled ES matches or beats classic ES on RL benchmarks while running much faster, improves RWKV‑style reasoning versus a strong policy‑gradient baseline (GRPO), and can even pretrain an integer‑only recurrent language model with population sizes above 100,000. For teams exploring backprop‑free or privacy‑sensitive training, EGGROLL is a concrete, open technique rather than a thought experiment.
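
A toy numpy sketch of the core low‑rank trick (not the paper's algorithm; EGGROLL's scaling, fitness shaping, and integer‑only variants differ): perturb with thin factors A Bᵀ instead of a full m×n Gaussian matrix, then use the fitness‑weighted average of those perturbations as the update.

```python
# Toy low-rank evolution strategies loop on a quadratic objective. Illustrative
# only: population handling, normalization, and scaling differ from EGGROLL.
import numpy as np

rng = np.random.default_rng(0)
m, n, r, pop, sigma, lr = 8, 4, 2, 512, 0.1, 2.0
W = np.zeros((m, n))
target = rng.standard_normal((m, n))

def fitness(weights: np.ndarray) -> float:
    return -np.mean((weights - target) ** 2)   # toy objective: match a target matrix

print("before:", round(fitness(W), 4))
for step in range(200):
    As = rng.standard_normal((pop, m, r))      # thin factors instead of full m×n noise
    Bs = rng.standard_normal((pop, n, r))
    scores = np.array([fitness(W + sigma * A @ B.T) for A, B in zip(As, Bs)])
    centered = scores - scores.mean()
    # Fitness-weighted average of the low-rank perturbations plays the role of
    # the ES gradient estimate; no backprop anywhere in the loop.
    grad_est = np.einsum("p,pmr,pnr->mn", centered, As, Bs) / (pop * sigma)
    W += lr * grad_est
print("after: ", round(fitness(W), 4))
```

Even this toy objective improves sharply without any gradients, which is the property EGGROLL scales up to billion‑parameter networks.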

VISTA-Gym and VISTA-R1 scale RL for tool-using vision-language models

VISTA‑Gym and its companion model VISTA‑R1 form a new training playground for vision‑language models that reason step‑by‑step while calling tools like detectors, OCR, chart readers, and math solvers, across 7 visual QA tasks and 26 tools vista paper summary. The authors show that an 8B‑scale VLM trained first with imitation learning and then with reinforcement learning in this environment outperforms comparable open baselines by about 10–19 percentage points on difficult multi‑step visual questions.

Vista-Gym multi-task diagram

In VISTA‑Gym, each episode gives the agent an image and question; the model alternates between emitting thoughts, invoking tools, and updating its plan until it answers or runs out of budget, giving RL a rich interaction loop to optimize ArXiv paper. Rewards are shaped to favor non‑repetitive reasoning, correct tool use, and final‑answer accuracy, so the RL phase refines both when to call tools and how to integrate their outputs. For teams trying to push beyond static chain‑of‑thought into genuine tool‑augmented visual reasoning, this looks like a solid template for how to combine IL and RL without hand‑engineering a separate setup for every downstream benchmark.
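
A hypothetical sketch of that loop, with a stand‑in policy, a toy tool registry, and made‑up reward weights (the paper's actual environments, tools, and shaping are richer):

```python
# Sketch of a think -> tool-call -> answer episode with a shaped reward mixing
# answer accuracy, tool-use credit, and a penalty for repeated reasoning.
TOOLS = {"ocr": lambda img: "reads '42' on the sign",
         "detector": lambda img: "finds 3 pedestrians"}

def policy(question, history):
    """Stand-in policy: think once, call a tool, then answer."""
    if len(history) == 0:
        return ("think", "I should check the sign text first")
    if len(history) == 1:
        return ("tool", "ocr")
    return ("answer", "42")

def run_episode(image, question, gold, budget=6):
    history, reward = [], 0.0
    for _ in range(budget):
        kind, content = policy(question, history)
        if kind == "think":
            if content in (h for k, h in history if k == "think"):
                reward -= 0.2                      # discourage repetitive reasoning
        elif kind == "tool":
            content = TOOLS[content](image)
            reward += 0.1                          # shaped credit for useful tool calls
        else:                                      # final answer ends the episode
            reward += 1.0 if content == gold else 0.0
            return reward, history
        history.append((kind, content))
    return reward, history

print(run_episode(image=None, question="What number is on the sign?", gold="42"))
```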

GEPA argues reflective prompt evolution can rival RL-style post-training

The GEPA paper (“Reflective Prompt Evolution Can Outperform Reinforcement Learning”) from a Berkeley–Stanford–Databricks–MIT collaboration makes the claim that carefully evolved prompts, improved through reflective self‑critique, can match or even beat traditional RL‑based post‑training on certain tasks—without modifying model weights paper mention.

GEPA paper title

GEPA treats the prompt as an object to optimize: the model generates candidate prompts, uses its own judgments or external signals to critique them, and iterates this process, effectively doing evolution in prompt space instead of gradient space GEPA paper title. While the tweet doesn’t spell out benchmark numbers, the framing puts prompt evolution on the same playing field as RLHF and GRPO, not just as a UX trick. For practitioners, the key implication is that if fine‑tuning and RL are expensive or operationally complex in your stack, there may be more headroom than expected in "learning to prompt"—provided you treat prompt search as a first‑class optimization problem rather than manual tinkering.

LatentMAS boosts multi-agent performance with latent-space collaboration instead of text

LatentMAS proposes a way for multi‑agent LLM systems to collaborate directly in the continuous latent space—sharing last‑layer hidden states via a shared "latent working memory"—rather than sending messages as natural language, and reports up to 14.6% accuracy gains over strong text‑based multi‑agent baselines while cutting output token usage by 70.8–83.7% and delivering 4–4.3× faster end‑to‑end inference paper summary.

LatentMAS paper first page

Each agent in LatentMAS generates auto‑regressive "latent thoughts" (hidden embeddings), writes them into a shared memory, and reads other agents’ embeddings to condition its own reasoning, all without turning those vectors back into text ArXiv paper. The authors argue—and support analytically—that this yields higher expressiveness and lossless information exchange at much lower complexity than the usual text‑only setup, which wastes capacity re‑encoding and re‑parsing messages. For people building multi‑agent systems today, LatentMAS is a reminder that the communication bottleneck doesn’t have to be tokens, and that you can sometimes get both better accuracy and big cost savings by operating "below" language when agents talk to each other.
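
An illustrative toy of the shared latent memory idea (a GRU cell and attention read stand in for the transformer internals; this is not the paper's code):

```python
# Sketch of latent-space collaboration: each agent writes its hidden state into a
# shared "latent working memory" and reads peers' entries as conditioning vectors,
# so no intermediate text is generated or re-parsed between agents.
import torch, torch.nn as nn

d_model, mem_slots = 64, 8

class LatentAgent(nn.Module):
    def __init__(self):
        super().__init__()
        self.think = nn.GRUCell(d_model, d_model)    # stand-in for a transformer
        self.read = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def step(self, x, h, memory):
        ctx, _ = self.read(h.unsqueeze(1), memory, memory)   # read peers' latents
        return self.think(x + ctx.squeeze(1), h)             # update own latent thought

agents = [LatentAgent() for _ in range(3)]
memory = torch.zeros(1, mem_slots, d_model)      # shared latent working memory
states = [torch.zeros(1, d_model) for _ in agents]
x = torch.randn(1, d_model)                      # encoded task input

for _ in range(2):                               # two collaboration rounds
    for i, agent in enumerate(agents):
        states[i] = agent.step(x, states[i], memory)
        memory = torch.roll(memory, shifts=1, dims=1)
        memory[:, 0] = states[i].detach()        # write latest hidden state, no text
print(states[0].shape)
```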

GigaEvo open-sources an AlphaEvolve-style LLM + evolution framework

GigaEvo is a new open‑source optimization framework that reimplements the core ideas of AlphaEvolve—LLMs guiding evolutionary search over programs or solutions—so other groups can reproduce those results and run their own large‑scale search over Python code or other artifacts gigaevo paper summary.

GigaEvo paper abstract

The system stores each candidate program, its metrics, and its ancestry in a shared database, then runs quality‑diversity algorithms (MAP‑Elites) over that population using a configurable DAG of stages that execute code, check constraints, measure complexity, and call LLMs to generate feedback and mutations GitHub repo. In experiments, GigaEvo closely matches or slightly improves the AlphaEvolve results on classical math puzzles like Heilbronn triangle placement and circle packing, reaches strong configurations for high‑dimensional kissing numbers, and discovers competitive bin‑packing heuristics and policy prompts for Reddit rule classification. For engineers thinking about "RL‑adjacent" optimization of programs or prompts, GigaEvo is a ready‑made scaffold rather than starting from scratch with ad‑hoc scripts.


🎬 Creative stacks: video ELOs, precise retakes, fast image gens

A sizable creative/media cluster: text‑to‑video leaderboard shifts, time‑coded video retakes, fast/open image models, and weekend ‘Agents & Models’ access. Excludes the DeepSeek math feature.

Indie model ‘Whisper Thunder’ tops Artificial Analysis text-to-video ELO

Artificial Analysis’ global text-to-video leaderboard now has a stealth model, Whisper Thunder (aka David), in the #1 slot, edging out Google’s Veo 3/3.1, Kling 2.5 Turbo, Luma Ray 3, and OpenAI’s Sora 2 Pro on user‑judged ELO scores. leaderboard summary This matters for video teams because the board aggregates thousands of head‑to‑head votes (~7.4K+ appearances for Whisper Thunder alone), so an unknown model beating hyperscaler stacks signals that quality is reachable outside big labs. elo details

Text to video leaderboard

For AI engineers and PMs building video tools, this is a good empirical short‑list of models to A/B: Whisper Thunder as the current frontier, Veo 3.x and Kling 2.5 as strong baselines, and Sora 2 Pro sitting a bit lower in perceived quality. It also underlines why you should think in terms of crowd ELO, not just lab metrics when picking your default text‑to‑video backend.

InVideo’s ‘Agents & Models’ weekend gives free access to 70+ media models

InVideo has launched Agents & Models, a multi‑model creative surface that wires up 70+ SOTA image, video, and sound models—including Veo 3.1, Sora 2, Kling 2.5, Nano Banana Pro, and FLUX 2—and is running an all‑you‑can‑generate weekend with free, unlimited use of those backends for paying InVideo plans until December 1. weekend promo Directors are already showcasing full spec ads (Heineken, Prada‑style spots) produced entirely through this stack, with InVideo’s agent layer orchestrating prompts and cuts across models. (heineken example, prada example)

For creative engineers, this is effectively a sandbox to benchmark cross‑provider video/image models under one UX, without wiring your own routing or paying per‑model API metering. You can see how Veo 3.1 compares to Kling 2.5 on the same brief, or swap NB Pro vs FLUX 2 for keyframes, then bake that knowledge into your own pipelines later. meta ad The other notable bit is agency: InVideo positions “Agents & Models” as giving you control over every shot decision, rather than a single opaque generator, which aligns with how higher‑end shops already art‑direct multiple tools in sequence.

Z‑Image Turbo proves fast, cheap, and stack‑friendly across fal, HF, and ComfyUI

Alibaba’s Z‑Image Turbo 6B text‑to‑image model is quickly becoming a go‑to for fast, photorealistic imagery in real toolchains: ComfyUI Cloud users report ~6 seconds for native 2K outputs, comfyui promo fal users chain it into Veo 3 for instant text→image→video flows, fal workflow and Hugging Face PRO subscribers note they can squeeze ~500 ZeroGPU generations per day out of a $9/mo plan. pricing remark Community tests suggest the distilled Turbo variant can even run inference in under 2 GB of VRAM community tests, which is unusual for this quality level.

Z Image Turbo portraits

For builders, this means you can treat Z‑Image Turbo as a default photoreal model in several contexts: cloud pipelines (fal, Comfy Cloud), on‑prem low‑VRAM boxes, or cheap prototyping on ZeroGPU. model card It also pairs well with downstream generators like Veo 3: use Z‑Image for a consistent hero frame, then hand it off as a keyframe to a video model, which is exactly what some creators are doing in their Z‑image→Veo demos. fal workflow

ElevenLabs’ LTX-2 Retake brings timecoded in-shot edits to its video suite

ElevenLabs has rolled its Retake feature into the main LTX‑2 Image & Video product, letting creators re‑edit specific spans of a shot using timecodes while preserving actors, lighting, and camera continuity. product announcement Following up on earlier third‑party hosting of Retake via fal and Replicate Retake rollout, this is the first fully integrated, first‑party UX where you can change dialogue, actions, or camera moves without re‑rendering the whole clip. feature recap

For workflows, the key shift is granular control: you can fix a line read, adjust pacing, or tweak a gesture by selecting a segment and re‑prompting, and you only pay for the seconds you modify. feature recap That makes LTX‑2 more viable for iterative client work, where directors often need a handful of surgical fixes rather than a fresh 30‑second generation every time.

Nano Banana Pro + Gemini 3 fuel time‑travel UIs, typography, and 360 scenes

Builders keep finding new creative edges with Google’s Nano Banana Pro image model and Gemini 3: one dev had Gemini‑3‑pro-image-preview one‑shot a full "time machine" web app where users pick a place and year on a globe UI and NB Pro renders that scene in seconds, (time machine demo, prompt details) while others show NB Pro producing near‑perfect stylized typography (“THANKSGIVING” in the shape of a turkey, or "fofr loves melty ai" as a melting monstera leaf). (turkey typography, leaf typography) There are also convincing 360° panoramas, panorama demo fast subject replacement plus background swaps (“selfie guy” dropped into a chair shoot with an alphabet backdrop in under 90 seconds), edit example and dense world‑building shots with in‑scene UI elements like “water recycling update” signage. worldbuilding scene

These uses extend earlier doodle‑to‑edit and voxel‑meme experiments NB doodles into UI‑driven experiences and production‑ish compositing. For teams, the signal is that NB Pro is not just an art toy: it’s good enough at text, layout, and consistent characters to be a backbone for slide decks, marketing visuals, and even interactive “time travel” apps reachable through Gemini’s app scaffolding. The flip side is that people are starting to say they “no longer trust any image” because NB Pro’s realism and 360 support blur the line between capture and render. trust comment

Tencent’s Hunyuan 3D-PolyGen 1.5 generates art-grade quad meshes for game pipelines

Tencent’s Hunyuan 3D Studio 1.1 now ships with Hunyuan 3D‑PolyGen 1.5, a 3D generative model that outputs quad meshes directly instead of triangle soup, targeting "art‑grade" topology with clean edge loops for production game and animation workflows. release thread The system supports both quad and triangle outputs and aims to make models “instantly production‑ready” by matching the topology standards that riggers and character artists expect.

For tools engineers, this is interesting because it collapses a bunch of ugly post‑processing steps: retopology, manual edge‑flow fixing, and mesh cleanup can be minimized if the generator already learns quad loop structure. That makes it easier to plug AI‑generated characters or props into DCC tools and game engines without wrecking deformation or blowing poly budgets. It’s also another data point that generative stacks are moving beyond pixels into geometry that respects downstream constraints like rigging and subdivision.


💼 Platform distribution and enterprise signals

Distribution moves and enterprise‑adjacent updates: ChatGPT Apps/Workflows publishing, Perplexity feature adds, a large stealth M&A, and GTM pricing pushes. Excludes infra capex and policy items.

OpenAI readies in‑ChatGPT Apps and Workflows publishing surface

OpenAI is preparing a new Apps and Workflows publishing layer directly inside ChatGPT, with a dedicated "Apps" tab and workflow templates visible in a mobile UI teaser apps and workflows. This builds on the Apps SDK/MCP design guidance from earlier in the week, suggesting a full in‑product app marketplace rather than just custom GPTs ChatGPT apps.

For builders and vendors, this looks like a new distribution channel where you can ship reusable workflows and mini‑apps to hundreds of millions of ChatGPT users once publishing opens, so it’s worth aligning your app designs and MCP tools with the patterns OpenAI has already documented Axios article.

Bezos’ Project Prometheus quietly acquires General Agents

Jeff Bezos’ stealth AI venture, Project Prometheus, has quietly bought agentic computing startup General Agents, folding a team of 100+ people—including alumni from DeepMind and Tesla—into a project that has already raised more than $6B acquisition report. The deal reportedly came together after an off‑the‑record SF AI dinner, underlining how much capital and talent are consolidating around long‑horizon "agent" stacks rather than chatbots alone.

Project Prometheus article

For AI leaders, this is a strong competitive signal: a well‑funded, Bezos‑backed player is now in the same agentic‑computing race as OpenAI, Google, Anthropic and xAI, and it’s doing it through acquisition rather than public launches—expect more behind‑the‑scenes hiring pressure and potential future platform plays once Prometheus comes out of stealth.

OpenAI expands data residency controls for ChatGPT Enterprise, Edu and API

OpenAI has introduced broader data residency options so eligible ChatGPT Enterprise, ChatGPT Edu, and API Platform customers can choose to store their data in specific regions such as the EU, UK, US, Canada, Japan, South Korea, Singapore, India, Australia, and the UAE data residency tweet. The company reiterates that customer content is not used to train models without explicit opt‑in and highlights AES‑256 at rest and TLS 1.2+ in transit, plus a formal DPA for GDPR/CCPA compliance OpenAI blog post.

This is a concrete unblocker for regulated enterprises and public‑sector buyers who’ve been hung up on data locality; it means you can now align ChatGPT usage with regional data rules and internal risk frameworks instead of treating it as a single global US‑hosted service. For AI platform teams, it also signals that privacy, auditability, and regional control are becoming table stakes for selling chat and agent capabilities into large orgs.

Perplexity adds language-learning flashcards on web and iOS

Perplexity has started rolling out language-learning features that let users study vocabulary, basic terms, and advanced phrases through AI‑generated flashcards on both iOS and the web flashcards launch. The cards are organized into levels so people can move from simple words to more complex expressions, turning Perplexity into a lightweight Duolingo‑style companion rather than only a Q&A tool.

For teams thinking about education and training, this is another signal that chat‑first assistants are being extended with structured practice modes, not just free‑form conversation, which may shape user expectations for "learning" features inside other AI products.

Virgin Australia becomes first local airline to collaborate with OpenAI

Virgin Australia and its Velocity Frequent Flyer program have announced what they call an Australian airline–first collaboration with OpenAI, using ChatGPT‑powered tools to transform trip planning, marketing and revenue management virgin announcement. The carrier has already launched an AI Trip Planner that helps guests build personalised itineraries and plans to lean on AI for demand forecasting and dynamic airfare pricing to better match capacity and fares to real‑time signals Virgin AI announcement.

Virgin OpenAI press shot

For AI teams in travel and other consumer services, this is another proof point that big customer brands are not stopping at chatbot FAQs—they’re wiring LLMs into core workflows like yield management, merchandising and upsell. It’s also a distribution win for OpenAI: millions of travellers will now encounter its models not through ChatGPT, but inside a mainstream airline experience.

Genspark pushes 40% Black Friday discount on "finished deliverables" AI

Genspark is running a Black Friday sale with 40% off annual plans for free users, cutting Plus by $100 and Pro by $1000 for four days and explicitly bundling all current and 2026 features discount details. Following up on its shift from research summaries to shipping complete decks, spreadsheets, and websites Genspark decks, the team is clearly using price as a lever to grow usage of that "end‑to‑end deliverable" positioning.

Genspark black friday promo

A packed Microsoft Singapore event where over 95% of attendees were already Genspark users reinforces that the product is finding a niche with business users who’d rather get a polished Excel file or slide deck than a wall of text event recap. If you’re evaluating research agents for your org, this promo window is a low‑cost way to benchmark that workflow‑to‑deliverable experience against more generic chat tools.


Compute geopolitics and power for AI

Non‑AI exception category with direct impact on AI supply: China’s Nvidia limits push domestic accelerators, xAI’s onsite solar, and renewed debate on space TPUs. Excludes model/eval news.

China’s domestic AI chips close on Nvidia as regulators squeeze H100/H20

New analysis suggests China’s homegrown AI accelerators will reach ~30–60% of Nvidia B200 compute by 2026 while regulators keep blocking new Nvidia deployments in major data centers, accelerating a forced pivot to local silicon. Following China Nvidia curbs, which covered ByteDance and others being barred from using new Nvidia chips, today’s charted projections highlight Huawei Ascend 910C, Hygon BW1000, Cambricon Siyuan 590 and others as the main beneficiaries, with Nvidia’s H20 stuck at the low end of the performance spectrum for China’s market china chip chart.

china chip performance

For AI infra planners this means China’s effective access to frontier compute is shifting from US export‑controlled GPUs toward a domestic stack that’s good enough for large‑scale training, even if per‑chip performance trails B200/MI300X. If those estimates hold, Beijing’s industrial policy—blocking foreign GPUs while steering cloud and internet giants toward Huawei, Cambricon, Hygon and in‑house accelerators china policy thread—could meaningfully blunt the impact of future US sanctions by 2026, while creating a largely separate AI hardware ecosystem with different optimization targets and software tooling.

Google’s ‘Project Suncatcher’ revives plan for TPU data centers in orbit by 2027

A new internal clip about Google’s ‘Project Suncatcher’ shows animated TPUs assembling into massive orbital structures, with the company reportedly targeting space‑based data centers by around 2027 to tap uninterrupted solar power and free radiative cooling suncatcher teaser. Building on arguments that orbital TPUs could beat terrestrial cost‑per‑watt once launch prices fall below ~$200/kg space TPUs, Suncatcher frames this less as sci‑fi and more as a mid‑term infrastructure bet.

If Google can get even pilot‑scale TPU arrays into high‑Earth orbit, AI teams would be looking at a new compute tier with extreme latency but cheap, steady energy—suited for massive offline pre‑training or batch RL runs rather than interactive inference. The hard parts are obvious: launch cadence, radiation‑hardened TPUs, in‑orbit repair, and networking back to Earth. But the fact that this has a name, a timeframe, and marketing assets suggests the company is serious enough that infra and model leads should at least watch it as a potential constraint‑breaker once terrestrial power grids and cooling start to pinch.

xAI plans 30 MW onsite solar farm for its Memphis ‘Colossus’ AI data center

xAI is planning a 30‑megawatt solar farm adjacent to its Colossus data center in Memphis, aiming to supply roughly 10% of the site’s energy needs directly from on‑prem renewables, with a separate 100‑MW solar+storage project backed by a $414M USDA interest‑free loan xai solar article.

xai solar headline

For people running AI infra, that’s a concrete example of hyperscalers trying to derisk grid dependency and energy price volatility for GPU campuses rather than only signing offsite PPAs. A 30 MW co‑located array doesn’t solve 24/7 power for a single large training cluster, but it can buffer daytime inference loads, hedge against spikes in Tennessee Valley Authority tariffs, and signal to regulators that xAI is investing in local green capacity. The paired 100‑MW project, while offsite, hints that future AI builds will likely bundle multiple financing channels—federal loans, local incentives, and corporate capex—to secure both megawatts and political goodwill.

Emad Mostaque calls energy the real AI bottleneck as DOE’s Genesis ramps

Emad Mostaque argues that the U.S. Department of Energy fronting the Genesis Mission isn’t accidental, calling energy—not model ideas—the true bottleneck for AI and science at scale genesis commentary. Following up on Genesis mission, which detailed multi‑GW supercompute campuses and a $50B AWS site, he claims the U.S. has fallen behind China on energy build‑out and predicts more deregulation around fusion, solar and advanced sources tied directly to AI projects (panel video).

The point for AI leaders is simple: even with algorithmic gains, frontier training runs are on track to be power‑limited far more than architecturally limited. If DOE‑backed efforts like Genesis can de‑risk multi‑gigawatt campuses and normalize novel generation (fusion pilots, ultra‑cheap solar, long‑duration storage), that changes the ceiling on how often and how aggressively you can spin up hundred‑billion‑parameter training programs. If they stall—or if permitting and local politics drag on—expect more teams to emulate xAI’s Memphis strategy and bolt project‑specific generation onto each GPU cluster rather than waiting for national grid fixes.


🛡️ Oversight, liability, and surveillance debates

Policy/safety threads focus on a Congressional hearing, platform liability posture, and EU surveillance criticism. Excludes evals and business deals.

US House summons Anthropic CEO over alleged AI-orchestrated China cyberattack

The US House Homeland Security Committee has called Anthropic CEO Dario Amodei to testify on Dec 17 about what’s being described as the first known AI-orchestrated cyberattack, allegedly a Chinese espionage campaign that used Claude Code as a core tool. hearing report

Anthropic CEO hearing photo

For AI engineers and leaders, this hearing is a concrete signal that Congress is starting to treat foundation model providers as potentially accountable actors in nation‑state operations, not just neutral tool vendors. Expect detailed questions about Anthropic’s logging, abuse monitoring, guardrails around code‑assist models, and how quickly they can detect and shut down state‑aligned abuse. Whatever standards and expectations emerge here are likely to be echoed at other agencies and in future regulation, especially around "AI for cyber" use cases.

OpenAI denies liability in teen suicide case, cites safeguards and age limits

OpenAI has formally rejected responsibility in a wrongful‑death lawsuit alleging ChatGPT acted as a “suicide coach” to a 16‑year‑old, arguing the teen violated age restrictions, bypassed safeguards, and had a long history of suicidal ideation before using the system. lawsuit coverage

Teen suicide liability article

The filing stresses that ChatGPT surfaced crisis‑hotline style messages more than 100 times, which OpenAI frames as evidence its safety systems were working rather than failing. For AI builders and counsel, this shows how a leading lab is positioning its liability posture: strict reliance on terms of use (13+ age gates), emphasis on generic safety messaging instead of hard blocking, and a clear line that correlation with tragic outcomes does not imply legal causation. It’s a reminder that if your product touches mental health or self‑harm, you’ll need both strong in‑product mitigations and a thought‑through legal theory for how far your duty of care extends.

Telegram’s Durov says EU child-safety plans risk mass surveillance and censorship

Telegram founder Pavel Durov is warning that new EU child‑protection proposals effectively weaponize public concern to justify mass surveillance and censorship, arguing that the surveillance layer they require would be “illegal in France” and similar regimes. durov criticism For AI and platform teams, this is part of a broader fight over client‑side scanning, content‑analysis mandates, and how far governments can push automated inspection of private messages in the name of safety. If EU rules land on always‑on scanning for CSAM or grooming signals, any AI model embedded in messaging clients could be dragged into a quasi‑law‑enforcement role, with direct implications for on‑device inference, encryption designs, logging practices, and where you can legally ship certain AI features.


🚀 Serving and throughput: practical signals

Runtime/serving updates today are light but relevant: model throughput rankings for tool use and vLLM support notes. Excludes orchestration/MCP and training items.

OpenRouter publishes throughput‑sorted leaderboard for tool‑calling models

OpenRouter now exposes a models view sorted by throughput for tool-calling workloads, letting you filter to models that support tools and then rank them "High to Low" by how many tokens they can push per second. OpenRouter models tweet The screenshot highlights heavy-hitters like Qwen3 Coder 480B A35B (840M tokens logged), OpenAI’s gpt-oss-safeguard-20b, and Meta’s Llama 4 Scout, each with context limits (up to 262K) and per‑million token pricing visible in one place.

OpenRouter throughput list


For anyone wiring up high‑concurrency agents over the holidays, this gives a practical way to route tool calls to the fastest viable backend instead of guessing from marketing blurbs—just filter by supported_parameters=tools and pick from the top of the list. OpenRouter models
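
If you’d rather script the shortlist than click through the UI, a rough sketch against OpenRouter’s public models endpoint could look like the following. Treat the supported_parameters field and exact response shape as assumptions to verify against the current API docs; the throughput numbers themselves appear to live in the site view rather than this endpoint, so the script only filters and surfaces candidates to benchmark yourself.

```python
# Minimal sketch: list OpenRouter models that advertise tool support.
# Assumes the public GET /api/v1/models endpoint and a `supported_parameters`
# field on each entry; check field names against OpenRouter's current docs.
import requests

resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
resp.raise_for_status()
models = resp.json().get("data", [])

# Keep only models that declare tool-calling support.
tool_models = [
    m for m in models
    if "tools" in (m.get("supported_parameters") or [])
]

# Throughput rankings are shown in the site UI; here we just surface context
# length and prompt pricing so you can shortlist candidates to benchmark.
for m in sorted(tool_models, key=lambda m: m.get("context_length") or 0, reverse=True):
    print(m["id"], m.get("context_length"), m.get("pricing", {}).get("prompt"))
```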

vLLM confirms INTELLECT‑3 serving support for RL‑trained MoE model

vLLM says it powered Prime Intellect’s post‑training runs for INTELLECT‑3, a 106B‑parameter MoE reasoning model trained with large‑scale asynchronous RL on GLM‑4.5‑Air. vllm support note Following Intellect-3 open, where the model and training recipe were released, the announcement is a concrete signal that INTELLECT‑3 already runs on mainstream vLLM infrastructure rather than a bespoke serving stack.

Intellect-3 benchmark scores


For infra and platform teams, that means an open‑weights, RL‑tuned reasoning model that posts state‑of‑the‑art scores for its size in math and code can be served with vLLM’s batching and KV‑cache optimizations, easing experimentation and potential deployment alongside existing vLLM‑hosted models.
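
As a rough sketch of what “runs on mainstream vLLM” means in practice, the standard offline‑inference API should be all that’s required; the Hugging Face repo name and parallelism settings below are illustrative assumptions, not values confirmed by Prime Intellect or the vLLM team.

```python
# Hedged sketch: serving an open-weights MoE reasoning model with vLLM's
# standard offline API. The model ID and tensor_parallel_size are
# illustrative placeholders; a 106B-parameter MoE needs multiple GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="PrimeIntellect/INTELLECT-3",  # assumed HF repo name; check the actual model card
    tensor_parallel_size=8,              # size this to your GPU count and memory
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.6, max_tokens=2048)
outputs = llm.generate(
    ["Prove that the sum of the first n odd numbers is n^2."],
    params,
)
print(outputs[0].outputs[0].text)
```

For an OpenAI‑compatible HTTP endpoint, the equivalent would be vLLM’s `vllm serve` entrypoint with the same model ID and parallelism flags.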


🦾 Embodied AI: from demos to real tasks

A smaller but distinct stream: a humanoid cleaning deployment prompts ROI debate and a robotics RL paper improves long‑horizon control. Excludes creative/media and core model releases.

MobileVLA‑R1 uses CoT+GRPO to steady quadruped navigation and control

A new paper, MobileVLA‑R1, introduces a vision‑language‑action model trained specifically for mobile robots (e.g., quadrupeds) that marries chain‑of‑thought style reasoning with GRPO‑style reinforcement learning to handle long‑horizon navigation and manipulation paper share. The framework adds a MobileVLA‑CoT dataset of multi‑granularity trajectories and trains the controller to reason explicitly about sub‑goals before emitting low‑level actions, which improves stability on VLN/VLA benchmarks compared with prior end‑to‑end policies (papers page).

For robotics teams, the takeaway is that the frontier "reasoning LLM + GRPO" recipe is already being ported into embodied stacks: instead of treating language and control as separate, MobileVLA‑R1 uses the same model to parse natural‑language instructions, decompose them into steps, and drive locomotion and head/body motions over many seconds. That makes it a useful reference if you’re trying to fuse high‑level language understanding with reliable control on legged platforms without hand‑coding lots of task‑specific state machines.
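
To make the “GRPO‑style” part concrete: the core mechanism is group‑relative advantage estimation, where several rollouts are sampled per instruction, each is scored, and rewards are normalized within the group instead of against a learned value model. Below is a minimal sketch of that normalization step, purely illustrative and not MobileVLA‑R1’s actual training code.

```python
# Minimal sketch of GRPO-style group-relative advantages: rewards for a
# group of rollouts sampled from the same instruction are normalized
# against the group mean/std, so no learned value function is needed.
# Illustrative only; not MobileVLA-R1's training code.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. four rollouts of the same navigation instruction, scored on task
# success/progress; above-average rollouts get positive advantages and
# their action sequences are reinforced.
print(group_relative_advantages([1.0, 0.2, 0.6, 0.2]))
```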

Humanoid running demos vs toilet‑cleaning deployments spark ROI debate

A fast‑recovery humanoid demo—where a biped pops up from the floor and sprints—has engineers questioning what real‑world value these agility stunts deliver beyond "it can get up fast and run" running robot thread. In the same feed, another clip shows a humanoid on its knees scrubbing a toilet bowl in China, framed as "finally a real use‑case" for humanoid robotics cleaning robot comment.

For people building embodied systems, the contrast underlines a core design question: optimize for acrobatics that impress investors and social media, or for slow, robust manipulation that matches actual paid tasks like janitorial work, logistics, or elder care. The toilet‑cleaning deployment hints that even relatively crude locomotion might be enough if the task is narrow, repeatable, and can justify robot pricing and maintenance contracts in a real facility.

Indie builder shows basement sim2real loop with custom debug UI

An independent developer working on a sim2real robotics project described getting "some success but mostly failure" on transfer, then planning to collect their own data in a basement and fix a mis‑specified physics simulator sim2real progress. They compared this hands‑on loop to the Wright brothers' 1901 wind tunnel work—"never trust anyone else's data but your own"—as a template for modern embodied AI debugging wind tunnel analogy.

Wright brothers wind tunnel graphic

Around that, they showed off an over‑engineered but highly polished sim2real debug UI, crediting modern coding agents and tools for letting a solo builder ship production‑grade interfaces they "would have never made look good before" debug UI note. They also called out liking PufferLib for reinforcement learning work pufferlib remark. For robotics engineers, it’s a small but telling case study: real‑world sim2real progress often boils down to better data collection, honest simulators, and developer tooling that makes long‑horizon experiments less painful to iterate on.
