NVIDIA published Nemotron-Cascade 2, a 30B MoE with 3B active parameters, claiming IMO gold-level math and Kimi K2.5-class code scores, then pushed it to Hugging Face and Ollama. It is worth testing if you want an open agent model with immediate local and hosted paths.

Nemotron-Cascade 2 is a new open model release centered on a 30B MoE architecture with 3B active parameters. The Hugging Face post links both the paper and the model collection, which makes this more than a benchmark teaser: there are public assets engineers can inspect and pull into existing workflows.
The headline claims are aggressive. NVIDIA’s paper card says the model achieves “Gold Medal-level performance” on the 2025 IMO and shows comparisons against DeepSeek-V3.5-35B-A3B and Kimi-K2.5-17-Thinking across LiveCodeBench, SWE Verified OpenHands, Humanity’s Last Exam, and ArenaHard v2. That same card describes the release as “Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation,” which is the main technical framing for how NVIDIA says it got there.
The practical part of this launch is that it already has local runtime paths. Ollama’s announcement says you can run it with ollama run nemotron-cascade-2, and its model page positions the model for “reasoning and agentic capabilities” rather than as a generic chat checkpoint.
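Beyond the CLI one-liner, a local smoke test can go through Ollama's HTTP API. A minimal sketch, assuming a running `ollama serve` on the default port and the model tag from the announcement (the helper names here are illustrative, not from Ollama's docs):

```python
import json
import urllib.request

def build_generate_request(prompt, model="nemotron-cascade-2"):
    # Payload for Ollama's /api/generate endpoint; stream=False asks for
    # a single JSON object instead of newline-delimited chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, host="http://localhost:11434"):
    # Requires a local Ollama server with the model already pulled.
    data = json.dumps(build_generate_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The same payload shape works against any Ollama-hosted model, so swapping checkpoints for comparison is a one-string change.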
Ollama’s follow-up model page thread adds a few deployment details that matter: the page describes thinking and instruct modes, mentions use in tools like OpenClaw, and highlights a 24GB variant with a 256K context window. Separately, the quantization post shows the community is already adapting the model for constrained hardware, with MLX 5-bit and GGUF Q5 variants on Hugging Face via the MLX build and the GGUF build. The GGUF summary says the quantized runtime footprint is about 26 GB, which puts local testing within reach on a single high-memory workstation rather than only server GPUs.
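The ~26 GB figure passes a back-of-envelope check (my arithmetic, not from the post): a Q5_K_M GGUF averages roughly 5.5 bits per weight, so 30B parameters come to about 19 GiB of quantized weights, with the remainder going to higher-precision embedding/output tensors, KV cache, and runtime buffers.

```python
def quant_weights_gib(params_billion, bits_per_weight):
    # Size of the quantized weight tensors alone, in GiB.
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

weights = quant_weights_gib(30, 5.5)  # Q5_K_M averages ~5.5 bits/weight
print(f"{weights:.1f} GiB")           # ~19 GiB for weights; the reported
                                      # ~26 GB adds cache and overhead
```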
Vals AI switched SWE-Bench Verified from SWE-Agent to the bash-only mini-swe-agent harness, aligning results more closely with the official benchmark setup. Top score dipped slightly to 78.8%, but the change reduces harness-specific confounds when comparing models.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
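The candidate-then-verify shape behind that speedup is easy to sketch. This toy index is my illustration, not Cursor's code (the `TrigramIndex` name is made up): it maps trigrams to posting sets, intersects postings for a query's literal substring, and only runs the real regex over surviving files. Exact sets stand in here for the Bloom filters Cursor uses to keep the index compact.

```python
import re
from collections import defaultdict

def trigrams(s):
    # Set of all 3-character substrings of s.
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy inverted index: trigram -> set of doc ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for t in trigrams(text):
            self.postings[t].add(doc_id)

    def search(self, literal, pattern=None):
        # A match for `literal` must contain every one of its trigrams,
        # so intersecting posting sets prunes candidates cheaply.
        candidates = None
        for t in trigrams(literal):
            posting = self.postings.get(t, set())
            candidates = posting if candidates is None else candidates & posting
            if not candidates:
                return []
        if candidates is None:          # literal shorter than 3 chars
            candidates = set(self.docs)
        # Only now pay for the full regex scan, on the survivors.
        rx = re.compile(pattern or re.escape(literal))
        return sorted(d for d in candidates if rx.search(self.docs[d]))
```

In a real repo-scale index the posting sets dominate memory, which is where Bloom filters or compressed posting lists earn their keep; the verification step is unchanged.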
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.