
NVIDIA Nemotron 3 Nano opens 30B‑param stack – 1M‑token context rivals GPT‑OSS
Executive Summary
NVIDIA finally shipped the kind of open model we usually beg for on Twitter: Nemotron 3 Nano, a 30B‑param hybrid Mamba‑Transformer MoE with only 3.6B active parameters and a 1M‑token context window, trained on ~3T tokens and released with weights, data recipe, and RL environments. On Artificial Analysis’s Intelligence Index it scores 52, matching GPT‑OSS‑20B while posting 3.3× the tokens/sec/GPU of Qwen3‑30B in 8k/16k tests.
Benchmarks back up the hype curve: Arena‑Hard‑v2 chat comes in at 67.7 vs 57.8 for Qwen3‑30B and 48.5 for GPT‑OSS‑20B, SWE‑Bench hits 38.8% vs 34.0% and 22.0%, and on RULER at 1M tokens it lands 86.3 where Qwen3‑30B sits at 77.5 and GPT‑OSS does not even report. Architecturally you get a moderate‑sparsity MoE wired with Mamba‑2 sequence layers, so the million‑token context doesn’t nuke throughput the way dense 30B models tend to.
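To make the sparsity claim concrete, here is a back‑of‑the‑envelope sketch in Python using only the figures quoted above (30B total parameters, 3.6B active per token); the 8‑bit deployment size is our illustrative assumption, not a published spec.

```python
# Back-of-the-envelope MoE sparsity math, using only the figures quoted above
# (30B total parameters, 3.6B active per token). The 8-bit deployment size is
# an illustrative assumption, not a released spec.

TOTAL_PARAMS = 30e9    # full expert pool plus shared and Mamba-2 layers
ACTIVE_PARAMS = 3.6e9  # parameters actually used per token (routed experts + shared)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction per token: {active_fraction:.1%}")  # ~12%

# Per-token compute tracks the *active* count (closer to a 3.6B dense model),
# and the Mamba-2 sequence layers scale linearly with context length, which is
# where the long-context throughput claim comes from; GPU weight memory, by
# contrast, scales with the *total* 30B count.
weights_gb_8bit = TOTAL_PARAMS * 1 / 1e9  # 1 byte per param at 8-bit (assumption)
print(f"Approx. weight footprint at 8-bit: {weights_gb_8bit:.0f} GB")
```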
The ecosystem clearly expected this drop: vLLM, SGLang, Together, Baseten, Replicate, OpenRouter, and Ollama all had Day‑0 support, with Baseten calling out 4× generation speed over Nemotron 2 and LM Studio users reporting ~27 tok/s on a 24GB 3090. With Percy Liang and Artificial Analysis both calling it a new high bar for openness, Nemotron 3 Nano looks like the current default if you want GPT‑OSS‑class reasoning without API lock‑in.
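Given that Day‑0 support, the fastest smoke test is an OpenAI‑compatible chat call. Below is a minimal sketch against OpenRouter; the model slug is a hypothetical guess in OpenRouter’s usual vendor/model format, so check the catalog for the real identifier before relying on it.

```python
# Minimal sketch: query Nemotron 3 Nano via OpenRouter's OpenAI-compatible API.
# The model slug below is a hypothetical guess in OpenRouter's vendor/model
# format; look up the real identifier in the OpenRouter catalog before use.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # hypothetical slug, verify before use
    messages=[{
        "role": "user",
        "content": "In two sentences, what does a hybrid Mamba-Transformer MoE buy you at 1M-token context?",
    }],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```

Because vLLM and Ollama both expose OpenAI‑compatible servers, pointing base_url at a local instance (for vLLM’s default, http://localhost:8000/v1) reuses the same request shape.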
Top links today
- RLHF book for RL from human feedback
- LabelFusion robust text classification with LLMs
- Weird generalization and inductive backdoors in LLMs
- Detailed balance in LLM-driven agents
- Impact of hallucination reduction on LLM creativity
- Alpha coefficient framework for AI autonomy
- AI benchmark democratization and carpentry overview
- Evaluating Gemini robotics policies in Veo
- Artemis automated optimization of LLM agents
- Bolmo byte-level language model distillation paper
- IDC report on AI-driven server revenue
- Analysis of DRAM shortage and 8GB laptops
- Samsung–AMD talks on 2nm CPU manufacturing
- CNBC analysis of AI infrastructure capex selloff
- Statista chart of leading global AI hubs
Feature Spotlight
Feature: NVIDIA Nemotron 3 Nano goes fully open
NVIDIA’s Nemotron 3 Nano (30B MoE, 1M ctx) ships fully open with data, recipes and NeMo Gym; early benchmarks show top small‑model accuracy and 2.2–3.3× throughput gains—plus Day‑0 support across major runtimes.
Cross‑account, high‑volume story today: NVIDIA’s 30B (3.6B active) hybrid MoE model ships with open weights, data, training recipe and RL envs; broad Day‑0 ecosystem support and strong speed/accuracy charts.