NVIDIA Nemotron 3 Nano opens 30B‑param stack – 1M‑token context rivals GPT‑OSS

Stay in the loop

Free daily newsletter & Telegram daily report

Executive Summary

NVIDIA finally shipped the kind of open model we usually beg for on Twitter: Nemotron 3 Nano, a 30B‑param hybrid Mamba‑Transformer MoE with only 3.6B active parameters and a 1M‑token context window, trained on ~3T tokens and released with weights, data recipe, and RL environments. On Artificial Analysis’s Intelligence Index it scores 52, matching gpt‑oss‑20B while posting 3.3× the tokens/sec/GPU of Qwen3‑30B in 8k/16k tests.

Benchmarks back up the hype curve: Arena‑Hard‑v2 chat comes in at 67.7 vs 57.8 for Qwen3‑30B and 48.5 for GPT‑OSS‑20B, SWE‑Bench hits 38.8% vs 34.0 and 22.0, and on RULER at 1M tokens it lands 86.3 where Qwen3‑30B sits at 77.5 and GPT‑OSS does not even report. Architecturally you get a moderate‑sparsity MoE wired with Mamba‑2 sequence layers, so the million‑token context doesn’t nuke throughput the way dense 30B models tend to.

The ecosystem clearly expected this drop: vLLM, SGLang, Together, Baseten, Replicate, OpenRouter, and Ollama all had Day‑0 support, with Baseten calling out 4× generation speed over Nemotron 2 and LM Studio users reporting ~27 tok/s on a 24GB 3090. With Percy Liang and Artificial Analysis both calling it a new openness high bar, Nemotron 3 Nano looks like the current default if you want GPT‑OSS‑class reasoning without API lock‑in.

Feature: NVIDIA Nemotron 3 Nano goes fully open

NVIDIA’s Nemotron 3 Nano (30B MoE, 1M ctx) ships fully open with data, recipes and NeMo Gym; early benchmarks show top small‑model accuracy and 2.2–3.3× throughput gains—plus Day‑0 support across major runtimes.

Cross‑account, high‑volume story today: NVIDIA’s 30B (3.6B active) hybrid MoE model ships with open weights, data, training recipe and RL envs; broad Day‑0 ecosystem support and strong speed/accuracy charts.

Jump to Feature: NVIDIA Nemotron 3 Nano goes fully open topics

🟩 Feature: NVIDIA Nemotron 3 Nano goes fully open

NVIDIA launches fully open Nemotron 3 Nano hybrid MoE model

NVIDIA has debuted Nemotron 3 Nano, a 30B‑parameter hybrid Mamba‑Transformer Mixture‑of‑Experts model with only ~3.6B active parameters per token, a 1M‑token context window, and a fully open stack: weights, training recipe, redistributable datasets, and RL environments. (release overview, newsroom summary)

The model is trained on roughly 3T tokens and shipped under the NVIDIA Open Model License, which allows commercial use and training of derivatives while keeping the stack transparent (open weights, data curation, and methodology). NVIDIA is also releasing NeMo Gym, a suite of multi‑environment reinforcement learning setups plus NeMo RL tooling so teams can continue post‑training and skill acquisition for agentic workflows on top of Nemotron 3 Nano. newsroom article tech blog

Architecturally, Nemotron 3 Nano mixes Mamba‑2 sequence layers, Transformer attention, and a moderate‑sparsity MoE (31.6B total params, 3.6B active) tuned for long‑context reasoning at reasonable inference cost, which is why it can offer a 1M context window without the throughput collapse you see in many dense 30B models. architecture chart For engineers and researchers, the big change is that NVIDIA isn’t just dropping a checkpoint: you get the pretraining corpus description, RL environments, and recipes for NVFP4 low‑precision training and latent‑MoE routing that you can actually replicate or adapt, making Nemotron 3 Nano feel more like a reference platform than a one‑off model.

NVIDIA Nemotron 3 Nano opens 30B‑param stack – 1M‑token context rivals GPT‑OSS

Executive Summary

Top links today

Feature: NVIDIA Nemotron 3 Nano goes fully open

Table of Contents

🟩 Feature: NVIDIA Nemotron 3 Nano goes fully open

NVIDIA launches fully open Nemotron 3 Nano hybrid MoE model

Nemotron 3 Nano matches GPT‑OSS 20B on IQ index and beats Qwen3‑30B

vLLM, SGLang, Together, Baseten, Replicate, Ollama and more ship Day‑0 Nemotron 3 support

Researchers hail Nemotron 3 Nano as a new high bar for open models

🗣️ Realtime speech stack steps up

OpenAI refreshes realtime STT, TTS and realtime-mini with big quality gains

Chatterbox Turbo open-source TTS spreads across fal, Replicate and Modal

Gemini 2.5 Flash Native Audio gets stronger tools and live translation surfaces

MiniMax Speech lands on Retell AI with sub‑250 ms latency

🧰 Agent stacks and coding workflows

Mino offers a production web automation API that learns once, then runs deterministic flows

Anthropic publishes “Effective harnesses” blueprint for long-running Claude agents

HyperBookLM open-sources a NotebookLM-style agent for web and PDF research

Qwen Code v0.5.0 tightens dev loop with VSCode bundle and TS SDK

Warp details cloud sandboxes for ambient coding agents powered by Namespace

Claude Code users share harness tricks and pain points around context and subagents

CopilotKit’s A2UI Widget Builder helps ship agent UIs that follow Google’s new spec

Kilo Cloud links PR reviews to one-click agent sessions that auto-fix code

LangSmith walkthrough shows how to observe, validate and debug deep agents

Manus 1.6 upgrades its agent architecture, with Max variant scoring 19% higher

📊 Evals: agents hit desktops, pros certs, and horizon forecasts

OSWorld desktop benchmark hits ~human-level with Opus 4.5 + GPT‑5 agent

Frontier reasoning models now pass all three CFA levels on fresh mocks

Zoom’s federated AI system tops Humanity’s Last Exam benchmark with 48.1%

Epoch uses ECI scores to forecast METR time horizons for frontier models

GPT‑5.2 Thinking shows strong multi‑step fact‑check and rewrite behavior

Sansa censorship leaderboard ranks GPT‑5.2 as most heavily guarded frontier model

🧩 Interoperability: MCP and A2UI in practice

Hugging Face ships a full MCP server for models, datasets and Spaces

CopilotKit launches A2UI Widget Builder for AG-UI and Gemini agents

MCP dataset_search makes in‑chat dataset discovery actually usable

Google publishes MCP server repo, hinting at a first‑party connector stack

🚦 Serving and runtime engineering

vLLM splits encoders into a separate service to cut P99 audio/vision latency

SGLang Cookbook ships 40+ copy‑paste recipes for high‑performance LLM serving

Warp details cloud runners: secure sandboxes for ambient coding agents

🧪 New findings: physics of agents, kernels, and bytes

CUDA-L2 uses LLM + RL to auto-generate HGEMM kernels that beat cuBLAS

LLM-driven agents appear to obey a physics-style detailed balance law

AI2’s Bolmo “byteifies” Olmo 3 into strong byte-level LMs at 1B and 7B

LabelFusion fuses transformer features with LLM scores for robust text classification

α-coefficient paper draws a hard line between true AI autonomy and hidden human labor

🕸️ Retrieval and context engineering

Apple’s CLaRa turns RAG into a compressed continuous-memory system

LlamaIndex pushes “RAG 2.0” with virtual filesystems and agentic OCR

🏗️ Compute, schedulers and capex signals

China prepares up to $70B in chip subsidies amid AI export curbs

Q3 2025 servers hit $112.4B as AI GPUs pass half of revenue

NVIDIA buys Slurm maker SchedMD, vows to keep it open

SK hynix warns of tight DRAM as AI soaks up HBM capacity

Google Colab surfaces H100 and 80GB A100 GPUs to notebook users

🎬 Creative video/vision pipelines

Seedream 4.5 jumps to #2 image‑editing model behind Nano Banana Pro

Freepik “Santa Simulator” shows Nano Banana + Kling + Veo production workflow

Veo 3.1 Extend adds +7s seamless continuation for existing videos on fal

EgoX generates egocentric video from a single third‑person clip

Kling O1 “Standard” on fal targets cheaper 3–10s 720p edits

SuperDesign’s “AI designer” chains image, video, and code gen in one tool

SVG‑T2I scales text‑to‑image in latent VFM space without a VAE

ComfyUI gets standalone SCAIL pose nodes for video‑to‑video work

MetaCanvas explores information transfer between MLLMs and diffusion models

Particulate shows feed‑forward 3D “object articulation” from point clouds

🤖 Embodied: gentle grasps and video‑sim evaluation

DeepMind uses Veo video model as a simulator for Gemini robot policy eval

MIT loop-closure gripper lifts heavy but fragile objects with soft contact

🛡️ Risk, safety partnerships and legal stress

Wrongful death suit alleges GPT‑4o exacerbated delusions leading to murder

Google DeepMind expands AI safety partnership with AI Security Institute

Yann LeCun warns of AI assistant monopolies controlling information flow

💼 Enterprise adoption and monetization notes

ElevenLabs’ OSS fund formalizes recurring support for 25 open projects

OpenAI expands ChatGPT Go in Latin America with Rappi free trials

OpenAI’s ChatGPT Go expansion in LatAm leans on Rappi distribution

Anthropic adds Claude gift cards for Pro and Max plans