NVIDIA Nemotron 3 Nano opens 30B‑param stack – 1M‑token context rivals GPT‑OSS

Mon, Dec 15, 2025

Executive Summary

NVIDIA finally shipped the kind of open model we usually beg for on Twitter: Nemotron 3 Nano, a 30B‑param hybrid Mamba‑Transformer MoE with only 3.6B parameters active per token and a 1M‑token context window, trained on ~3T tokens and released with weights, data recipe, and RL environments. On Artificial Analysis’s Intelligence Index it scores 52, matching GPT‑OSS‑20B, while posting 3.3× the tokens/sec/GPU of Qwen3‑30B in 8k/16k tests.

Benchmarks back up the hype: Arena‑Hard‑v2 chat comes in at 67.7 vs 57.8 for Qwen3‑30B and 48.5 for GPT‑OSS‑20B, SWE‑Bench hits 38.8% vs 34.0% and 22.0%, and on RULER at 1M tokens it lands 86.3 where Qwen3‑30B sits at 77.5 and GPT‑OSS reports no score. Architecturally you get a moderate‑sparsity MoE wired with Mamba‑2 sequence layers, whose recurrent state stays a fixed size instead of growing with sequence length like an attention KV cache, so the million‑token context doesn’t nuke throughput the way it tends to on dense 30B models.
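To make the numbers above easier to compare, here is a minimal Python sketch that works out the share of parameters active per token (3.6B of 30B) and Nemotron 3 Nano's lead on each quoted benchmark. All figures are copied from this summary rather than measured. The helper names (`active_fraction`, `margins`) are illustrative, and the sketch assumes the SWE‑Bench comparison figures are listed in the same order as the Arena‑Hard ones (Qwen3‑30B, then GPT‑OSS‑20B).

```python
# Figures quoted in the summary; nothing here is measured independently.
TOTAL_PARAMS_B = 30.0   # total parameters, in billions
ACTIVE_PARAMS_B = 3.6   # parameters active per token, in billions

# Scores as (Nemotron 3 Nano, Qwen3-30B, GPT-OSS-20B); None = no score reported.
# Assumption: SWE-Bench comparison values follow the same model order as Arena-Hard.
BENCHMARKS = {
    "Arena-Hard-v2": (67.7, 57.8, 48.5),
    "SWE-Bench": (38.8, 34.0, 22.0),
    "RULER @ 1M": (86.3, 77.5, None),
}


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def margins(scores):
    """Points by which the first score leads each of the others (None if unreported)."""
    nano, *others = scores
    return [None if other is None else round(nano - other, 1) for other in others]


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {share:.0%}")
    for name, scores in BENCHMARKS.items():
        print(f"{name}: lead over (Qwen3-30B, GPT-OSS-20B) = {margins(scores)}")
```

Run as a script, it prints an active share of 12% followed by the per‑benchmark leads. A missing score is kept as `None` rather than treated as zero, so the RULER row does not overstate the gap to GPT‑OSS.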

The ecosystem clearly expected this drop: vLLM, SGLang, Together, Baseten, Replicate, OpenRouter, and Ollama all had Day‑0 support, with Baseten calling out 4× generation speed over Nemotron 2 and LM Studio users reporting ~27 tok/s on a 24GB RTX 3090. With Percy Liang and Artificial Analysis both calling it a new high bar for openness, Nemotron 3 Nano looks like the current default if you want GPT‑OSS‑class reasoning without API lock‑in.

Feature Spotlight

Feature: NVIDIA Nemotron 3 Nano goes fully open

NVIDIA’s Nemotron 3 Nano (30B MoE, 1M ctx) ships fully open with data, recipes and NeMo Gym; early benchmarks show top small‑model accuracy and 2.2–3.3× throughput gains, plus Day‑0 support across major runtimes.

Cross‑account, high‑volume story today: NVIDIA’s 30B (3.6B active) hybrid MoE model ships with open weights, data, training recipe and RL envs; broad Day‑0 ecosystem support and strong speed/accuracy charts.

Table of Contents

🟩 Feature: NVIDIA Nemotron 3 Nano goes fully open

NVIDIA launches fully open Nemotron 3 Nano hybrid MoE model

Nemotron 3 Nano matches GPT‑OSS 20B on IQ index and beats Qwen3‑30B

vLLM, SGLang, Together, Baseten, Replicate, Ollama and more ship Day‑0 Nemotron 3 support

Researchers hail Nemotron 3 Nano as a new high bar for open models


🗣️ Realtime speech stack steps up

OpenAI refreshes realtime STT, TTS and realtime-mini with big quality gains

Chatterbox Turbo open-source TTS spreads across fal, Replicate and Modal

Gemini 2.5 Flash Native Audio gets stronger tools and live translation surfaces

MiniMax Speech lands on Retell AI with sub‑250 ms latency


🧰 Agent stacks and coding workflows

Mino offers a production web automation API that learns once, then runs deterministic flows

Anthropic publishes “Effective harnesses” blueprint for long-running Claude agents

HyperBookLM open-sources a NotebookLM-style agent for web and PDF research

Qwen Code v0.5.0 tightens dev loop with VSCode bundle and TS SDK

Warp details cloud sandboxes for ambient coding agents powered by Namespace

Claude Code users share harness tricks and pain points around context and subagents

CopilotKit’s A2UI Widget Builder helps ship agent UIs that follow Google’s new spec

Kilo Cloud links PR reviews to one-click agent sessions that auto-fix code

LangSmith walkthrough shows how to observe, validate and debug deep agents

Manus 1.6 upgrades its agent architecture, with Max variant scoring 19% higher


📊 Evals: agents hit desktops, pros certs, and horizon forecasts

OSWorld desktop benchmark hits ~human-level with Opus 4.5 + GPT‑5 agent

Frontier reasoning models now pass all three CFA levels on fresh mocks

Zoom’s federated AI system tops Humanity’s Last Exam benchmark with 48.1%

Epoch uses ECI scores to forecast METR time horizons for frontier models

GPT‑5.2 Thinking shows strong multi‑step fact‑check and rewrite behavior

Sansa censorship leaderboard ranks GPT‑5.2 as most heavily guarded frontier model


🧩 Interoperability: MCP and A2UI in practice

Hugging Face ships a full MCP server for models, datasets and Spaces

CopilotKit launches A2UI Widget Builder for AG-UI and Gemini agents

MCP dataset_search makes in‑chat dataset discovery actually usable

Google publishes MCP server repo, hinting at a first‑party connector stack


🚦 Serving and runtime engineering

vLLM splits encoders into a separate service to cut P99 audio/vision latency

SGLang Cookbook ships 40+ copy‑paste recipes for high‑performance LLM serving

Warp details cloud runners: secure sandboxes for ambient coding agents


🧪 New findings: physics of agents, kernels, and bytes

CUDA-L2 uses LLM + RL to auto-generate HGEMM kernels that beat cuBLAS

LLM-driven agents appear to obey a physics-style detailed balance law

AI2’s Bolmo “byteifies” Olmo 3 into strong byte-level LMs at 1B and 7B

LabelFusion fuses transformer features with LLM scores for robust text classification

α-coefficient paper draws a hard line between true AI autonomy and hidden human labor


🕸️ Retrieval and context engineering

Apple’s CLaRa turns RAG into a compressed continuous-memory system

LlamaIndex pushes “RAG 2.0” with virtual filesystems and agentic OCR


🏗️ Compute, schedulers and capex signals

China prepares up to $70B in chip subsidies amid AI export curbs

Q3 2025 servers hit $112.4B as AI GPUs pass half of revenue

NVIDIA buys Slurm maker SchedMD, vows to keep it open

SK hynix warns of tight DRAM as AI soaks up HBM capacity

Google Colab surfaces H100 and 80GB A100 GPUs to notebook users


🎬 Creative video/vision pipelines

Seedream 4.5 jumps to #2 image‑editing model behind Nano Banana Pro

Freepik “Santa Simulator” shows Nano Banana + Kling + Veo production workflow

Veo 3.1 Extend adds +7s seamless continuation for existing videos on fal

EgoX generates egocentric video from a single third‑person clip

Kling O1 “Standard” on fal targets cheaper 3–10s 720p edits

SuperDesign’s “AI designer” chains image, video, and code gen in one tool

SVG‑T2I scales text‑to‑image in latent VFM space without a VAE

ComfyUI gets standalone SCAIL pose nodes for video‑to‑video work

MetaCanvas explores information transfer between MLLMs and diffusion models

Particulate shows feed‑forward 3D “object articulation” from point clouds


🤖 Embodied: gentle grasps and video‑sim evaluation

DeepMind uses Veo video model as a simulator for Gemini robot policy eval

MIT loop-closure gripper lifts heavy but fragile objects with soft contact


Wrongful death suit alleges GPT‑4o exacerbated delusions leading to murder

Google DeepMind expands AI safety partnership with AI Security Institute

Yann LeCun warns of AI assistant monopolies controlling information flow


💼 Enterprise adoption and monetization notes

ElevenLabs’ OSS fund formalizes recurring support for 25 open projects

OpenAI expands ChatGPT Go in Latin America with Rappi free trials

OpenAI’s ChatGPT Go expansion in LatAm leans on Rappi distribution

Anthropic adds Claude gift cards for Pro and Max plans

Copilot Flight Log becomes Microsoft’s answer to AI year‑in‑review

OpenAI drops 6‑month vesting cliff to stay competitive in AI talent war

Zoom’s federated AI stack tops Humanity’s Last Exam with 48.1%

Lovable introduces gift cards so users can “gift a builder subscription”

Ramp hears real AI ROI in customer service from a public tech CFO

Firecrawl adds SSO for Enterprise as it courts larger AI scraping customers
