NVIDIA released Nemotron 3 Super, a 120B open model with 1M-token context and a hybrid architecture tuned for agent workloads, then landed it in Perplexity and Baseten. Try it if you need an open-weight long-context option that is already available in hosted stacks.

Nemotron 3 Super is an open-weight 120B model built for long-running agent systems rather than single-turn chat. The core problem, as the launch thread frames it, is that collaborating agents can generate "15x more text than normal" by repeatedly restating shared context, which turns reasoning-heavy workflows into a latency and cost problem.
NVIDIA's launch post says the model attacks that in three ways. First, it stretches context to 1 million tokens so agents can retain full workflow state. Second, it uses a sparse setup with 12B active parameters out of 120B total, reducing the compute used per task. Third, it combines Mamba layers for memory efficiency with Transformer layers for reasoning, then adds multi-token prediction to generate multiple future tokens per step. The same NVIDIA blog post claims up to 5x higher throughput and 2x better accuracy than prior models in agentic settings, and names AI-Q research agents and multistep enterprise workflows as target use cases.
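The throughput levers here are easy to sanity-check with back-of-envelope arithmetic. The sketch below uses NVIDIA's stated parameter counts; the `k = 4` multi-token head count is a hypothetical illustration, not a published figure for Nemotron 3 Super.

```python
# Back-of-envelope numbers from NVIDIA's stated specs; the k=4
# multi-token head count is a hypothetical illustration, not a
# published figure for this model.

total_params = 120e9   # total parameters
active_params = 12e9   # active per token (sparse routing)
print(f"active fraction: {active_params / total_params:.0%}")  # 10%

target_tokens = 1_000
k = 4  # hypothetical tokens emitted per forward pass via MTP
passes = -(-target_tokens // k)  # ceiling division
print(f"forward passes: {passes} vs {target_tokens} single-token")  # 250
```

The two effects compound: sparse routing cuts the compute per forward pass, and multi-token prediction cuts the number of forward passes per generated sequence.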
Distribution was part of the launch, not a later follow-up. Perplexity's product update says Nemotron 3 Super is already selectable in its consumer chat product and available through Agent API and Computer, which matters for teams that want to test the model inside an existing hosted agent stack instead of standing up open weights from scratch. The attached [img:5|Perplexity model picker] shows it shipping alongside other frontier model options with a direct selector in the UI.
Baseten's launch-partner post says it is a day-0 partner and that users can try the model immediately on its platform. That gives engineers two early paths: a productized agent surface in Perplexity and model hosting through Baseten. For performance context, the benchmark graphic shared from Artificial Analysis puts Nemotron 3 Super at 452 output tokens per second, ahead of the other models shown on that chart, while Grok 4.20 appears at 265 tokens per second. That speed figure is from a third-party graphic rather than NVIDIA's launch materials, but it lines up with the release narrative that Nemotron 3 Super is being positioned as a high-throughput open option for long-context agent workloads.
Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
Release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
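The core trick in this family of indexes (known from code-search tools generally, not Cursor's actual implementation) is to intersect trigram posting lists to get a small candidate set, then confirm with a real scan. A minimal sketch, with hypothetical file names and without the Bloom-filter layer:

```python
from collections import defaultdict

def trigrams(text):
    """All 3-character substrings of a string."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    """Toy inverted index from trigrams to file ids.

    Illustrates the general candidate-filtering idea behind
    trigram-based code search; not Cursor's implementation.
    """
    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> {file_id}
        self.files = {}

    def add(self, file_id, text):
        self.files[file_id] = text
        for t in trigrams(text):
            self.postings[t].add(file_id)

    def search(self, literal):
        """Return files that truly contain `literal`.

        Step 1: intersect posting lists for the query's trigrams
        to get a candidate set (may contain false positives).
        Step 2: confirm each candidate with a real scan.
        """
        required = trigrams(literal)
        if not required:  # query too short to filter; scan everything
            candidates = set(self.files)
        else:
            candidates = set.intersection(
                *(self.postings.get(t, set()) for t in required))
        return {f for f in candidates if literal in self.files[f]}

idx = TrigramIndex()
idx.add("a.py", "def parse_config(path): ...")
idx.add("b.py", "def render(template): ...")
print(idx.search("parse_config"))  # {'a.py'}
```

A Bloom filter per file plays the same role as the posting lists here, trading exactness for memory: it can only produce false positives, which the confirming scan filters out anyway.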
Breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
In this piece, @kirkby_max and @oneill_c use constitutional alignment as a testbed to evaluate the importance of on- versus off-policy and dense versus sparse feedback during post-training.