New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

Published: 1 month ago (March 11, 2026 at 12:00 PM EDT)

5 min read

Source: NVIDIA AI Blog

NVIDIA Nemotron 3 Super

Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters. It is built to run complex, agentic AI systems at scale, delivering advanced reasoning and high‑accuracy task completion for autonomous agents.

AI‑Native Companies

Perplexity – offers Nemotron 3 Super for search and as one of 20 orchestrated models in Computer.
Software‑development agents –
- CodeRabbit
- Factory
- Greptile
  These platforms integrate Nemotron 3 Super (alongside proprietary models) to achieve higher accuracy at lower cost.
Life‑sciences & frontier‑AI – Edison Scientific and Lila Sciences will power their agents for deep literature search, data‑science workflows, and molecular understanding.

Enterprise Software Platforms

Industry leaders deploying and customizing Nemotron 3 Super include:

Amdocs – telecom workflow automation
Palantir – data‑centric AI solutions
Cadence – semiconductor design assistance
Dassault Systèmes – product‑development pipelines
Siemens – Fuse EDA AI System for manufacturing and cybersecurity

These platforms use the model to automate complex workflows across telecom, cybersecurity, semiconductor design, and manufacturing.

Challenges in Multi‑Agent Applications

Context explosion
- Multi‑agent workflows can generate up to 15× more tokens than standard chat (each interaction resends full histories, tool outputs, and intermediate reasoning).
- The growing context inflates costs and can cause goal drift, where agents lose alignment with the original objective.
Thinking tax
- Complex agents must reason at every step.
- Using large models for every sub‑task makes multi‑agent systems expensive and sluggish.

How Nemotron 3 Super Addresses These Issues

1‑million‑token context window – agents can retain the entire workflow state in memory, dramatically reducing context‑related costs and preventing goal drift.
Efficiency & openness – ranked #1 on Artificial Analysis for efficiency and openness, with leading accuracy among models of comparable size.

Benchmark Performance

Powers the NVIDIA AI‑Q research agent, which holds the No. 1 spot on:
- DeepResearch Bench
- DeepResearch Bench II

These benchmarks evaluate an AI system’s ability to conduct thorough, multi‑step research across large document sets while maintaining coherent reasoning.

Nemotron 3 Super sets a new standard for scalable, high‑performance multi‑agent AI.

Hybrid Architecture

Nemotron 3 Super employs a hybrid mixture‑of‑experts (MoE) design that blends three key innovations, delivering up to 5× higher throughput and 2× higher accuracy compared with the previous Nemotron Super model.

Innovation	What it does	Benefit
Hybrid Architecture	Combines Mamba layers (for memory‑ and compute‑efficiency) with Transformer layers (for advanced reasoning)	4× better memory & compute efficiency
MoE	Activates only 12 B of the total 120 B parameters during inference	Reduces compute cost while keeping model capacity
Latent MoE	Triggers four expert specialists for the cost of a single expert when generating the next token	Improves accuracy without extra latency
Multi‑Token Prediction	Predicts several future tokens in parallel	3× faster inference

NVIDIA Blackwell Platform

Runs in NVFP4 precision – this reduces memory usage and makes inference up to 4× faster than FP8 on NVIDIA Hopper, without any loss in accuracy.

Open Weights, Data, and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers, or in the cloud.

Training Data and Methodology

Trained on synthetic data generated using frontier‑reasoning models.
NVIDIA publishes the complete methodology, including:
- 10 + trillion tokens of pre‑ and post‑training datasets.
- 15 training environments for reinforcement learning.
- Evaluation recipes.

Getting Started

Researchers can use the NVIDIA NeMo platform to:

Fine‑tune the model.
Build their own models and pipelines.

Use in Agentic Systems

Nemotron 3 Super is built to handle complex subtasks within multi‑agent systems.

Software development – An agent can load an entire codebase into context at once, enabling end‑to‑end code generation and debugging without having to segment documents.
Financial analysis – The model can ingest thousands of pages of reports in a single context, eliminating the need to re‑reason across long conversations and dramatically improving efficiency.
High‑stakes tool calling – Nemotron 3 Super’s high‑accuracy tool‑calling lets autonomous agents reliably navigate massive function libraries, preventing execution errors in critical environments such as autonomous security orchestration for cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed through:

NVIDIA platforms – build.nvidia.com, Perplexity, OpenRouter and Hugging Face.
Enterprise integrations – Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face (optimized for on‑premise deployment on the Dell AI Factory). HPE is also adding NVIDIA Nemotron to its agents hub to support scalable enterprise adoption of agentic AI.

Cloud Service Providers

Google Cloud Vertex AI
Oracle Cloud Infrastructure
Amazon Web Services (coming soon via Amazon Bedrock)
Microsoft Azure

NVIDIA Cloud Partners

Inference Service Providers

Data Platforms & Services

Distyl
Dataiku
DataRobot
Deloitte
EY
Tata Consultancy Services

The model is packaged as an NVIDIA NIM microservice, enabling deployment from on‑premises systems to the cloud.

Stay Up to Date

Subscribe to the NVIDIA AI news newsletter
Join the NVIDIA developer community
Follow NVIDIA AI on:

Explore self‑paced video tutorials and livestreams on the NVIDIA AI YouTube playlist.

New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

NVIDIA Nemotron 3 Super

AI‑Native Companies

Enterprise Software Platforms

Challenges in Multi‑Agent Applications

How Nemotron 3 Super Addresses These Issues

Benchmark Performance

Hybrid Architecture

NVIDIA Blackwell Platform

Open Weights, Data, and Recipes

Training Data and Methodology

Getting Started

Use in Agentic Systems

Availability

Cloud Service Providers

NVIDIA Cloud Partners

Inference Service Providers

Data Platforms & Services

Stay Up to Date

Related posts

How to watch Jensen Huang’s Nvidia GTC 2026 keynote — and what to expect

Chatbots, AI Agents, and Agentic AI: Understanding the Evolution of Intelligent Systems

What Is Agentic AI?

How to Build Your First AI Agent in 2026: A Practical Guide

NVIDIA Nemotron 3 Super

AI‑Native Companies

Enterprise Software Platforms

Challenges in Multi‑Agent Applications

How Nemotron 3 Super Addresses These Issues

Benchmark Performance

Hybrid Architecture

NVIDIA Blackwell Platform

Open Weights, Data, and Recipes

Training Data and Methodology

Getting Started

Use in Agentic Systems

Availability

Cloud Service Providers

NVIDIA Cloud Partners

Inference Service Providers

Data Platforms & Services

Stay Up to Date

Related posts

How to watch Jensen Huang’s Nvidia GTC 2026 keynote — and what to expect

Chatbots, AI Agents, and Agentic AI: Understanding the Evolution of Intelligent Systems

What Is Agentic AI?

How to Build Your First AI Agent in 2026: A Practical Guide

NVIDIA Nemotron 3 Super

How Nemotron 3 Super Addresses These Issues