New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
Source: NVIDIA AI Blog
NVIDIA Nemotron 3 Super
Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters. It is built to run complex, agentic AI systems at scale, delivering advanced reasoning and high‑accuracy task completion for autonomous agents.
AI‑Native Companies
- Perplexity – offers Nemotron 3 Super for search and as one of 20 orchestrated models in Computer.
- Software‑development agents –
- CodeRabbit
- Factory
- Greptile
These platforms integrate Nemotron 3 Super (alongside proprietary models) to achieve higher accuracy at lower cost.
- Life‑sciences & frontier‑AI – Edison Scientific and Lila Sciences will power their agents for deep literature search, data‑science workflows, and molecular understanding.
Enterprise Software Platforms
Industry leaders deploying and customizing Nemotron 3 Super include:
- Amdocs – telecom workflow automation
- Palantir – data‑centric AI solutions
- Cadence – semiconductor design assistance
- Dassault Systèmes – product‑development pipelines
- Siemens – Fuse EDA AI System for manufacturing and cybersecurity
These platforms use the model to automate complex workflows across telecom, cybersecurity, semiconductor design, and manufacturing.
Challenges in Multi‑Agent Applications
Context explosion
- Multi‑agent workflows can generate up to 15× more tokens than standard chat (each interaction resends full histories, tool outputs, and intermediate reasoning).
- The growing context inflates costs and can cause goal drift, where agents lose alignment with the original objective.
Thinking tax
- Complex agents must reason at every step.
- Using large models for every sub‑task makes multi‑agent systems expensive and sluggish.
How Nemotron 3 Super Addresses These Issues
- 1‑million‑token context window – agents can retain the entire workflow state in memory, dramatically reducing context‑related costs and preventing goal drift.
- Efficiency & openness – ranked #1 on Artificial Analysis for efficiency and openness, with leading accuracy among models of comparable size.
Benchmark Performance
- Powers the NVIDIA AI‑Q research agent, which holds the No. 1 spot on:
These benchmarks evaluate an AI system’s ability to conduct thorough, multi‑step research across large document sets while maintaining coherent reasoning.
Nemotron 3 Super sets a new standard for scalable, high‑performance multi‑agent AI.
Hybrid Architecture
Nemotron 3 Super employs a hybrid mixture‑of‑experts (MoE) design that blends three key innovations, delivering up to 5× higher throughput and 2× higher accuracy compared with the previous Nemotron Super model.
| Innovation | What it does | Benefit |
|---|---|---|
| Hybrid Architecture | Combines Mamba layers (for memory‑ and compute‑efficiency) with Transformer layers (for advanced reasoning) | 4× better memory & compute efficiency |
| MoE | Activates only 12 B of the total 120 B parameters during inference | Reduces compute cost while keeping model capacity |
| Latent MoE | Triggers four expert specialists for the cost of a single expert when generating the next token | Improves accuracy without extra latency |
| Multi‑Token Prediction | Predicts several future tokens in parallel | 3× faster inference |
NVIDIA Blackwell Platform
Runs in NVFP4 precision – this reduces memory usage and makes inference up to 4× faster than FP8 on NVIDIA Hopper, without any loss in accuracy.
Open Weights, Data, and Recipes
NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers, or in the cloud.
Training Data and Methodology
- Trained on synthetic data generated using frontier‑reasoning models.
- NVIDIA publishes the complete methodology, including:
- 10 + trillion tokens of pre‑ and post‑training datasets.
- 15 training environments for reinforcement learning.
- Evaluation recipes.
Getting Started
Researchers can use the NVIDIA NeMo platform to:
- Fine‑tune the model.
- Build their own models and pipelines.
Use in Agentic Systems
Nemotron 3 Super is built to handle complex subtasks within multi‑agent systems.
- Software development – An agent can load an entire codebase into context at once, enabling end‑to‑end code generation and debugging without having to segment documents.
- Financial analysis – The model can ingest thousands of pages of reports in a single context, eliminating the need to re‑reason across long conversations and dramatically improving efficiency.
- High‑stakes tool calling – Nemotron 3 Super’s high‑accuracy tool‑calling lets autonomous agents reliably navigate massive function libraries, preventing execution errors in critical environments such as autonomous security orchestration for cybersecurity.
Availability
NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed through:
- NVIDIA platforms – build.nvidia.com, Perplexity, OpenRouter and Hugging Face.
- Enterprise integrations – Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face (optimized for on‑premise deployment on the Dell AI Factory). HPE is also adding NVIDIA Nemotron to its agents hub to support scalable enterprise adoption of agentic AI.
Cloud Service Providers
- Google Cloud Vertex AI
- Oracle Cloud Infrastructure
- Amazon Web Services (coming soon via Amazon Bedrock)
- Microsoft Azure
NVIDIA Cloud Partners
- CoreWeave
- Crusoe
- Nebius
- Together AI
Inference Service Providers
Data Platforms & Services
- Distyl
- Dataiku
- DataRobot
- Deloitte
- EY
- Tata Consultancy Services
The model is packaged as an NVIDIA NIM microservice, enabling deployment from on‑premises systems to the cloud.
Stay Up to Date
- Subscribe to the NVIDIA AI news newsletter
- Join the NVIDIA developer community
- Follow NVIDIA AI on:
Explore self‑paced video tutorials and livestreams on the NVIDIA AI YouTube playlist.