New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI

Published: (March 11, 2026 at 12:00 PM EDT)
5 min read

Source: NVIDIA AI Blog

NVIDIA Nemotron 3 Super

Launched today, NVIDIA Nemotron 3 Super is a 120‑billion‑parameter open model with 12 billion active parameters. It is built to run complex, agentic AI systems at scale, delivering advanced reasoning and high‑accuracy task completion for autonomous agents.

AI‑Native Companies

  • Perplexity – offers Nemotron 3 Super for search and as one of 20 orchestrated models in Computer.
  • Software‑development agents
    • CodeRabbit
    • Factory
    • Greptile
      These platforms integrate Nemotron 3 Super (alongside proprietary models) to achieve higher accuracy at lower cost.
  • Life‑sciences & frontier‑AI – Edison Scientific and Lila Sciences will power their agents for deep literature search, data‑science workflows, and molecular understanding.

Enterprise Software Platforms

Industry leaders deploying and customizing Nemotron 3 Super include:

  • Amdocs – telecom workflow automation
  • Palantir – data‑centric AI solutions
  • Cadence – semiconductor design assistance
  • Dassault Systèmes – product‑development pipelines
  • SiemensFuse EDA AI System for manufacturing and cybersecurity

These platforms use the model to automate complex workflows across telecom, cybersecurity, semiconductor design, and manufacturing.

Challenges in Multi‑Agent Applications

  1. Context explosion

    • Multi‑agent workflows can generate up to 15× more tokens than standard chat (each interaction resends full histories, tool outputs, and intermediate reasoning).
    • The growing context inflates costs and can cause goal drift, where agents lose alignment with the original objective.
  2. Thinking tax

    • Complex agents must reason at every step.
    • Using large models for every sub‑task makes multi‑agent systems expensive and sluggish.

How Nemotron 3 Super Addresses These Issues

  • 1‑million‑token context window – agents can retain the entire workflow state in memory, dramatically reducing context‑related costs and preventing goal drift.
  • Efficiency & openness – ranked #1 on Artificial Analysis for efficiency and openness, with leading accuracy among models of comparable size.

Benchmark Performance

These benchmarks evaluate an AI system’s ability to conduct thorough, multi‑step research across large document sets while maintaining coherent reasoning.

Nemotron 3 Super sets a new standard for scalable, high‑performance multi‑agent AI.

Hybrid Architecture

Nemotron 3 Super employs a hybrid mixture‑of‑experts (MoE) design that blends three key innovations, delivering up to 5× higher throughput and 2× higher accuracy compared with the previous Nemotron Super model.

InnovationWhat it doesBenefit
Hybrid ArchitectureCombines Mamba layers (for memory‑ and compute‑efficiency) with Transformer layers (for advanced reasoning)4× better memory & compute efficiency
MoEActivates only 12 B of the total 120 B parameters during inferenceReduces compute cost while keeping model capacity
Latent MoETriggers four expert specialists for the cost of a single expert when generating the next tokenImproves accuracy without extra latency
Multi‑Token PredictionPredicts several future tokens in parallel3× faster inference

NVIDIA Blackwell Platform

Runs in NVFP4 precision – this reduces memory usage and makes inference up to 4× faster than FP8 on NVIDIA Hopper, without any loss in accuracy.

Open Weights, Data, and Recipes

NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in data centers, or in the cloud.

Training Data and Methodology

  • Trained on synthetic data generated using frontier‑reasoning models.
  • NVIDIA publishes the complete methodology, including:
    • 10 + trillion tokens of pre‑ and post‑training datasets.
    • 15 training environments for reinforcement learning.
    • Evaluation recipes.

Getting Started

Researchers can use the NVIDIA NeMo platform to:

  • Fine‑tune the model.
  • Build their own models and pipelines.

Use in Agentic Systems

Nemotron 3 Super is built to handle complex subtasks within multi‑agent systems.

  • Software development – An agent can load an entire codebase into context at once, enabling end‑to‑end code generation and debugging without having to segment documents.
  • Financial analysis – The model can ingest thousands of pages of reports in a single context, eliminating the need to re‑reason across long conversations and dramatically improving efficiency.
  • High‑stakes tool calling – Nemotron 3 Super’s high‑accuracy tool‑calling lets autonomous agents reliably navigate massive function libraries, preventing execution errors in critical environments such as autonomous security orchestration for cybersecurity.

Availability

NVIDIA Nemotron 3 Super, part of the Nemotron 3 family, can be accessed through:

  • NVIDIA platformsbuild.nvidia.com, Perplexity, OpenRouter and Hugging Face.
  • Enterprise integrations – Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face (optimized for on‑premise deployment on the Dell AI Factory). HPE is also adding NVIDIA Nemotron to its agents hub to support scalable enterprise adoption of agentic AI.

Cloud Service Providers

  • Google Cloud Vertex AI
  • Oracle Cloud Infrastructure
  • Amazon Web Services (coming soon via Amazon Bedrock)
  • Microsoft Azure

NVIDIA Cloud Partners

Inference Service Providers

Data Platforms & Services

  • Distyl
  • Dataiku
  • DataRobot
  • Deloitte
  • EY
  • Tata Consultancy Services

The model is packaged as an NVIDIA NIM microservice, enabling deployment from on‑premises systems to the cloud.

Stay Up to Date

Explore self‑paced video tutorials and livestreams on the NVIDIA AI YouTube playlist.

0 views
Back to Blog

Related posts

Read more »

What Is Agentic AI?

What Is Agentic AI? Agentic AI refers to AI systems that can take actions in pursuit of a goal rather than simply producing single responses. Capabilities of a...