Nemotron 3 Ultra now available on AI Gateway

Published: June 4, 2026 at 03:00 AM EDT

Source: Vercel Blog

Overview

Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. It is an open Mixture-of-Experts reasoning model built for orchestrating long‑running agent workflows, with a 1M‑token context window. It targets multi‑turn agent tasks such as planning, tool use, sub‑agent delegation, and error recovery. Throughput reaches up to 350 tokens per second, with up to 30% lower cost on agentic tasks.
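As a rough illustration of the throughput figure, the sketch below estimates the shortest possible generation time for a given number of output tokens. It assumes the quoted 350 tokens per second is a sustained peak rate, which real requests may not reach. The function and constant names are hypothetical and not part of any SDK.

```typescript
// Peak throughput quoted in the post, in tokens per second.
// Assumption: treated here as a sustained rate, so results are a lower bound on time.
const PEAK_TOKENS_PER_SECOND = 350;

// Best-case seconds needed to generate `outputTokens` at the given rate.
function minGenerationSeconds(
  outputTokens: number,
  tokensPerSecond: number = PEAK_TOKENS_PER_SECOND,
): number {
  if (outputTokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError('outputTokens must be >= 0 and tokensPerSecond must be > 0');
  }
  return outputTokens / tokensPerSecond;
}

// A 7,000-token report takes at least 20 seconds at the peak rate.
console.log(minGenerationSeconds(7_000)); // 20
```

Because the post states the rate as "up to", actual latency for long agent runs will likely be higher than this estimate.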

Usage

To use Nemotron 3 Ultra, set the model to nvidia/nemotron-3-ultra-550b-a55b in the AI SDK:

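import { streamText } from 'ai';

// Start a streaming request routed through AI Gateway.
const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Plan and run a multi-step research task and synthesize a report.',
});

// Print text chunks as they arrive (top-level await needs an ES module).
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}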

AI Gateway Features

  • Unified API for calling models, tracking usage and cost, and configuring retries, failover, and performance optimizations for higher‑than‑provider uptime.
  • Built‑in custom reporting.
  • Zero Data Retention support.
  • Dynamic provider sorting by latency and cost.
  • Reflects provider pricing with no markup and no platform fee on inference, including on Bring Your Own Key (BYOK) requests.
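
To make the provider sorting and failover ideas above concrete, here is a minimal sketch of how ordering by latency or cost and falling back to the next provider could work. It is not AI Gateway's actual implementation. The `ProviderStats` shape, the function names, the provider names, and the sample numbers are all hypothetical.

```typescript
// Hypothetical record: field names are illustrative, not AI Gateway's schema.
interface ProviderStats {
  name: string;
  latencyMs: number;      // observed latency, e.g. time to first token
  costPerMTokUsd: number; // price per million tokens in USD
}

type SortKey = 'latency' | 'cost';

// Return a new array ordered best-first by the chosen key, using the other
// metric as a tie-breaker. The input array is not modified.
function sortProviders(providers: ProviderStats[], by: SortKey): ProviderStats[] {
  const primary = (p: ProviderStats) => (by === 'latency' ? p.latencyMs : p.costPerMTokUsd);
  const secondary = (p: ProviderStats) => (by === 'latency' ? p.costPerMTokUsd : p.latencyMs);
  return [...providers].sort(
    (a, b) => primary(a) - primary(b) || secondary(a) - secondary(b),
  );
}

// Try providers in order and fall through to the next one on failure.
async function callWithFailover<T>(
  ordered: ProviderStats[],
  call: (p: ProviderStats) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error('no providers given');
  for (const p of ordered) {
    try {
      return await call(p);
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError;
}

// Made-up sample data for illustration only.
const candidates: ProviderStats[] = [
  { name: 'provider-a', latencyMs: 120, costPerMTokUsd: 2.0 },
  { name: 'provider-b', latencyMs: 300, costPerMTokUsd: 1.2 },
  { name: 'provider-c', latencyMs: 120, costPerMTokUsd: 1.5 },
];

console.log(sortProviders(candidates, 'latency').map((p) => p.name));
// [ 'provider-c', 'provider-a', 'provider-b' ]
console.log(sortProviders(candidates, 'cost').map((p) => p.name));
// [ 'provider-b', 'provider-c', 'provider-a' ]
```

Passing the sorted list to `callWithFailover` would attempt the best-ranked provider first and move on only when a call throws, which is one simple way to combine sorting with failover.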
