Nemotron 3 Ultra now available on AI Gateway
Source: Vercel Blog
Overview
Nemotron 3 Ultra from Nvidia is now available on Vercel AI Gateway. It is an open Mixture-of-Experts reasoning model with a 1M-token context window, built for orchestrating long-running, multi-turn agent workflows: planning, tool use, sub-agent delegation, and error recovery. Throughput reaches up to 350 tokens per second, with up to 30% lower cost on agentic tasks.
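As a rough illustration of what the throughput figure means in practice, the sketch below turns tokens per second into a best-case generation time. It assumes the "up to 350 tokens per second" number as a peak rate; `minGenerationSeconds` is a hypothetical helper, not part of any SDK, and real latency also depends on time to first token, provider load, and reasoning overhead.

```typescript
// Peak rate quoted in the announcement ("up to"), so results are a lower bound.
const PEAK_TOKENS_PER_SECOND = 350;

// Hypothetical helper: best-case seconds needed to generate `outputTokens`.
function minGenerationSeconds(
  outputTokens: number,
  tokensPerSecond: number = PEAK_TOKENS_PER_SECOND,
): number {
  if (outputTokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError('outputTokens must be >= 0 and tokensPerSecond > 0');
  }
  return outputTokens / tokensPerSecond;
}

console.log(minGenerationSeconds(7000)); // 20
```

At the peak rate, a 7,000-token report would take at least 20 seconds to generate.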
Usage
To use Nemotron 3 Ultra, set the model to nvidia/nemotron-3-ultra-550b-a55b in the AI SDK:
import { streamText } from 'ai';

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Plan and run a multi-step research task and synthesize a report.',
});
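The snippet above starts the request but does not read the response. One way to consume it is to iterate over `result.textStream`, which the AI SDK exposes as an async iterable of text chunks. The sketch below shows that pattern with a hypothetical `collectText` helper; a stand-in generator (`fakeTextStream`, also hypothetical) replaces the real stream so the example runs without network access.

```typescript
// Hypothetical helper: reads every chunk from a text stream, hands each one
// to an optional callback, and returns the concatenated text.
async function collectText(
  textStream: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  let fullText = '';
  for await (const chunk of textStream) {
    onChunk(chunk);
    fullText += chunk;
  }
  return fullText;
}

// Stand-in for `result.textStream`, used here only for demonstration.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield 'Step 1: plan. ';
  yield 'Step 2: report.';
}

collectText(fakeTextStream(), (chunk) => console.log(chunk)).then((text) => {
  console.log(`Received ${text.length} characters`);
});
```

With the real SDK, the call would look like `await collectText(result.textStream, (chunk) => process.stdout.write(chunk))`.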
AI Gateway Features
- Unified API for calling models, tracking usage and cost, and configuring retries, failover, and performance optimizations for higher‑than‑provider uptime.
- Built‑in custom reporting.
- Zero Data Retention support.
- Dynamic provider sorting by latency and cost.
- Reflects provider pricing with no markup and no platform fee on inference, including on Bring Your Own Key (BYOK) requests.
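To make the retry and failover features more concrete, here is a minimal sketch of a request object that sets a retry budget and a preferred provider order. It rests on two assumptions that should be checked against the current documentation: that `maxRetries` is the AI SDK call option for retries, and that AI Gateway reads a provider preference list from `providerOptions.gateway.order`. The `buildGatewayRequest` helper, the `GatewayRequest` type, and the provider slugs `provider-a` and `provider-b` are hypothetical placeholders.

```typescript
// Local type for illustration only; it mirrors the options assumed above.
type GatewayRequest = {
  model: string;
  prompt: string;
  maxRetries: number;
  providerOptions: { gateway: { order: string[] } };
};

// Hypothetical helper: builds call options with a retry budget and an
// ordered list of preferred providers (assumed `gateway.order` option).
function buildGatewayRequest(
  prompt: string,
  preferredProviders: string[],
  maxRetries: number = 2,
): GatewayRequest {
  return {
    model: 'nvidia/nemotron-3-ultra-550b-a55b',
    prompt,
    maxRetries,
    // Copy the array so later changes by the caller do not leak into the request.
    providerOptions: { gateway: { order: [...preferredProviders] } },
  };
}

// 'provider-a' and 'provider-b' are placeholders, not real provider slugs.
const request = buildGatewayRequest(
  'Plan and run a multi-step research task and synthesize a report.',
  ['provider-a', 'provider-b'],
);
console.log(JSON.stringify(request, null, 2));
```

If those option names hold, the object could be passed directly as `streamText(request)`. Keeping the options in a plain object also makes them easy to log or unit test before any request is sent.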
Resources
- Learn more about AI Gateway.
- View the AI Gateway model leaderboard.
- Try it in the model playground.