Nvidia’s best model is now live

Published: (June 4, 2026 at 12:12 PM EDT)
2 min read

Source: The New Stack

After pre-announcing Nemotron 3 Ultra, a 550-billion-parameter open-weight mixture-of-experts model, at Computex, Nvidia on Thursday released the model on platforms like Hugging Face, ModelScope, OpenRouter (with a free endpoint), and build.nvidia.com.

The new model uses the same latent mixture-of-experts technique and Mamba 2 architecture as the other models in the Nemotron 3 family, bringing the number of active parameters down to 55 billion. It can support context windows of up to 1 million tokens.

As Nvidia notes, the new model has been tuned to power long-running agents that need to plan, call tools, and iterate over complex tasks. For this, the model needs to be not just smart enough but also fast enough. Indeed, Nvidia is emphasizing speed with this release, noting that it is significantly faster than its previous generation of models.

Given the current concerns around token costs, what may matter more here is that Nvidia also claims the model could save users up to 30% compared to similarly powerful models.

Credit: Nvidia

While it is the fastest model among its direct competitors like Kimi-K2.6, Qwen-3.5, and GML-5.1 — and the best U.S. open-weight model yet — it does still trail the best of these Chinese models on most benchmarks, even if only by a few points.

And while Nvidia calls this a frontier model, the benchmarks don’t quite tell this story. On GDPVal, which tests how well a model performs real-world, economically valuable tasks, Nemotron 3 Ultra — in its NVFP4 variant, which uses Nvidia’s new quantization-aware pre-training technique — scores 47.9%. By comparison, OpenAI’s GPT-5.5 scores 84.9%.

Credit: Nvidia

Benchmarks don’t always capture a model’s strengths, though, and Nvidia notes that the model can handle “the orchestration and hardest reasoning calls in an autonomous workflow: architectural decisions in long-running coding sessions, synthesis across hundreds of research sources and verification across thousands of interdependent constraints.”

Credit: Nvidia

The model was trained on a curated dataset of 14.8 trillion tokens, enabling it to support 12 languages (English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, and Chinese) and 43 programming languages.

Nvidia is making the weights, datasets, and training recipes available. The model is available under the OpenMDW-1.1 license.

		TRENDING STORIES		

	




	




	
		
			YOUTUBE.COM/THENEWSTACK
		
		
			Tech moves fast, don't miss an episode. Subscribe to our YouTube 
			channel to stream all our podcasts, interviews, demos, and more.
		
	
	
		
			SUBSCRIBE
		
	



	
Group
Created with Sketch.
0 views
Back to Blog

Related posts

Read more »