arxiv:2512.20856

NVIDIA Nemotron 3: Efficient and Open Intelligence

Published on Dec 24

Submitted by taesiri on Dec 25

NVIDIA



Abstract

We introduce the Nemotron 3 family of models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. The Super and Ultra models are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. These two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning, which enables reasoning and multi-step tool use, and all support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.


Community

taesiri (paper submitter):

Nemotron 3 introduces a Mixture-of-Experts hybrid Mamba-Transformer architecture with context lengths of up to 1M tokens, LatentMoE, MTP layers, and multi-environment RL for agentic reasoning and tool use, with open weights.