GadgetBond

  • Latest
  • How-to
  • Tech
    • AI
    • Amazon
    • Apple
    • CES
    • Computing
    • Creators
    • Google
    • Meta
    • Microsoft
    • Mobile
    • Samsung
    • Security
    • Xbox
  • Transportation
    • Audi
    • BMW
    • Cadillac
    • E-Bike
    • Ferrari
    • Ford
    • Honda Prelude
    • Lamborghini
    • McLaren W1
    • Mercedes
    • Porsche
    • Rivian
    • Tesla
  • Culture
    • Apple TV
    • Disney
    • Gaming
    • Hulu
    • Marvel
    • HBO Max
    • Netflix
    • Paramount
    • SHOWTIME
    • Star Wars
    • Streaming
Add GadgetBond as a preferred source to see more of our stories on Google.
Font ResizerAa
GadgetBondGadgetBond
  • Latest
  • Tech
  • AI
  • Deals
  • How-to
  • Apps
  • Mobile
  • Gaming
  • Streaming
  • Transportation
Search
  • Latest
  • Deals
  • How-to
  • Tech
    • Amazon
    • Apple
    • CES
    • Computing
    • Creators
    • Google
    • Meta
    • Microsoft
    • Mobile
    • Samsung
    • Security
    • Xbox
  • AI
    • Anthropic
    • ChatGPT
    • ChatGPT Atlas
    • Gemini AI (formerly Bard)
    • Google DeepMind
    • Grok AI
    • Meta AI
    • Microsoft Copilot
    • OpenAI
    • Perplexity
    • xAI
  • Transportation
    • Audi
    • BMW
    • Cadillac
    • E-Bike
    • Ferrari
    • Ford
    • Honda Prelude
    • Lamborghini
    • McLaren W1
    • Mercedes
    • Porsche
    • Rivian
    • Tesla
  • Culture
    • Apple TV
    • Disney
    • Gaming
    • Hulu
    • Marvel
    • HBO Max
    • Netflix
    • Paramount
    • SHOWTIME
    • Star Wars
    • Streaming
Follow US
AINVIDIATech

NVIDIA’s Nemotron 3 Ultra targets faster, cheaper long-running agents

NVIDIA’s Nemotron 3 Ultra takes the company beyond GPUs and into true frontier-model territory, with a 550B open-weight system built specifically for long-running, tool-using AI agents.

By
Shubham Sawarkar
Shubham Sawarkar's avatar
ByShubham Sawarkar
Editor-in-Chief
I’m a tech enthusiast who loves exploring gadgets, trends, and innovations. With certifications in CISCO Routing & Switching and Windows Server Administration, I bring a sharp...
Follow:
- Editor-in-Chief
Jun 5, 2026, 9:00 AM EDT
Share
We may get a commission from retail offers. Learn more
Technology-themed illustration showing a glowing Earth emerging from a black background, surrounded by radiant golden data-like light trails extending outward. In the foreground, a series of floating interface panels display icons representing databases, task management, data analysis, artificial intelligence, and interconnected neural networks. A luminous green cube with connected nodes sits at the center, symbolizing AI infrastructure, large-scale computing, and global data ecosystems. The image conveys themes of machine learning, enterprise AI, cloud computing, and worldwide digital connectivity.
Image: NVIDIA
SHARE

NVIDIA’s new Nemotron 3 Ultra isn’t just another big language model announcement; it’s NVIDIA stepping squarely into the frontier-model arena with an open-weight system built from the ground up for long-running AI agents, not just chatbots. It’s a 550-billion-parameter Mixture-of-Experts model (with 55 billion parameters active at any given token) that NVIDIA is positioning as an open, high-speed reasoning engine for tooling-heavy workflows, coding copilots, research agents, and complex enterprise automations.

What makes Nemotron 3 Ultra interesting is not only its size, but what it represents: a GPU company that has spent years selling picks and shovels to the AI gold rush is now shipping its own frontier-class open model to run on that hardware, and signaling that “agentic” workloads are the next big battleground.

Nemotron 3 Ultra: NVIDIA’s open frontier move

Nemotron 3 Ultra sits at the top of NVIDIA’s Nemotron 3 family of open models, above earlier releases like Nemotron 3 Super, which itself was a 120B-parameter open MoE model for multi-agent systems. Ultra is the flagship: a “frontier-scale” model with 550B total parameters, 55B active, built around a hybrid architecture that mixes Mixture-of-Experts with Mamba and Transformer layers.

NVIDIA describes it as a general-purpose reasoning and chat model optimized specifically for demanding agent workloads: multi-step planning, tool calling, code and math reasoning, long-context document analysis, and orchestration across many sub-tasks. In other words, this isn’t just about answering a single prompt; it is meant for AI systems that think, plan, and act over hundreds of turns and tool calls.

The model is fully open weights and part of an “open ecosystem” push: NVIDIA is releasing not just weights but also training data recipes and fine-tuning workflows under an open license via the Linux Foundation, which gives enterprises and researchers a relatively permissive base to customize for their own stacks. For developers in the US and beyond who want top-tier intelligence without going fully proprietary, Nemotron 3 Ultra is clearly targeting that “open frontier” niche.

Built for long-running agents, not one-shot chats

If you look at how NVIDIA frames Nemotron 3 Ultra, the word that keeps appearing is “agent.” Traditional chatbots care mostly about single interactions: a user prompt, a response, maybe a few turns. Long-running agents are different. They need to hold on to context across many steps, call tools and APIs, write and debug code, query databases, read large docs, and then stitch everything together into a coherent plan.

Nemotron 3 Ultra is tuned precisely for that kind of workflow. NVIDIA says the model is optimized for long-context reasoning (up to around 1 million tokens), which means an agent can ingest huge codebases, multi-hundred-page PDFs, or extended conversation histories and still reason effectively. The design is also geared around tool-heavy behavior: coding agents, deep research assistants, enterprise workflow orchestrators, and EDA (electronic design automation) scenarios where agents need to reason stably across many steps without drifting or forgetting the bigger picture.

Under the hood, the model uses a hybrid Latent Mixture-of-Experts architecture with interleaved Mamba-2, MoE, and some Attention layers. That combination is meant to give you the best of three worlds: MoE for scale and efficiency, Mamba for long-sequence handling, and attention where it still matters most. NVIDIA emphasizes that only a subset of experts are active for each token (55B active out of 550B), which is how it keeps inference costs manageable while still having a massive capacity to draw on when needed.

From chips to full AI stacks

For years, NVIDIA has been the default choice for AI hardware in data centers, with GPUs like H100 and the newer Blackwell parts powering most leading models. Nemotron 3 Ultra shows that NVIDIA doesn’t just want to sell hardware; it wants to define the reference models that run on it, too. The model is tuned for NVIDIA’s NVFP4 precision format, which packs weights into a 4-bit floating point representation that can cut memory usage and improve throughput on Blackwell and Hopper GPUs.

NVIDIA claims Nemotron 3 Ultra can deliver roughly 5x higher throughput than comparable open frontier models, while lowering the cost of complex agent workloads by up to about 30 percent in some setups. That kind of performance-per-dollar story matters if you’re building agents that may run for hours, bouncing among tools, APIs, and internal services. It’s also a pointed message to cloud providers and AI startups: if you want the best performance on NVIDIA hardware, maybe NVIDIA’s own open model is the most optimized starting point.

Architecture, numbers, and what they mean in practice

The headline specs are straightforward: Nemotron 3 Ultra is a 550B-parameter model with 55B active parameters, hybrid LatentMoE + Mamba + Transformer architecture, 1M-token context, text-in/text-out, and support for multiple languages, including English and major global languages. It’s designed as a reasoning-first model, with a “think-then-answer” style where it generates an internal reasoning trace before producing the final user-visible response, similar to recent “chain-of-thought” oriented systems.

Benchmarks from Artificial Analysis suggest that Nemotron 3 Ultra is currently the most capable US open-weight model on their Intelligence Index, scoring 47.7, ahead of models like Gemma 4 31B, Nemotron 3 Super, and gpt-oss-120B, though still behind some Chinese open-weight efforts like Kimi K2.6. On their Agentic Index, Nemotron 3 Ultra also leads among open models, meaning its performance on agent-specific tasks, multi-step workflows, and instruction-following is especially strong.

At the same time, it’s not just about intelligence; speed is a core selling point. On BlackBox AI ahead of release, Nemotron 3 Ultra was measured at over 400 output tokens per second, which is impressive given that it’s more than 4x larger than some competitors it outpaces in throughput. That combination of “frontier-ish smarts” and high throughput is what puts it on the so-called Pareto frontier for speed vs performance: you can’t easily get significantly more intelligence without paying a clear speed penalty, or vice versa.

How it compares to other open models

Nemotron 3 Ultra doesn’t exist in a vacuum. It’s landing in an ecosystem that already has serious open contenders from Google (Gemma 4), other Nemotron variants, and large community-driven efforts. On Artificial Analysis’s rankings, it leads other US open-weight models in overall intelligence, but there are tradeoffs.

Here is a snapshot comparison based on currently available public data:

ModelParameters (total/active)Context windowStrengths (high level)
Nemotron 3 Ultra550B / 55BUp to 1M tokens Agentic reasoning, speed, long-running workflows 
Nemotron 3 Super120B / 12BUp to 1M tokens Multi-agent, efficient voice and conversation agents 
Gemma 4 31B~31BLong, but smaller than 1M (varies by deployment) Strong coding in its class, competitive reasoning 
gpt-oss-120B~120BLong-context (varies) Open frontier-style model, good all-rounder 

On coding benchmarks, for example, Gemma 4 31B apparently scores a bit higher than Nemotron 3 Ultra on the Artificial Analysis Coding Index, which suggests that if your top priority is pure code-completion quality in that test suite, Gemma may still be slightly ahead. But on broader reasoning and agent tasks, Nemotron 3 Ultra clearly leads among US open-weight releases at the moment, especially when you factor in speed.

Compared to Nemotron 3 Super, Ultra is a big jump up in scale and intelligence. Super, with its 120B parameters and 12B active configuration, was already pitched for multi-agent applications and voice agents, with strong performance on long conversations and reasoning tasks at lower cost. Ultra more or less takes that concept to the frontier level: more capable reasoning, larger context, and higher overall performance, albeit on beefier hardware.

Open weights, open recipes, and the Linux Foundation angle

One of the more surprising dimensions of this launch is how open NVIDIA is going with it. The company is releasing Nemotron 3 Ultra under an open license via the Linux Foundation, and the package is not limited to just weights. NVIDIA is also making training data recipes, fine-tuning workflows, and related tooling available, effectively offering a blueprint for how the model was built and how you can adapt it.

For enterprises in regulated sectors or those with heavy compliance requirements, that matters. Being able to run an open frontier-class model in your own VPC, on-prem cluster, or custom infrastructure, with the option to fine-tune on private data without sending anything to a third-party API, makes the model much more appealing. For the broader open-source and research community, it’s also a data point in the “can big vendors still meaningfully support open AI?” debate.

This approach lines up with the earlier Nemotron 3 Super launch, where NVIDIA emphasized high-efficiency, open MoE models that developers could run using vLLM, Hugging Face, and other standard stacks. With Ultra, NVIDIA is amplifying that story: their hardware, their precision formats, their inference libraries, their open model. It’s a vertically integrated pitch, but with enough openness to keep developers interested.

Performance, cost, and the NVFP4 story

A lot of the efficiency narrative around Nemotron 3 Ultra comes down to NVFP4, NVIDIA’s 4-bit floating point format that’s designed to offer a sweet spot between compression and numerical stability. By storing weights in NVFP4, NVIDIA can cram more of the model into GPU memory while still keeping quality high, which is crucial for a 550B-parameter system.

In practice, NVIDIA and early partners report that Nemotron 3 Ultra can hit high throughput – on the order of hundreds of tokens per second – and that it outperforms previous Nemotron models and many competitors in throughput while also delivering higher quality. NVIDIA claims up to around 5x higher throughput versus comparable open frontier models and as much as 30 percent lower cost for certain agentic workloads, assuming you’re running on the right NVIDIA hardware.

That does introduce a subtle lock-in question: yes, the model is open, but it’s tuned aggressively for NVIDIA GPUs. Partners like SGLang and Miles have already announced “day 0” support and show Nemotron 3 Ultra running efficiently on Blackwell and Hopper hardware with NVFP4 and BF16. If you’re already in the NVIDIA ecosystem, it’s very attractive. If you’re betting on alternative accelerators, you may not see the same advantages.

What this means for developers and enterprises

For developers, especially those in the US and Europe who want a powerful open-weight model for agents, Nemotron 3 Ultra is likely to become a default candidate alongside the usual closed-API giants. It gives you:

  • A frontier-scale reasoning engine with strong performance on multi-step and agentic tasks.
  • A 1M-token context window that can handle massive documents, logs, or codebases.
  • An open-weight, open-recipe release that you can fine-tune and deploy under your control.
  • Optimizations for NVIDIA hardware that can significantly reduce latency and cost if you already run on that stack.

For enterprises, the appeal is slightly different. Nemotron 3 Ultra can be slotted into existing RAG pipelines, LLM gateways, and agent orchestrators as the “brain” behind everything from helpdesk automation to financial analysis and EDA workflows. The open licensing and Linux Foundation alignment make it easier to pitch to legal teams that are wary of black-box proprietary APIs. And for companies that have already invested heavily in on-prem GPU clusters, Nemotron 3 Ultra is a way to extract more value from that hardware without being locked into a single LLM vendor.

Where this leaves the AI race

Zooming out, Nemotron 3 Ultra is another example of how the AI race is no longer just about who has the biggest proprietary model. We’re now seeing a clear open frontier tier emerge: extremely capable open-weight models, backed by major players, that try to balance quality, speed, and deployment flexibility. NVIDIA now has one of the strongest entries in that tier, at least in the US context.

It also underscores the shift from single-shot chat to agentic systems as the next phase of LLM adoption. NVIDIA is not just saying “this model can chat”; it’s saying “this model is the orchestration engine for fleets of agents that can plan, reason, and act over long stretches of time.” That framing aligns with how many dev teams are now thinking: tools, APIs, workflows, and “do something useful,” not just “answer this question.”


Discover more from GadgetBond

Subscribe to get the latest posts sent to your email.

Leave a Comment

Leave a ReplyCancel reply

Most Popular

Apple starts age verification in Texas

Perplexity’s AI “Personal Computer” steps onto Windows desktops

iOS 27 rumored to skip four older iPhone models

Here are all the winners of Apple’s 2026 Design Awards

Anthropic opens Project Glasswing to 150 new global defenders

Also Read
Screenshot of a ChatGPT interface displaying a drafted email in a document-style editor. The email is addressed to a repair service regarding a dishwasher leak and resulting cabinet damage, requesting a repair appointment. Editing and sharing controls appear at the top of the document, including a prominent pink “Send” button. The interface features a sidebar with navigation icons, a prompt input field at the bottom, and a blue-green gradient background surrounding the application window, illustrating AI-assisted email drafting and communication.

Draft it, tweak it, send it: ChatGPT adds native email sending

ChatGPT Memory summary modal showing a personalized overview of a user’s work, hobbies, travel interests, and community involvement, with options to correct or dismiss specific details.

OpenAI’s “Dreaming” update makes ChatGPT actually remember you

Logo featuring a stylized orange asterisk-like symbol followed by the word 'Claude' in bold black serif font on a light beige background.

Claude Cowork usage limits doubled on all paid plans for the next month

Close-up screenshot of an AI model selection menu displaying several large language models, including Claude Sonnet 4.6, Claude Opus 4.8, and Nemotron 3 Ultra. The Nemotron 3 Ultra option is highlighted and selected, marked with a checkmark and a “Max” badge, while a large cursor points toward the model name. The interface emphasizes choosing an advanced AI model within a chatbot or AI platform.

Nemotron 3 Ultra rolls out to Perplexity Pro, Max, and Computer

Illustration of two abstract hands on a pink background holding a cluster of white geometric shapes — a triangle, square, circle, and diamond.

Anthropic tightens its Claude Partner Network with tiers and a hub

Promotional graphic showing an AI chat prompt interface against a blue gradient background. The prompt asks: “Use my Function Health lab results to analyze changes in my vitamin D levels over time and build a dashboard showing trends, progress, and how each result compares to optimal ranges.” Tool chips labeled “Computer” and “Function Health” appear below the prompt, alongside an “Orchestrator” label, microphone icon, and send button, illustrating AI-assisted health data analysis and personalized wellness insights.

Perplexity’s health push connects Apple Health, Function labs, and other sources into Computer

Conceptual illustration showing a person seated in an armchair within a dark, dreamlike landscape, watching a glowing upward-trending financial chart projected across the scene. Above the chart floats a transparent sphere containing a computer icon, illuminated by beams of light from above. The image combines elements of technology, artificial intelligence, investing, and market analysis, symbolizing the use of AI-powered tools to monitor trends, research data, and support financial decision-making.

Perplexity’s Main Street AI push arrives with $250 credits per business

Conceptual technology-themed illustration featuring an open book on a desk beneath floating translucent digital panels and a glass sphere containing a computer icon. Blurred light streaks, code-filled interface windows, and layered geometric frames hover above the book against a dark background, symbolizing the intersection of knowledge, artificial intelligence, computing, and digital research. A pen rests beside the book, reinforcing themes of learning, analysis, and human-computer collaboration.

Perplexity Computer now decides what runs local vs cloud

Company Info
  • Homepage
  • Support my work
  • Latest stories
  • Company updates
  • GDB Recommends
  • Daily newsletters
  • About us
  • Contact us
  • Write for us
  • Editorial guidelines
Legal
  • Privacy Policy
  • Cookies Policy
  • Terms & Conditions
  • DMCA
  • Disclaimer
  • Accessibility Policy
  • Security Policy
  • Do Not Sell or Share My Personal Information
Socials
Follow US

Disclosure: We love the products we feature and hope you’ll love them too. If you purchase through a link on our site, we may receive compensation at no additional cost to you. Read our ethics statement. Please note that pricing and availability are subject to change.

Copyright © 2026 GadgetBond. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | Do Not Sell/Share My Personal Information.