GadgetBond


NVIDIA adds MiniMax M2.7 to its AI stack for production-ready agents

MiniMax M2.7 is now live on NVIDIA, giving developers a 230B‑scale MoE model built to run serious, long‑running AI agents without crushing inference costs.

By Shubham Sawarkar, Editor-in-Chief
Apr 12, 2026, 2:36 AM EDT
We may get a commission from retail offers.
Image: NVIDIA

NVIDIA just quietly flipped a pretty big switch: MiniMax M2.7, the latest in MiniMax’s agentic model lineup, is now live with open weights through NVIDIA’s ecosystem and the broader open-source inference stack. It’s the kind of move that doesn’t just add yet another LLM to the pile—it gives developers a serious, production-grade agent and coding workhorse they can actually run, tune, and scale on their own terms.

At its core, MiniMax M2.7 is a sparse mixture-of-experts (MoE) model tuned for long-running, tool-using agents rather than just chatty assistants. You’re looking at 230 billion total parameters, but only about 10 billion “active” per token, thanks to its MoE routing design—so you get big-model capacity without paying big-model inference costs every time you send a prompt. The architecture leans on multi-head causal self-attention, RoPE positional embeddings, and Query-Key RMSNorm, plus a top-k expert routing scheme that selectively activates just 8 of 256 experts per token at an activation rate of roughly 4.3%. In plain language: the model stays smart, but it’s picky about which parts of its brain it uses at any moment, which is why it can scale.
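The routing math is easy to sketch. The numpy toy below (expert count and top-k from the article; everything else is illustrative, not MiniMax's actual router) picks the top 8 of 256 experts per token. Note that 8/256 is about 3.1% of *experts*; the ~4.3% figure counts *parameters* and presumably includes always-on components like attention and embeddings.

```python
import numpy as np

def top_k_route(router_logits: np.ndarray, k: int = 8):
    """Pick the top-k experts per token and softmax-normalize their weights."""
    # Indices of the k largest logits per token (order within the k is irrelevant).
    topk_idx = np.argpartition(router_logits, -k, axis=-1)[:, -k:]
    topk_logits = np.take_along_axis(router_logits, topk_idx, axis=-1)
    # Softmax over only the selected experts; the other 248 are never computed.
    w = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return topk_idx, w

rng = np.random.default_rng(0)
num_tokens, num_experts = 4, 256
logits = rng.normal(size=(num_tokens, num_experts))
idx, weights = top_k_route(logits, k=8)
print(idx.shape, weights.shape)  # (4, 8) (4, 8)
print(f"{8 / 256:.1%} of experts touched per token")  # 3.1%
```

Only the selected experts' feed-forward weights are ever loaded for a given token, which is where the inference savings come from.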

There’s also the context window, which is frankly huge: up to 200K tokens. That’s long enough to feed entire codebases, multi-step research traces, or dense technical documents into one session and still have room for the model to reason over them. For anyone designing autonomous agents, research pipelines, or AI dev tools, that kind of context isn’t a “nice to have”—it’s the difference between a toy assistant and something that can actually keep track of what it’s doing across long workflows.
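As a back-of-envelope check on what 200K tokens buys you, here is a rough fit test using the common ~4-characters-per-token heuristic (an approximation only; MiniMax's real tokenizer will differ):

```python
# Rough check: does a codebase fit in a 200K-token context window?
CONTEXT_TOKENS = 200_000
CHARS_PER_TOKEN = 4  # common heuristic for English/code, not MiniMax's tokenizer

def fits_in_context(total_chars: int, reserve_for_output: int = 8_000) -> bool:
    """Estimate token count from characters and leave room for the reply."""
    est_tokens = total_chars / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_TOKENS - reserve_for_output

# Example: a 150-file project averaging ~4,000 characters per file.
project_chars = 150 * 4_000
print(fits_in_context(project_chars))  # True: ~150K tokens, under the budget
```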

MiniMax’s own positioning of M2.7 makes the intent clear: this isn’t just another generalist model—it’s explicitly marketed as capable of building and driving complex agent harnesses, including multistep productivity workflows, AI coding tools, and interactive environments. On internal and public benchmarks, they’re not shy about where they think it lands: MiniMax reports strong performance on complex skills over long prompts, a 97% “skill adherence” rate on tasks with more than 2,000 tokens, and substantial gains over M2.5 in real agent frameworks like OpenClaw. They also highlight that M2.7 approaches Anthropic’s Claude Sonnet-class performance on MMClaw-style evaluations, while remaining fully open-weights.

On the hard-numbers side, MiniMax cites serious coding and reasoning scores: on SWE-Pro, M2.7 lands just below frontier closed models like Claude Opus, and it extends that capability into full project delivery (VIBE-Pro) and terminal-style system understanding (TerminalBench 2). For devs, that translates to a single model that can not only write functions but own end-to-end tasks: structuring a repo, wiring services, and iterating under feedback.

NVIDIA’s angle here is just as important as the model specs. With M2.7’s open weights now available through NVIDIA, the company is turning its AI stack into a kind of “reference highway” for serious open models. The model is integrated into multiple layers of that stack:

First is NVIDIA NemoClaw, an open-source reference stack focused on “always‑on” agents. NemoClaw sits on top of NVIDIA OpenShell, a secure runtime for autonomous agents that can call tools, hit endpoints, and run open models like M2.7 in a guarded environment. From a developer’s perspective, this matters because long‑running agent systems are a nightmare to wire up safely—NemoClaw gives you a one-command setup that provisions OpenClaw + OpenShell on NVIDIA’s Brev cloud GPU platform, so you can go from “reading about M2.7” to “actually running an agent” in minutes instead of days.

Then there’s the inference stack. NVIDIA has been systematically optimizing open MoE models, and M2.7 benefits from that work in vLLM and SGLang. The company and the open-source community collaborated to add high-performance kernels that specifically target MoE pain points. Two stand out:

  • A fused QK RMSNorm kernel, which merges the separate query and key normalization steps into a single kernel launch, cutting launch overhead and memory operations and improving overall inference throughput.
  • An FP8 MoE kernel built on TensorRT-LLM, tuned for MoE routing and designed to squeeze more performance out of NVIDIA GPUs without tanking quality.
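NVIDIA's actual kernels live in CUDA inside vLLM and SGLang; as a conceptual sketch only, here is what QK RMSNorm computes, with the two normalizations grouped the way a fused kernel would execute them in one launch instead of two:

```python
import numpy as np

def rmsnorm(x: np.ndarray, gamma: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm: scale each vector by the reciprocal of its root-mean-square."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * gamma

def qk_rmsnorm(q, k, gamma_q, gamma_k):
    # Conceptually one step: both normalizations run back to back, which a real
    # fused CUDA kernel does in a single launch rather than two separate ones.
    return rmsnorm(q, gamma_q), rmsnorm(k, gamma_k)

head_dim = 64
q = np.random.randn(2, 8, head_dim)  # (tokens, heads, head_dim)
k = np.random.randn(2, 8, head_dim)
qn, kn = qk_rmsnorm(q, k, np.ones(head_dim), np.ones(head_dim))
print(qn.shape)  # (2, 8, 64); per-vector mean square is ~1 after normalization
```

The fusion win isn't in the math, which is identical, but in doing it with one kernel launch and one pass over memory.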

On NVIDIA Blackwell Ultra GPUs, those optimizations are not just theoretical. NVIDIA reports that on a standard 1K/1K input/output sequence dataset, they saw up to 2.5× throughput gains with vLLM and up to 2.7× with SGLang over a single month of tuning. That kind of rapid iteration means the “stack” around M2.7 is evolving almost as fast as the model lineup itself.

If you’re actually deploying this thing, NVIDIA’s made sure the serve story is straightforward. For vLLM, you can launch M2.7 with a standard CLI: set the model path to the MiniMax M2.7 weights, configure tensor parallelism (for example, --tensor-parallel-size 4), and enable expert parallelism alongside tool-call and reasoning parsers tuned specifically for MiniMax’s format. For SGLang, there’s a similar one-liner: you point to MiniMaxAI/MiniMax-M2.7, wire up tensor parallelism, memory fraction, batch size, FP8 quantization, and specify the MoE backend (flashinfer_trtllm_routed) plus the FP8 GEMM backend. In practice, this gives teams a choice: use vLLM’s more “generalist” serving framework, or SGLang’s agent-forward runtime that’s already popular in coding and tools-heavy setups.
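For reference, those launch commands might look roughly like this. The model path, tensor-parallel size, and the flashinfer_trtllm_routed backend name come from the description above; the remaining flag names follow current vLLM and SGLang conventions and are assumptions, so check each project's docs for the release you're on:

```shell
# vLLM: tensor parallelism across 4 GPUs plus expert parallelism for MoE layers.
# Parser names are placeholders for MiniMax's tool-call/reasoning format.
vllm serve MiniMaxAI/MiniMax-M2.7 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --tool-call-parser minimax \
  --reasoning-parser minimax

# SGLang: FP8 quantization plus the TensorRT-LLM-routed MoE backend.
python -m sglang.launch_server \
  --model-path MiniMaxAI/MiniMax-M2.7 \
  --tp 4 \
  --mem-fraction-static 0.85 \
  --quantization fp8 \
  --moe-runner-backend flashinfer_trtllm_routed
```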

On the access side, there are three main doors depending on how deep you want to go:

  • build.nvidia.com: NVIDIA is exposing MiniMax M2.7 as a free, GPU-accelerated endpoint through its Build portal. You can test prompts right in the browser, plug in your own data, and see how the model performs before committing to any infra decisions.
  • NVIDIA NIM microservices: when you’re ready to go beyond tinkering, the same model is packaged as an optimized NIM container—a production-ready inference microservice you can deploy on-prem, in the cloud, or in hybrid setups. The MiniMax M2.7 NIM container is tuned for NVIDIA GPUs and slots into the broader NIM ecosystem that now spans 100+ models with free-tier inference.
  • Hugging Face + direct weights: if you want full control, the open weights are available on Hugging Face via MiniMax’s official org, with support for frameworks like vLLM, SGLang, and NVIDIA NeMo. That’s the route you take if you’re building your own stack, doing custom sharding strategies, or experimenting with bespoke inference kernels.
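Build portal endpoints are OpenAI-compatible, so a first request can be as small as the sketch below. The model id is a guess at how M2.7 would be listed (verify it on build.nvidia.com), and the network call only fires when an API key is configured:

```python
import json
import os
import urllib.request

# OpenAI-compatible chat endpoint used by NVIDIA's Build portal.
URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "minimaxai/minimax-m2.7"  # assumption; check build.nvidia.com

def make_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn chat completion."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

payload = make_request("Summarize this repo's build steps.")
print(json.dumps(payload, indent=2))

# Only actually hit the endpoint when a key is present.
if key := os.environ.get("NVIDIA_API_KEY"):
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```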

For teams that don’t just want to run the base model, NVIDIA is also making sure post-training is a first-class path. Through the NVIDIA NeMo framework, you can use the NeMo AutoModel library to fine-tune MiniMax M2.7 with officially documented recipes and example configs. There are sample fine-tune configs for specific tasks (for example, Hellaswag-style reasoning), plus references to the latest checkpoints on Hugging Face, so you’re not starting from scratch.

If you’re chasing alignment or reward-shaped behavior, the NeMo RL library supports reinforcement learning on M2.7, including public recipes for different sequence lengths (like 8K and 16K sequence RL setups) and shared accuracy validation curves so you can sanity-check your runs. From a practical standpoint, that means you can do things like: teach M2.7 to be extra strict about tool use, optimize it for code reliability over raw creativity, or tune it for a specific internal evaluation suite.

Zooming out, the timing and positioning of MiniMax M2.7 on NVIDIA’s stack say a lot about where the ecosystem is heading. Over the past year, NVIDIA has aggressively expanded free and open-weight model access via NIM and Build, covering everything from DeepSeek to GLM 4.7 and earlier MiniMax M2.x releases. Now, with M2.7’s open weights and first-class support in vLLM, SGLang, NemoClaw, and NIM, there’s a clear pattern:

  • Agentic workflows are becoming “default,” not niche. M2.7 is explicitly engineered and marketed for agents, coding, and long-running workflows, and NVIDIA is shipping the surrounding runtime (OpenShell, NemoClaw, OpenClaw) to make those workloads sane to operate.
  • MoE is crossing from research into production. The throughput gains NVIDIA is showing on Blackwell Ultra with FP8 MoE and fused kernels make it plausible to run something with 230B parameters behind the scenes without needing hyperscaler-only budgets.
  • Open weights are becoming a competitive edge, not just a community checkbox. MiniMax is openly comparing M2.7 against frontier‑class proprietary models in benchmarks and still choosing to release it with open weights, while NVIDIA makes it trivial to deploy on its own hardware. For enterprises worried about lock-in, that combination—strong performance plus self-hosting options—is starting to look very attractive.

If you’re a developer, researcher, or a company building around AI agents, devtools, or complex productivity workflows, the takeaway is pretty simple: MiniMax M2.7 is now something you can actually put to work. You can:

  • Hit build.nvidia.com and try it in minutes via a browser endpoint.
  • Spin it up with vLLM or SGLang and ride the latest MoE kernel optimizations.
  • Wrap it in NemoClaw + OpenShell to experiment with always-on agents that can call tools and systems securely.
  • Fine-tune or RL-train it with NeMo if you want a domain-tuned variant for your own stack.

And because the weights are open, you’re not just renting the model—you can take it with you, dissect it, and keep evolving it alongside your own systems.



Disclosure: We love the products we feature and hope you’ll love them too. If you purchase through a link on our site, we may receive compensation at no additional cost to you. Read our ethics statement. Please note that pricing and availability are subject to change.

Copyright © 2026 GadgetBond. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | Do Not Sell/Share My Personal Information.