GadgetBond


Moonshot AI says its new model beats GPT-5 — and it’s open for everyone

Moonshot says its new Kimi K2 Thinking model outperforms the best systems from OpenAI and Anthropic.

By Shubham Sawarkar
Editor-in-Chief
Nov 8, 2025, 8:28 AM EST
Illustrated image of artificial intelligence (AI)
Illustration by Kasia Bojanowska / Dribbble

Moonshot AI, a Beijing-based lab that’s quietly been building an open-weight stack for the past two years, dropped a new model this week called Kimi K2 Thinking. The company says the model beats OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 on a handful of the hardest agent-style benchmarks currently used to judge “thinking” systems — and, crucially, the code and weights are available for anyone to use on Hugging Face. If the numbers hold up, the result would be a jolt to the industry narrative that top-tier capabilities must be locked behind expensive, proprietary APIs.

Moonshot describes Kimi K2 Thinking as a mixture-of-experts (MoE) reasoning model that activates about 32 billion parameters at inference but has roughly 1 trillion parameters total across its expert pool. It’s explicitly built to think with tools: the model can call web browsers and other utilities, chain hundreds of steps of tool use, and (Moonshot says) continually refine hypotheses as it goes. That combination — deep multi-step reasoning plus stable tool use — is the headline feature Moonshot is pushing.
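To make "chaining tool calls" concrete, here is a minimal sketch of the kind of loop an agentic model runs: decide on an action, call a tool, read the result, repeat until there is an answer. Everything in it is an assumption for illustration. The `policy` function stands in for the model, the `search` and `calc` tools are toys, and the 300-step cap simply echoes the upper end of the range Moonshot quotes. It is not Moonshot's implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool type: takes a string input, returns a string observation.
Tool = Callable[[str], str]


def run_agent(
    policy: Callable[[List[str]], Tuple[str, str]],
    tools: Dict[str, Tool],
    task: str,
    max_steps: int = 300,
) -> Tuple[str, int]:
    """Loop: ask the policy for an action, run the tool, record the result.

    `policy` stands in for the model. Given the transcript so far, it returns
    either ("final", answer) or (tool_name, tool_input).
    Returns the final answer and the number of successful tool calls.
    """
    transcript = [f"task: {task}"]
    calls = 0
    for _ in range(max_steps):
        action, payload = policy(transcript)
        if action == "final":
            return payload, calls
        if action not in tools:
            # Feed the error back so the policy can revise its plan.
            transcript.append(f"error: unknown tool {action!r}")
            continue
        observation = tools[action](payload)
        transcript.append(f"{action}({payload!r}) -> {observation}")
        calls += 1
    return "step budget exhausted", calls


def toy_policy(transcript: List[str]) -> Tuple[str, str]:
    # Searches once, then calculates, then answers with the last observation.
    done = len(transcript) - 1
    if done == 0:
        return "search", "population of Mars"
    if done == 1:
        return "calc", "2+3"
    return "final", transcript[-1].split("-> ")[-1]


toy_tools: Dict[str, Tool] = {
    "search": lambda query: f"no results for {query}",
    "calc": lambda expr: str(sum(int(part) for part in expr.split("+"))),
}

answer, n_calls = run_agent(toy_policy, toy_tools, "add two numbers")
print(answer, n_calls)
```

The step cap matters in practice: without one, a model that never decides it is finished would keep calling tools indefinitely.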

On Moonshot’s own benchmark pages, K2 Thinking posts scores such as 44.9% on Humanity’s Last Exam (HLE) with tools, 60.2% on BrowseComp (a web-search + synthesis test), and strong marks on coding suites like SWE-Bench Verified. Those are the numbers being used to argue that it outperforms GPT-5 and Claude Sonnet 4.5 on certain agentic tasks.

Why the “open and free” bit matters

You don’t usually see headline-grade model releases that are genuinely open. Moonshot has published the model materials on Hugging Face and released technical write-ups and code, meaning developers can download weights, run experiments locally, fork the code — or embed the model in commercial products, subject to the model’s modified license. That availability matters: it lowers the barrier for startups and researchers who want to experiment with agentic workflows without paying steep per-token fees to a closed provider.

Moonshot also published a technical narrative arguing K2’s MoE design plus quantization and serving tricks let the lab train and run a trillion-parameter system far more cheaply than conventional dense models — a claim that, if true, challenges the idea that building frontier models requires “scale and burn” budgets in the hundreds of millions or billions.

The eyebrow-raising price tag (and the caveats)

Several outlets reporting on the launch cited a figure of $4.6 million as the training bill for K2 Thinking. That number has been widely repeated, but it has not been independently verified: it traces back to a CNBC report that attributed it to a “source familiar with the matter,” and CNBC said it could not confirm the number itself. If the figure is accurate, it would be staggeringly low compared with the public spending numbers we’ve seen from leading Silicon Valley labs, but it should be treated with caution until more independent accounting appears.

What Kimi K2 actually looks like under the hood

The broad technical story is familiar to anyone following modern LLM engineering:

  • MoE architecture: many expert sub-modules; only a subset activate per token, so you can get a huge “parameter count” but relatively modest inference costs. Moonshot says K2 activates ~32B params per call while the model houses ~1T total.
  • Tool-first training: the model was post-trained to plan, call tools (search, browse, APIs), verify results, and iterate, rather than just produce a single text reply. Moonshot emphasizes long sequences of tool calls (200–300) as a core capability.
  • Long context: the model family has been pushed to enormous context windows (Moonshot advertises hundreds of thousands of tokens in some versions), which helps for sustained, multi-step tasks.
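The MoE point in the list above is easier to see in code. The sketch below shows generic top-k routing, a common way to build such layers: a gate scores every expert, only the k best run, and their outputs are blended. The expert count, gate scores, and scalar "experts" are toy assumptions, not Moonshot's configuration. The last line uses only the two figures quoted in this article.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract the max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_logits: Sequence[float],
    k: int,
) -> float:
    """Run only the selected experts and return their weighted sum."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Toy layer: 8 scalar "experts", 2 active per token (illustrative sizes only).
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)

# Share of parameters active per token, from the figures quoted above.
active_fraction = 32e9 / 1e12
print(routes, output, f"{active_fraction:.1%}")
```

With about 32B of roughly 1T parameters active, each token touches around 3.2% of the model, which is where the inference savings come from. The full set of weights still has to be stored and served.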

Those design choices match a broader industry trend: instead of trying to encode everything into a single “dense” network, engineers are stitching together specialist modules and cheap tool calls to get emergent agentic behavior without astronomical inference bills.

Why investors and businesses should pay attention

For businesses that have been sold pricey enterprise models under the argument “you get what you pay for,” a high-performing free alternative is a strategic headache. If an open model can match or beat a proprietary one on productivity tasks, the economic moat that supported subscription pricing narrows.

Investors will be watching two things: (1) whether K2’s real-world performance (outside of company-published benchmarks) matches the launch claims, and (2) whether Moonshot’s serving economics and license let startups build profitable services around the model without re-creating the expensive infrastructure stacks that firms like OpenAI and Anthropic run. Early signs, including the Hugging Face release and the benchmark numbers, have already stirred conversation on trading desks and in boardrooms.

The geopolitical and security angle

The new release also re-ignites familiar geopolitical anxieties. Western policymakers have tended to view advanced models from Chinese labs through lenses of control, censorship, or national advantage — sometimes rightly, sometimes not. Moonshot’s choice to open weights complicates the usual narrative: these models are now not just national trophies but engineering artifacts anyone can inspect, run, and reuse. That has pros (transparency, rapid innovation) and cons (easier proliferation of risky capabilities).

Security-conscious organizations will ask tougher questions: what data was the model trained on, what red-teaming was done, how does the model handle disallowed content, and could it be adapted to misuse at scale? Those are not rhetorical. Several governments reacted fast to similar releases earlier in the year, and security reviews will matter more now that agentic capabilities are becoming affordable.

So, is this a Sputnik moment or a flash in the pan?

There’s a reasonable middle ground. The Kimi K2 launch is an important data point: it shows that MoE + careful tool integration can push open models into territory that used to belong only to deep-pocketed proprietary teams. But the history of AI hype is long — claims need independent verification, third-party audits, and time in production to prove robustness.

If Kimi K2’s performance and cost story hold under scrutiny, expect three things to happen quickly: (1) more aggressive open-weight launches from other firms, (2) renewed pressure on proprietary vendors to justify their pricing or open parts of their stacks, and (3) faster conversations about how to regulate or audit agentic systems that can interact with the web and other services autonomously. If the numbers don’t hold, the launch will still have shifted perceptions — at minimum, it forces incumbents to explain why closed models remain worth paying for.

The practical takeaway

If you build software or manage AI procurement:

  • Try the model yourself (it’s available on Hugging Face and Moonshot’s platforms) and run the tasks you care about; benchmarks are a starting point, not a guarantee.
  • Treat the $4.6M training figure as unverified until clearer accounting appears. Don’t make budget decisions based purely on press numbers.
  • Keep an eye on license terms: “open” doesn’t always mean “no strings” — Moonshot’s release uses a modified MIT license with some commercial restrictions at scale. Read the legal fine print before embedding the model in revenue-generating products.
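For the first point in that list, running the tasks you care about can start as a very small harness: a list of prompts, a pass/fail check for each, and a pass rate per model. The sketch below is one way to set that up. The two "models" are stand-in functions so the code runs offline, and every name in it is hypothetical. In practice each would wrap a call to whichever model you are evaluating.

```python
from typing import Callable, Dict, List, Tuple

# Each case pairs a prompt with a checker that decides whether an output passes.
Case = Tuple[str, Callable[[str], bool]]
Model = Callable[[str], str]


def pass_rate(model: Model, cases: List[Case]) -> float:
    """Fraction of cases whose output satisfies its checker."""
    if not cases:
        raise ValueError("need at least one case")
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    return passed / len(cases)


def compare(models: Dict[str, Model], cases: List[Case]) -> Dict[str, float]:
    """Score every model on the same cases so the numbers are comparable."""
    return {name: pass_rate(fn, cases) for name, fn in models.items()}


# Stand-in "models" so the sketch runs without any network access.
def stub_a(prompt: str) -> str:
    return prompt.upper()


def stub_b(prompt: str) -> str:
    return prompt[::-1]


cases: List[Case] = [
    ("abc", lambda out: out == "ABC"),
    ("level", lambda out: out == "level"),
    ("x", lambda out: out == "X"),
]

scores = compare({"stub_a": stub_a, "stub_b": stub_b}, cases)
print(scores)
```

Keeping the checks as plain functions means the same cases can be re-run unchanged whenever a new model or a new version appears.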

Final note

The AI landscape is changing fast. Kimi K2 Thinking is the latest example of that velocity: it raises hard, useful questions about cost, openness, and what “frontier” AI really means. Whether it ultimately reshapes the market or ends up as an overhyped milestone depends not on launch tweets, but on repeated, independent testing and real-world use. For now, the industry has a new model to probe — and a new argument to settle about whether the next big leap will be proprietary or open.


