If you’ve logged into Perplexity lately and you’re on a paid plan, you might have noticed a new name quietly appearing in your model picker: Kimi K2.5. On paper, it’s “a new state‑of‑the‑art open source reasoning model from Moonshot AI,” now wired directly into Perplexity’s Pro and Max tiers and served from Perplexity’s own inference stack in the US. In practice, it marks a pretty big shift: one of the most capable open‑source “thinking” models on the market is now sitting alongside the usual proprietary heavyweights, ready to handle your day‑to‑day research, coding, and multi‑step reasoning.
Kimi K2.5 comes from Moonshot AI, the China‑based startup behind the Kimi assistant, backed by big‑name investors like Alibaba and HongShan and founded by former Google and Meta researcher Yang Zhilin. Over the past year, Moonshot has been steadily pushing an interesting thesis: that open models can compete not just on raw capabilities but on long‑horizon “agency” — the ability to reason step‑by‑step, call tools repeatedly, and keep a coherent plan over hundreds of actions. K2.5 is the latest, and most ambitious, expression of that strategy. It’s a 1‑trillion‑parameter Mixture‑of‑Experts model with about 32 billion parameters active per token, trained on roughly 15 trillion mixed visual and text tokens on top of the earlier K2 base. That scale isn’t just for bragging rights; it underpins Kimi’s ability to juggle long documents, codebases, and image‑ or video‑heavy workflows without losing the thread.
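To make that "about 32 billion active out of 1 trillion" figure concrete, here's a toy sketch of how Mixture-of-Experts routing works in general; this is an illustration of the technique, not Moonshot's actual architecture or code. A small router scores the experts for each token, and only the top-k winners actually run, so most of the parameter pool sits idle on any single forward pass.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer (illustration only)."""
    def __init__(self, dim=512, n_experts=64, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # scores experts per token
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)          # normalize the k winners
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                 # naive loop: clarity over speed
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[e](x[t])  # only k experts fire per token
        return out

moe = ToyMoE()
y = moe(torch.randn(4, 512))  # 4 tokens, each touching just 2 of 64 experts
```

The payoff of this design is that total capacity scales with the size of the expert pool while per-token compute scales only with k, which is how a 1-trillion-parameter model can stay affordable to serve.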
Under the hood, Kimi K2.5 is natively multimodal. It doesn’t bolt vision on as an afterthought; it integrates a dedicated vision encoder called MoonViT with around 400 million parameters, designed to feed visual context directly into its language reasoning stack. That means it can read screenshots, UI mockups, charts, PDFs with diagrams, and even more complex visual inputs, then combine that with text and code in a single reasoning chain. The model exposes a 256K token context window in its reference implementations — far beyond the typical 32K caps users see in many consumer products — allowing it to hold books, multi‑file repositories, or long research trails in working memory. In the open tooling ecosystem, people are already running quantized versions locally that still preserve strong performance on coding and MMLU‑style academic benchmarks, despite shrinking the footprint dramatically.
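To see what "natively multimodal" looks like from a developer's seat, here's a minimal sketch of sending a screenshot and a question in a single request, using an OpenAI-style chat format (which Moonshot's own platform has historically mirrored). The base URL, model ID, and filename below are illustrative assumptions, not documented values, so treat them as placeholders.

```python
import base64
from openai import OpenAI

# Base URL and model ID are assumptions; check Moonshot's docs for real values.
client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_KEY")

with open("dashboard.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "What anomalies stand out in this dashboard, and why?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```

The point is that the image and the text land in the same reasoning chain: the model can quote a number off the chart while explaining it, rather than handing off to a separate captioning step.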
What makes K2.5 especially interesting isn’t just that it “sees” and “codes,” but how it thinks. Moonshot positions it as a “thinking model” or “agentic model”: it reasons step‑by‑step, writes internal chains of thought, and can invoke tools in a stable way across 200–300 sequential calls in long‑horizon workflows. On synthetic and academic tests, that design shows up in the numbers: Kimi K2‑series models have set or matched state‑of‑the‑art results on benchmarks like Humanity’s Last Exam (HLE), BrowseComp, and VideoMMMU, often used to gauge deep reasoning, browsing‑based problem solving, and video understanding. In public write‑ups and early coverage, K2.5 is framed as outperforming leading proprietary systems from OpenAI and Anthropic on some of these agentic and video‑reasoning tasks, which is precisely where open‑source models have traditionally lagged. For developers and power users, that translates into a model that doesn’t just answer one question well, but can stay reliable over an entire multi‑step project.
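Mechanically, that long-horizon agentic behavior is a loop: the model requests a tool, the harness executes it, the result goes back into the conversation, and this repeats until the model produces a final answer. Below is a bare-bones sketch of such a loop with OpenAI-style function calling; the endpoint, model ID, and the `search_web` tool are placeholders for illustration, not anything Moonshot or Perplexity actually ships.

```python
import json
from openai import OpenAI

# Endpoint and model ID are illustrative assumptions, not confirmed values.
client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return top result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_web(query: str) -> str:
    # Placeholder: wire this up to a real search backend.
    return f"(stub results for: {query})"

messages = [{"role": "user", "content": "Research topic X and draft a summary."}]
for _ in range(300):  # step budget mirroring the 200-300 call horizon
    resp = client.chat.completions.create(
        model="kimi-k2.5", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:          # no tool request means a final answer
        print(msg.content)
        break
    for call in msg.tool_calls:     # execute each requested tool call
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": search_web(**args),
        })
```

Read that way, the 200-300 call figure is really a claim that the model keeps choosing sensible tool calls deep into a loop like this one, instead of drifting off-plan or stalling.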
The open‑source angle matters here. Kimi K2.5’s weights are released under an open license on platforms like Hugging Face and NVIDIA’s Build portal, which means researchers and companies can inspect, host, and fine‑tune the model on their own infrastructure. That transparency helps chip away at the “black box” problem that still plagues proprietary AI: in Moonshot’s ecosystem, even “thinking logs” — the internal reasoning traces — can be surfaced or analyzed, giving teams a way to audit how the model reached a conclusion. For enterprises with strict data‑governance requirements, the ability to run the same architecture locally or in a private cloud, while still having a managed SaaS experience through tools like Perplexity, is a compelling hybrid. And because the model is open, optimizations like INT4 quantization and ultra‑low‑bit GGUF variants arrive quickly from the community, making serious experimentation accessible to smaller teams as well.
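As a concrete taste of that community tooling, here is roughly what loading a quantized GGUF build looks like through the llama-cpp-python bindings. The filename is hypothetical, and it's worth being honest about scale: even at 4-bit, a 1-trillion-parameter MoE occupies hundreds of gigabytes, so "running it locally" realistically means a well-provisioned server or workstation rather than a laptop.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical filename; community quantizations appear on Hugging Face.
llm = Llama(
    model_path="kimi-k2.5-Q4_K_M.gguf",
    n_ctx=32768,      # trimmed from the 256K reference context to fit in RAM
    n_gpu_layers=-1,  # offload as many layers as the GPU will hold
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}]
)
print(out["choices"][0]["message"]["content"])
```

That hardware caveat is exactly why the hybrid story matters: small teams can experiment with the open weights where it's feasible, and lean on a managed host like Perplexity for everything else.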
Perplexity’s decision to integrate Kimi K2.5 and host it on its own inference stack is a clear statement about where the product is heading. Rather than funneling every query through a single vendor’s API, Perplexity is building what some users have dubbed a “model buffet,” where Pro and Max subscribers can choose between top‑tier proprietary models and frontier‑level open ones depending on the task. By running K2.5 on in‑house infrastructure in the US, Perplexity gets tighter control over latency, reliability, and data handling, which is increasingly important as AI tools become embedded in business workflows instead of being used just for ad‑hoc Q&A. It also creates room for Perplexity‑specific tuning — from safety filters to search orchestration — on top of the base model, without waiting on upstream changes from a third‑party provider. In other words, K2.5 isn’t just “yet another model option”; it’s raw open‑source capability injected into a tightly engineered retrieval and UX layer.
For paid users, the practical question is: when does it make sense to pick Kimi K2.5 over the usual suspects? If you’re doing heavy research with lots of documents, cross‑referencing sources, or building out long prompts with code and specs, K2.5’s long‑context, agentic design is a strong fit. It’s also compelling for workflows that blend visual assets and text — think auditing complex dashboards, reading slides, or turning mockups into code — especially as more frontends expose its full multimodal capabilities. Early community chatter suggests that while Perplexity may not expose the full 256K context in the UI, people are already using K2.5 for large research sessions, code generation, and comparison tasks alongside familiar models like Claude Sonnet 4.5, and treating it as another high‑end option to A/B test on tricky prompts. The bigger story, though, is that a world‑class open model is now part of the default toolkit for everyday users, not just something you run in a lab or a bespoke stack.
Taken together, Moonshot’s release of Kimi K2.5 and Perplexity’s rapid integration of it mark a turning point in how open‑source AI shows up in consumer‑facing products. The old dividing line — closed models for “serious” work, open ones for hobby projects — is eroding as open models start matching or beating closed systems in key reasoning and agentic benchmarks. By slotting K2.5 next to premium proprietary models and serving it from its own inference layer, Perplexity is effectively saying that users shouldn’t have to care whether a model is open or closed; they should just pick whatever solves the problem best. For power users on Pro and Max, that means more choice, more competition on quality and speed, and more room to align the tool to your own preferences — whether you’re deep‑diving 40 academic papers in one go or just trying to turn a messy slide deck into something coherent.