Perplexity Computer now decides what runs local vs cloud

For years, “AI on your device” mostly meant a marketing slide. Now Perplexity is about to turn it into actual infrastructure – and not just for autocomplete or photo filters, but for full-blown agentic workflows that live partly on your machine and partly in the cloud, without you having to babysit where anything runs.

That is what hybrid agentic inference coming to Perplexity Computer really means.

If you’ve been following the recent wave of “agentic AI” buzzwords, you know a lot of it sounds abstract until you see it tied to real work. Agentic AI, in plain terms, is software that can understand goals, break them into steps, call tools, and adapt as it goes, instead of waiting for a single prompt and returning a one-shot answer. Perplexity Computer already leans into that idea: you describe an outcome – “clean up this 80-page deck and prep a summary for my team,” or “monitor this data source and update a weekly report” – and it spins up sub-agents that research, draft, cross-check, and iterate across your apps and services.

Today we're announcing that hybrid agentic inference is coming to Perplexity Computer.

Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency.

Coming soon. pic.twitter.com/6t3PrmI1FX
— Perplexity (@perplexity_ai) June 2, 2026

What was missing until now was a smarter way to place all that compute. Today, most agentic systems live almost entirely in the cloud, often blind to what’s actually on your laptop, or they force you to manually choose between “local mode” and “cloud mode” before you even start. Perplexity’s new hybrid local-server inference orchestrator is the bridge between those worlds: it reasons, step by step, about which parts of a task should run on your device and which should be escalated to big frontier models in the data center.

In other words, instead of you deciding where the AI runs, the AI decides that for you.

Perplexity calls the product tier that lives in your browser and on your infrastructure Perplexity Computer – a general-purpose digital worker that can coordinate long-running workflows, hand off sub-tasks to specialized models, and operate real tools like browsers, file systems, and SaaS apps. The next step in that story was Personal Computer, a client that runs on your own machine – first on Mac, now rolling out on Windows – tying your local files and native apps into that orchestration fabric.

Hybrid agentic inference is the layer that makes Personal Computer feel less like “a client” and more like part of the same brain. Instead of treating your laptop as a dumb terminal to the cloud, it treats it as another compute node with its own strengths: low-latency access to your data, strong privacy by default, and a growing ability to run surprisingly capable language models locally. When you give Perplexity Computer a job, the system now has three dimensions to think about at once: which model to use, which tool to call, and where that model should actually run.

That last question sounds simple, but it sits at the heart of the current AI infrastructure shift. Cloud models are still where you go for the largest parameter counts, the longest context windows, and the most exotic modalities. On-device models are smaller but increasingly competent, especially for pattern-heavy work like classification, routing, summarizing familiar content, and basic reasoning. The interesting part is not picking one or the other, but letting them cooperate inside a single agent.

Perplexity’s own blog frames this move with a pretty bold line: “The data center moves to your machine.” Internally, that means the hybrid orchestrator doesn’t just juggle tasks between one big model and another. It makes a real-time judgment call on each step in a workflow: is this step sensitive, simple, or both? Then it either keeps the work ground-side, on your device, or sends it up to a frontier model in the cloud.
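To make that per-step judgment concrete, here is a minimal sketch in Python. Perplexity has not published how its orchestrator decides, so everything here is an assumption for illustration: the `Step` fields, the `route_step` function, and the rule itself (keep a step local if it is sensitive or simple, otherwise escalate).

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Step:
    """One unit of work in a workflow (hypothetical structure)."""
    name: str
    sensitive: bool  # touches data that should stay on the device
    simple: bool     # within reach of a small on-device model


def route_step(step: Step) -> Placement:
    """Keep sensitive or simple steps on the device; escalate the rest."""
    if step.sensitive or step.simple:
        return Placement.LOCAL
    return Placement.CLOUD


if __name__ == "__main__":
    workflow = [
        Step("read patient reports", sensitive=True, simple=False),
        Step("classify file types", sensitive=False, simple=True),
        Step("write market narrative", sensitive=False, simple=False),
    ]
    for step in workflow:
        print(f"{step.name}: {route_step(step).value}")
```

In this toy rule, a step that is sensitive but hard still stays local, which trades some answer quality for privacy. A real orchestrator would presumably weigh more signals than two booleans.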

Take a concrete example. Say you’re in finance or healthcare in the US, and you have a folder full of confidential spreadsheets or patient reports on your laptop. Historically, if you wanted AI to help, you either had to upload everything to a provider you hoped was compliant or you had to settle for a limited local model that couldn’t tap into the very best capabilities hosted in the cloud. With hybrid agentic inference, Perplexity can run a compact model locally that inspects and reasons about the files, then decides which parts of the job can safely be sent to the server and which should never leave your machine.

Maybe you ask for a summary of trends across thousands of rows of sensitive financial data. The local model can process that data in place, generate anonymized aggregates, and only send those aggregates to a frontier model for more nuanced narrative explanation. The same idea applies to health records, internal legal documents, or personal archives: sensitive bits stay anchored to your device unless there’s a compelling reason to move them. For regulated industries, that’s not just a nice-to-have design choice – it can be a compliance requirement.
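The aggregate-then-escalate pattern described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Perplexity's pipeline: the row shape (`account`, `amount`), the function names, and the payload format are all hypothetical.

```python
from statistics import mean


def summarize_locally(rows: list[dict]) -> dict:
    """Reduce raw rows to aggregates; account identifiers are dropped."""
    amounts = [row["amount"] for row in rows]
    return {
        "row_count": len(amounts),
        "total": sum(amounts),
        "mean": mean(amounts),
        "max": max(amounts),
    }


def build_cloud_payload(rows: list[dict], question: str) -> dict:
    """Payload for a cloud model: the question plus aggregates only."""
    return {"question": question, "aggregates": summarize_locally(rows)}


if __name__ == "__main__":
    # Hypothetical sample data; the raw rows never enter the payload.
    rows = [
        {"account": "ACME-001", "amount": 1200.0},
        {"account": "ACME-002", "amount": 800.0},
        {"account": "ACME-003", "amount": 1000.0},
    ]
    print(build_cloud_payload(rows, "Describe the spending trend."))
```

Aggregation on its own does not guarantee anonymity, since statistics over very small groups can still reveal individual records, so a production system would likely need stronger safeguards than this sketch shows.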

Crucially, all of this happens automatically. You’re not digging through settings menus toggling “local only” or “cloud only” for each prompt. The orchestration logic runs with every request, slicing tasks into pieces and routing them accordingly.

Zoom out and you can see why this matters now. On the hardware side, the last three years have radically changed what “local inference” looks like. What used to be a toy demo on a smartphone – a small, laggy language model – has given way to billion-parameter-class models running in real time on modern laptops, desktops, and high-end phones. Apple, Google, and others have been quietly laying the groundwork with neural engines, NPUs, and software stacks for on-device AI, from Apple’s on-device foundation language models to Gemini Nano on Android.

Google, for instance, now positions Android’s AI stack as explicitly hybrid: on-device Gemini Nano models for offline summarization and accessibility features, tied to cloud Gemini models when you need something heavier. Apple’s own research highlights improved reasoning and tool-use in its on-device and server models, again reflecting this idea that “local” is no longer just a second-class citizen. Perplexity’s move slots neatly into that broader trend, but with a twist: instead of just offering on-device features inside a single product, it is letting an agentic system orchestrate both local and cloud resources dynamically.

That agentic part is important. Companies from IBM to Red Hat have been talking up “agentic AI” as the way to scale automation: systems that can pursue goals through sequences of actions, call external tools, and adapt as new information arrives. But most of those discussions focus on model selection and tool selection in the cloud – which model handles which sub-query, which API is best for which job. Perplexity is adding compute placement to that same decision loop.

So instead of just asking, “Should this sub-task go to a code-optimized model or a research-optimized model?”, Perplexity Computer can now also ask, “Should this run on your CPU/GPU locally or on a remote accelerator in a data center?”
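One way to picture the three-part decision (model, tool, placement) is as a single record produced by one planning call. The sketch below is hypothetical: the `Decision` type, the `plan` function, and its rules are assumptions, and the labels simply reuse the article's own examples of model and tool categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    model: str      # which model handles the sub-task
    tool: str       # which tool it may call
    placement: str  # "local" or "cloud"


def plan(task_kind: str, touches_private_files: bool) -> Decision:
    """Pick model, tool, and placement together (illustrative rules only)."""
    model = "code-optimized" if task_kind == "code" else "research-optimized"
    tool = "file_system" if touches_private_files else "browser"
    placement = "local" if touches_private_files else "cloud"
    return Decision(model, tool, placement)


if __name__ == "__main__":
    print(plan("code", touches_private_files=True))
    print(plan("research", touches_private_files=False))
```

The point of returning all three fields at once is that placement becomes part of the same decision loop as model and tool choice, rather than a separate user setting.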

This shows up most clearly when you look at Perplexity Computer’s architecture. At the top is a core reasoning engine that handles goal decomposition, planning, and delegation. Underneath it sits a pool of specialized models – research-heavy models for deep web work, fast models for lightweight tasks, multimodal models for images and video – all orchestrated as sub-agents that can run for hours if needed. And beneath that, now, there is an expanded substrate: your own machine, plus Perplexity’s servers, treated as one distributed system.

Perplexity’s earlier announcements described Computer as model-agnostic, already routing work between models like Gemini, Grok, and ChatGPT depending on what each step required. Hybrid agentic inference extends that logic down into the physical layer. A sub-agent that needs high-precision reasoning over large, non-sensitive datasets might be scheduled on a cloud model with a huge context window. Another sub-agent that needs to continuously watch a folder on your Mac mini or Windows workstation can run locally, leveraging your hardware around the clock.

This is where the idea of “the data center moves to your machine” stops being a metaphor and becomes an ops story. If meaningful chunks of your AI workloads can run on endpoints, you can reduce pressure on centralized infrastructure and potentially shift costs and performance characteristics in interesting ways. It also hints at a future where your personal machines – whether that’s a home desktop, a work laptop, or even a dedicated mini-PC – act as persistent agent hosts, continuously running Perplexity workflows against your local environment while cloud resources come and go as needed.

From a user’s point of view, the experience will feel less like managing “an AI product” and more like delegating work to a coworker that just happens to live inside your computer. Personal Computer can already read and write across local files, operate native apps, and tie into SaaS tools like Gmail, Slack, GitHub, Notion, and Salesforce. Hybrid agentic inference gives that coworker common sense about what should stay in-house.

If you’re a US-based knowledge worker with a Windows tower under your desk and a mess of local PDFs, screenshots, and CSVs, you no longer have to wonder which of those documents you’re comfortable sending to the cloud every time you want AI help. The orchestrator can keep sensitive material on device by default and only escalate distilled or anonymized representations to the server when necessary. For a tech-savvy audience that has been skittish about handing raw internal data to external providers, that is a tangible shift.

It also matters for latency and reliability. Local inference cuts the round-trip time and gives you resilience when your connection is flaky, while server inference still covers the extreme cases when you need the biggest models. Google already uses this pattern with features like on-device summarization in Pixel’s Recorder app, backed by cloud services for heavier tasks. Perplexity is effectively applying the same logic to a much broader agentic workload that spans your whole software stack.
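The resilience point can be illustrated with a simple fallback: a step that would normally go to the cloud degrades to the local model when the network fails. This is a sketch under assumptions, not a description of Perplexity's behavior. `run_with_fallback` and the two stand-in model functions are hypothetical, and a real client would call actual inference backends.

```python
from typing import Callable


def run_with_fallback(
    prompt: str,
    cloud: Callable[[str], str],
    local: Callable[[str], str],
) -> tuple[str, str]:
    """Try the cloud model first; use the local one on network errors."""
    try:
        return "cloud", cloud(prompt)
    except (ConnectionError, TimeoutError):
        return "local", local(prompt)


def offline_cloud(prompt: str) -> str:
    """Stand-in for a cloud call made while the connection is down."""
    raise ConnectionError("network unreachable")


def tiny_local(prompt: str) -> str:
    """Stand-in for a small on-device model."""
    return f"[local draft] {prompt}"


if __name__ == "__main__":
    print(run_with_fallback("Summarize my notes", offline_cloud, tiny_local))
```

Returning the placement alongside the result lets the caller tell the user, or a log, which side actually produced the answer.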

On stage at Computex 2026, Perplexity demonstrated this hybrid system running live alongside Intel’s latest Core Ultra Series 3 silicon, with CEO Aravind Srinivas using the Personal Computer agent to process confidential deal materials without sending everything to the cloud. The demo was less about benchmark numbers and more about the narrative: here’s an AI agent that can make nuanced decisions about where your data goes, in real time, based on content and context.

That kind of showcase is aimed squarely at the nervous middle – everyone who wants frontier-level AI help but lives in industries where compliance teams, regulators, or even just common sense have slowed adoption. Perplexity’s argument is that hybrid agentic inference lowers that barrier by design: the system is built around the idea that some work should never leave the machine, and that the right place for compute is a decision the agent can own.

At the same time, this is also a competitive play. Cloud providers and hyperscalers are talking about distributed inference, multi-model routing, and hybrid AI infrastructures that span on-prem data centers and public clouds. Perplexity is pushing that same logic down into the personal computer layer, betting that the future of AI assistants looks less like a single chat box in a browser and more like a fabric of agents living both in the cloud and on the endpoint.

All of this raises an obvious question: how far can on-device models really go? The honest answer is that frontier models are not going away. If you want the cutting edge of reasoning, creativity, or multimodal understanding, you will still lean on giant models running in specialized data centers. But the range of tasks that can be handled locally is growing quickly. Recent analyses of on-device LLMs point out that what looked impossible on consumer hardware a few years ago – real-time generation and reasoning from billion-parameter models – is now not only possible but increasingly practical on flagship devices.

Perplexity’s hybrid agentic inference basically rides that curve. As local models get better, the balance shifts: more of your workflows can stay on the device, with the orchestrator quietly updating its routing decisions. In a few years, the line between “local” and “cloud” might feel as invisible as the line between RAM and disk storage does today – something the system manages on your behalf, while you just see the outcome.

In the meantime, this is a clear signal that AI infrastructure is moving closer to the edge, not just in industrial IoT or smart cities, but in everyday personal and professional computing. For US-based users who care about both performance and privacy, the arrival of hybrid agentic inference in Perplexity Computer is a sign that you might not have to choose between them for much longer.

GadgetBond
