Perplexity is stepping into full-on “agent stack” territory with the launch of its new Agent API, pitching it as a managed runtime for building serious, production-grade agentic workflows without forcing teams to duct-tape half a dozen infrastructure pieces together. Instead of separately wiring up a model router, a search layer, an embeddings provider, a sandbox for code execution, and a monitoring setup, Perplexity wants you to hit one endpoint, bring your tools, and let its runtime orchestrate the whole loop.
At the heart of this launch is a specific view of what “agentic” actually means in practice. Traditional software runs like a CPU: fetch an instruction, decode, execute, store the result, repeat—no introspection, no decisions. The Agent API flips that model by making a frontier language model the “processor” that receives a high-level objective and decides how to get there—planning, choosing tools, calling them in sequence, looking at results, and iterating until the task is done. Perplexity frames this as an agent loop where the context window behaves like registers and the model’s reasoning and orchestration logic act as the scheduler.
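The loop Perplexity describes can be sketched in a few lines. This is a minimal illustration of the pattern, not Perplexity's implementation: a stubbed "model" stands in for the frontier LLM, and a dictionary plays the role of the context window.

```python
# Minimal sketch of an agent loop: a stubbed "model" plans, picks tools,
# inspects results, and iterates. All names here are illustrative.

def stub_model(objective, context):
    """Pretend planner: decide the next action from what's in context."""
    if "search_results" not in context:
        return {"action": "call_tool", "tool": "web_search",
                "args": {"query": objective}}
    return {"action": "finish",
            "answer": f"Summary based on {len(context['search_results'])} results"}

def stub_web_search(query):
    # Stand-in for a real search tool.
    return [f"result for '{query}' #{i}" for i in range(3)]

TOOLS = {"web_search": stub_web_search}

def agent_loop(objective, max_steps=5):
    context = {}  # plays the role of the context window / "registers"
    for _ in range(max_steps):
        decision = stub_model(objective, context)
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["tool"]]
        context["search_results"] = tool(**decision["args"])
    return "step budget exhausted"

print(agent_loop("recent GPU supply news"))
```

The point of the managed runtime is that the scheduler role played here by `stub_model` is handled by a frontier model, with the loop itself run server-side.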
Perplexity’s own example is very B2B: imagine prepping for a sales call. You send one request to the Agent API with three tools wired in—one for your internal CRM, plus web_search and fetch_url. The agent first queries your CRM to pull previous touchpoints, then hits web_search for recent news and competitive intel, and finally uses fetch_url to deeply read only the pages it deems worth a closer look. The idea is that, in a handful of steps, the model assembles internal history, fresh external context, and full-page detail into a single grounded output, without you writing a bespoke orchestration layer around each step.
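A request for that sales-prep scenario might be shaped roughly like the payload below. The endpoint, field names, and tool schema here are assumptions modeled on common chat/agent API conventions, not Perplexity's documented format; `crm_lookup` is a hypothetical custom function.

```python
# Hedged sketch of a single Agent API request wiring in three tools:
# two built-ins (web_search, fetch_url) and one custom CRM function.
# Field names and the model id are illustrative assumptions.
payload = {
    "model": "sonar-pro",  # placeholder model name
    "input": "Prep me for tomorrow's call with Acme Corp.",
    "tools": [
        {"type": "web_search"},   # built-in: recent news, competitive intel
        {"type": "fetch_url"},    # built-in: deep-read pages the agent selects
        {                         # custom tool hitting an internal CRM
            "type": "function",
            "name": "crm_lookup",
            "description": "Fetch previous touchpoints for an account",
            "parameters": {
                "type": "object",
                "properties": {"account": {"type": "string"}},
                "required": ["account"],
            },
        },
    ],
}
```

At runtime you would POST this with your API key and, when the agent calls `crm_lookup`, execute it against your backend and return the result into the loop.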
That’s the big positioning difference versus the usual “we route between models” pitch. Perplexity is explicit that this isn’t just a model router; it’s a managed runtime for the entire agentic loop—retrieval, tool calls, reasoning, and multi-model fallback, plus any custom functions you hand it. Instead of juggling multiple vendors and services, you operate through a single account and API key, and you can still stay multi-provider under the hood. For teams building on top of OpenAI, Anthropic, or other frontier models, the Agent API exposes a model-agnostic interface with support for fallback chains: specify multiple models and the runtime automatically fails over if one is unavailable, aiming for near-100% uptime for production workloads.
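The failover behavior is easy to picture client-side, even though in the managed runtime it happens on Perplexity's servers. The sketch below illustrates the idea under stated assumptions; the model identifiers are placeholders, and the error handling is deliberately simplistic.

```python
# Illustrative fallback chain: try each model in order, failing over on
# provider errors. In the Agent API this logic is handled by the runtime;
# this client-side version just demonstrates the semantics.
FALLBACK_CHAIN = [
    "provider-a/frontier-model",   # placeholder ids, not real model names
    "provider-b/frontier-model",
    "perplexity/sonar",
]

def run_with_fallback(call_model, request):
    """Call each model in the chain until one succeeds."""
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return call_model(model, request)
        except RuntimeError as err:  # e.g. provider outage or rate limit
            last_error = err
    raise RuntimeError(f"all models in chain failed: {last_error}")
```

The appeal of doing this server-side is that a single request can survive a provider outage without the caller writing any retry logic at all.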
Tooling is where the agent story gets practical. Out of the box, the Agent API ships with two core tools—web_search and fetch_url—that mirror what Perplexity uses internally to power its consumer product. web_search isn’t just a generic search call; it supports domain allowlists and denylists (up to 20 domains), recency filters, date ranges, language filters, and configurable content budgets so you can control how much of each page the agent reads. fetch_url, meanwhile, focuses on pulling and extracting full page content from specific URLs chosen by the agent, which is key in multi-step research or compliance-heavy flows where you need the model to actually read a document in depth. Beyond these, developers can register custom tools—functions that talk to internal backends, databases, or external APIs—so the same agent that searches the web can also hit your billing system, analytics warehouse, or CRM with structured calls.
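Pulling the filters described above into one place, a `web_search` tool configuration might look like the dictionary below. The exact field spellings are assumptions; only the capabilities (domain allowlists up to 20 entries, recency and date filters, language filters, content budgets) come from the announcement.

```python
# Illustrative web_search configuration exercising the filters described
# in the article. Field names are assumptions; consult Perplexity's docs
# for the exact schema.
web_search_config = {
    "type": "web_search",
    "allowed_domains": ["sec.gov", "reuters.com"],  # allowlist, max 20 domains
    "recency": "week",                              # prefer recent results
    "date_range": {"from": "2025-01-01", "to": "2025-06-30"},
    "language": "en",
    "max_content_tokens": 2048,  # budget for how much of each page the agent reads
}

# A denylist would be the mirror image, e.g. blocking low-quality sources:
web_search_denylist = {
    "type": "web_search",
    "denied_domains": ["example-content-farm.com"],
}
```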
Where Perplexity leans hardest into differentiation is in how it packages its own best practices as “presets.” Building a good agent config from scratch is nontrivial: you have to pick the right model, set reasoning depth, choose tools, and balance token budgets and latency. Perplexity points out that its internal evaluation team already does this tuning at scale for its own products, benchmarking configs against real-world workloads and external benchmarks like Google DeepMind’s DeepSearchQA and its own DRACO benchmark for deep research agents. Presets essentially expose that tuning: each one is a fully transparent, pre-configured setup optimized for a specific use case—fast factual lookups, balanced research, or heavy-duty institutional research.
For developers, the presets are meant to be a sensible starting point rather than a locked box. Each preset publishes its recommended system prompt, tools, step count, and cost profile, and you can override parameters like model choice, max_steps, or tools in a single request. The docs show examples where you keep the “pro-search” preset but swap the underlying model or increase step count for deeper reasoning, or restrict web_search to specific domains when you care about data provenance. Under the hood, some presets lean on Perplexity’s own models and some on third-party providers; for instance, deep-research and advanced-deep-research presets are tuned for complex multi-step analysis with web_search and fetch_url wired in by default.
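The "keep the preset, override a few knobs" pattern might translate into a request body like this sketch. The preset name `pro-search` comes from the article; the override field names and the pinned domains are illustrative assumptions.

```python
# Hedged sketch of overriding a preset in a single request: keep the
# "pro-search" preset but swap the model, raise the step count for deeper
# reasoning, and restrict search to specific domains for provenance.
# Override field names are assumptions, not Perplexity's documented schema.
request = {
    "preset": "pro-search",
    "overrides": {
        "model": "some-frontier-model",  # placeholder model id
        "max_steps": 12,                 # allow more tool-call iterations
        "tools": [
            {"type": "web_search",
             "allowed_domains": ["nature.com", "arxiv.org"]},
        ],
    },
}
```

Because the preset publishes its full configuration, the override is a diff against a known baseline rather than a shot in the dark.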
Deep Research 2.0 is a big part of the story because it powers the advanced-deep-research preset in the Agent API, giving developers access to the same multi-step reasoning engine that fuels Perplexity’s consumer Deep Research feature. Perplexity says this engine can perform dozens of searches per query, read hundreds of source documents, and iteratively refine its answer, which is exactly the kind of pattern that benefits from an agentic runtime rather than simple single-shot calls. On external benchmarks, Perplexity has been loudly claiming state-of-the-art performance: Deep Research reportedly tops Google DeepMind’s DeepSearchQA leaderboard with around 79.5% accuracy, while also running as one of the fastest tools in its class. The company also co-authored DRACO, a cross-domain benchmark that measures deep research systems across 100 tasks spanning domains like law, medicine, finance, academic research, and more, focusing not just on accuracy but on completeness and objectivity.
That benchmarking push isn’t just about bragging rights; it’s part of Perplexity’s pitch that if you trust its agents with high-stakes research—think legal, financial, or medical workflows—you want evidence they can consistently reason over large, messy information spaces. By wiring Deep Research into the Agent API as a preset, Perplexity is effectively turning its consumer-facing superpower into an enterprise building block. Developers can now call the same multi-step engine via API, combine it with their own tools, and let Perplexity’s runtime handle the orchestration and search integration behind the scenes.
Zooming out, the Agent API slots into a broader platform move. Perplexity now surfaces an Agent API for agentic workflows, a Search API for raw, ranked web results with fine-grained filtering, and a Sandbox API for isolated code execution—each addressing a different layer of AI application development. The Agent API sits on top of these primitives and abstracts away complexity into one managed loop: objectives in, tools and models orchestrated automatically, and grounded answers out. Documentation, quickstart snippets, and preset catalogs are already live on docs.perplexity.ai, signaling that Perplexity wants this to be a first-class entry point for teams building agentic products—not just a sidecar to its consumer app.
For developers and companies, the appeal is pretty clear: instead of piecing together a DIY agent stack with separate vendors for search, models, routing, observability, and sandboxed execution, you plug into one multi-provider runtime that already knows how to run the agent loop well and has real-world performance data to back it up. The trade-off, as always, is how much control and custom infrastructure you’re willing to give up in exchange for speed, reliability, and an opinionated “just works” layer on top. But with the Agent API, Perplexity is clearly betting that a lot of teams would rather ship agentic workflows quickly on a managed foundation than reinvent the orchestration wheel themselves.
