Anthropic is quietly changing how we think about AI agents, and it’s doing it with a deceptively simple idea: let a cheaper model drive, and bring in the genius only when you really need it.
Instead of running Claude Opus end-to-end for every task — which gets expensive fast — Anthropic is now pushing what it calls the “advisor strategy”: pair Opus as a behind-the-scenes advisor with Sonnet or Haiku as the executor, and you get near Opus-level intelligence at Sonnet-like prices.
Here’s how it actually works in practice. Your agent runs on Claude Sonnet or Haiku, which does everything the user sees: calling tools, browsing, reading results, iterating, and writing the final answer. But when that smaller model hits a genuinely hard decision — think tricky reasoning, complex planning, or ambiguous context — it silently “escalates” to Opus via a new advisor tool built into the Claude Platform. Opus doesn’t talk to your user, doesn’t call tools, and doesn’t try to solve the entire task; it just reads the shared context and sends back a plan, correction, or stop signal so the executor can continue.
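The division of labor above can be sketched in a few lines of Python. To be clear, this loop runs inside Anthropic's infrastructure, not your code, and every name below is an illustrative stand-in; the sketch only shows the shape of the pattern: the executor does all the work and escalates selectively.

```python
# Conceptual sketch of the advisor pattern: a cheap "executor" handles every
# step itself and consults an expensive "advisor" only on hard decisions.
# All function names here are hypothetical, not Anthropic's implementation.

def executor_step(step, advice=None):
    """Cheap model does the visible work; advice (if any) shapes the step."""
    return f"executed:{step}" + (f" (guided by: {advice})" if advice else "")

def advisor_plan(step, context):
    """Expensive model reads the shared context and returns only a short plan."""
    return f"plan for {step}"

def run_agent(steps, is_hard):
    context, transcript, advisor_calls = [], [], 0
    for step in steps:
        advice = None
        if is_hard(step):            # the executor decides when to escalate
            advice = advisor_plan(step, context)
            advisor_calls += 1
        result = executor_step(step, advice)
        context.append(result)       # advisor and executor share this context
        transcript.append(result)
    return transcript, advisor_calls

out, calls = run_agent(
    ["read file", "design refactor", "apply edit"],
    is_hard=lambda s: "design" in s,   # only one genuinely hard decision
)
```

The key property is visible in the counter: two of the three steps never touch the advisor at all, which is where the cost savings come from.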
What Anthropic is doing here is flipping the usual pattern on its head. Traditionally, teams set up a big orchestrator model that decomposes a task and hands chunks to smaller worker models. Anthropic’s advisor strategy does the opposite: the smaller, cheaper model is in charge, and it only pulls in frontier-level reasoning from Opus when absolutely necessary. That means your default cost profile is Sonnet or Haiku, and only a sliver of tokens are billed at Opus rates — just enough to unlock its decision-making and planning.
Under the hood, this is all wired through a new server-side advisor tool that Sonnet and Haiku “know” how to use. In a typical API call, you declare advisor_20260301 as a tool, specify Opus as the advisor model, and let the executor decide when to invoke it. The entire exchange — executor calling the advisor, Opus reading curated context, sending back a plan, and Sonnet continuing — happens within a single /v1/messages request, so you’re not juggling extra round-trips or complex context management yourself.
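Based on those details, a request body might look roughly like the sketch below. Only the `advisor_20260301` tool type, the `max_uses` cap, and the single-request flow come from Anthropic's description; the model identifiers and the field used to point the advisor at Opus are my assumptions.

```python
import json

# Hypothetical Messages API request body using the advisor tool.
# "model" is the executor; the advisor entry's "model" field is an assumption.
request_body = {
    "model": "claude-sonnet-4-5",          # executor: does all visible work
    "max_tokens": 4096,
    "tools": [
        {
            "type": "advisor_20260301",    # server-side advisor tool
            "name": "advisor",
            "model": "claude-opus-4-6",    # assumed field: which model advises
            "max_uses": 3,                 # cap advisor calls per request
        }
    ],
    "messages": [
        {"role": "user",
         "content": "Refactor the auth module and fix the failing tests."}
    ],
}

print(json.dumps(request_body, indent=2))
```

Everything after this single POST to `/v1/messages` — the executor's tool calls, the escalation, Opus's plan — is handled server-side.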
Anthropic is backing this up with numbers. In its internal evaluations, Sonnet with an Opus advisor shows a 2.7 percentage point gain on SWE-bench Multilingual compared to Sonnet alone, while actually reducing cost per agentic task by about 11.9%. On BrowseComp and Terminal-Bench 2.0, the same pattern holds: higher scores with lower cost per task than running Sonnet solo. In other words, you’re not paying extra just for a bit more quality — in some workloads, you’re actually saving money and gaining accuracy at the same time.
The story gets even more interesting with Haiku. On BrowseComp, Haiku with an Opus advisor jumps to 41.2%, more than double Haiku’s solo score of 19.7%. It still trails Sonnet solo in raw performance, but Anthropic says it costs around 85% less per task, which puts it squarely in the “high-volume, good-enough-but-smart” sweet spot. For large workloads — think customer support triage, large-scale content operations, or bulk research agents — that trade-off is extremely attractive: frontier-flavored intelligence at a fraction of the price.
Crucially, Anthropic has wired in cost controls from the start. You can set max_uses to cap how many times the advisor can be called per request, and advisor tokens are reported separately in the usage block, so teams can watch exactly how much they’re spending on Opus versus the executor. Because Opus typically only produces a short plan of ~400–700 tokens, and the heavier output is generated by Sonnet or Haiku at their lower rates, the total bill lands well below what you’d pay for Opus end-to-end.
For developers already using tool-augmented agents, Anthropic is trying hard not to break anything. The advisor tool is “just another tool” in the Messages API: you can combine it with web search, code execution, and other tools in the same loop. The executor can browse the web, run code, call custom tools, and consult Opus — all inside a single, coherent agent run. That makes the advisor strategy feel less like some niche feature and more like a new default pattern for building serious agents on top of Claude.
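Mixing the advisor with other tools would then just mean a longer `tools` array. In the sketch below, the web search entry follows Anthropic's existing server-tool naming, the custom database tool is an entirely hypothetical example, and the advisor entry's `model` field is an assumption.

```python
# One tools array combining a server tool, a custom client-side tool,
# and the advisor, all usable in the same agent loop.
tools = [
    # Anthropic's existing server-side web search tool.
    {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
    # A custom client-side tool (hypothetical), defined by its JSON schema.
    {
        "name": "query_orders_db",
        "description": "Look up an order by ID (illustrative example tool).",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    # The advisor, declared like any other tool ("model" field is assumed).
    {"type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6"},
]
```

From the executor's perspective, consulting Opus is just one more tool call in the loop, which is what makes the pattern composable with everything else.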
Anthropic is also leaning on early customer feedback to sell the idea. Genspark reports “clear improvements in agent turns, tool calls, and overall score — better than a planning tool we built ourselves.” Eve Legal says Haiku 4.5 with an Opus 4.6 advisor can match frontier-model quality at around 5× lower cost on structured document extraction tasks. And at Bolt, the team notes that the advisor setup “makes better architectural decisions on complex tasks while adding no overhead on simple ones” — the plans and trajectories, they say, are “night and day different.” Taken together, these quotes paint a picture of teams that experimented with their own planning and orchestration layers, then quietly retired them once the advisor tool started outperforming their custom logic.
From a developer’s point of view, getting started is fairly lightweight. The feature is available in beta on the Claude Platform, and Anthropic outlines a three-step flow: add the beta header anthropic-beta: advisor-tool-2026-03-01, declare advisor_20260301 in your Messages API request, and tune your system prompt to your specific use case (coding agents, for example). Anthropic even recommends a simple evaluation recipe: run your existing eval suite against Sonnet solo, Sonnet + Opus advisor, and Opus solo to see the quality–cost trade-offs in your own environment.
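That three-way comparison can be wired up as a tiny harness. `run_eval` below is a placeholder for your own evaluation code (it would normally hit `/v1/messages` per task and score the outputs); the config names and model strings are assumptions meant only to show the structure of the experiment.

```python
# Sketch of the A/B/C comparison Anthropic suggests: executor solo,
# executor + advisor, and the big model solo. All names are illustrative.
CONFIGS = {
    "sonnet-solo": {"model": "claude-sonnet-4-5", "tools": []},
    "sonnet+advisor": {
        "model": "claude-sonnet-4-5",
        "tools": [{"type": "advisor_20260301", "name": "advisor"}],
    },
    "opus-solo": {"model": "claude-opus-4-6", "tools": []},
}

def run_eval(config, tasks):
    """Placeholder harness: call the API per task and score the results.

    Here we just echo the shape of the result a real harness would return.
    """
    return {"score": None, "cost_usd": None, "n_tasks": len(tasks)}

results = {name: run_eval(cfg, tasks=["task-1", "task-2"])
           for name, cfg in CONFIGS.items()}
```

Running all three configurations over the same task set is what surfaces the trade-off the benchmarks above describe, but on your own workload rather than Anthropic's.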
Strategically, this move says a lot about where Anthropic thinks the market is going. Instead of forcing teams to choose between “cheap but dumb” and “smart but expensive,” the advisor strategy creates a middle lane: “smart when needed, cheap by default.” For many organizations that are watching their token bills but still want frontier-level reasoning in critical paths, that’s exactly the sort of lever they’ve been asking for.
It also marks a subtle shift in how we talk about AI capabilities. The question is no longer just “Which model is the smartest?” but “How do you route the right intelligence to the right part of the task at the right price?” Anthropic’s answer is to make that routing automatic and model-native: Sonnet and Haiku know when they’re stuck and when it’s time to ask Opus for help, instead of relying on brittle, hand-coded orchestration trees.
For teams building production agents, this could become a new default architecture: pick Sonnet when you want strong general performance, bolt Opus on as an advisor to cover the hardest reasoning, and drop down to Haiku + Opus when scale and unit cost are your main constraints. You get Opus-level guidance in the moments that matter, without paying Opus-level prices for every token.