Anthropic has just rolled out Claude Opus 4 and Claude Sonnet 4, the latest additions to its hybrid-reasoning lineup—and they’re tailor-made for developers and power users who demand more from their AI. These new models promise not only to write cleaner code but also to tackle intricate problems that require sustained, multi-step reasoning.
Claude Opus 4 is Anthropic’s beefiest model yet, designed to stay “in the zone” for hours on end. In internal trials, it ran autonomously for a continuous seven hours, handling thousands of steps in complex coding and research tasks without losing focus—an endurance feat that dramatically expands what AI agents can tackle on their own. On standard coding benchmarks, Opus 4 not only claimed the top spot on SWE-bench with a 72.5 percent score but also led on Terminal-bench at 43.2 percent, outpacing Google’s Gemini 2.5 Pro, OpenAI’s o3 reasoning model, and even GPT-4.1 on tool-enabled tasks.
Meanwhile, Claude Sonnet 4 steps in as a leaner, more cost-effective option for general-purpose use. Priced at a fraction of Opus on a per-token basis, Sonnet 4 replaces the earlier 3.7 release, boasting sharper reasoning, higher coding accuracy, and more precise outputs overall. Anthropic reports that both new Claudes are about 65 percent less likely to take “shortcut” workarounds or exploit loopholes compared to Sonnet 3.7, and they excel at holding on to key details over long, file-based workflows—especially when given local file access.

One of the flashiest features baked into both models is the new “thinking summaries.” Rather than dumping their entire chain of thought, the models now distill their reasoning into concise, user-friendly bullet points—making it easier for humans to follow along or audit decisions. For deeper dives, Anthropic is beta-testing an “extended thinking” toggle, letting you switch between quick, surface-level answers and a more deliberate, tool-assisted reasoning mode that can further boost accuracy on thorny problems.
Not content to simply unleash a super-intelligent coding engine, Anthropic has also ratcheted up its safety guardrails. Opus 4 is being deployed under the company’s AI Safety Level 3 framework—its strictest set of controls—after internal tests showed the model could, under certain prompts, guide users through potentially dangerous activities, including bioweapon design. Enhanced classifiers, anti-jailbreak layers, and a bounty program for spotting vulnerabilities round out the protective measures that the company hopes will set a new standard for responsible AI deployment.
For those itching to experiment, both Claude 4 models are live on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI platform. Paid subscribers get full access—including the extended thinking beta—while free-tier users can already kick the tires on Sonnet 4. Whether you’re orchestrating multi-agent marketing campaigns or refactoring legacy codebases, Anthropic wants Claude 4 to be your virtual co-pilot.
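If you want a feel for what calling the new models looks like, here is a minimal sketch of assembling a Messages API request with the extended-thinking mode switched on. The model ID string, the `max_tokens` value, and the exact shape of the `thinking` field are assumptions based on Anthropic's published API conventions, so check the current docs before relying on them.

```python
# Minimal sketch: building a Messages API request for Claude Sonnet 4
# with extended thinking enabled. The model ID and the shape of the
# "thinking" field are assumptions; verify against Anthropic's docs.

def build_request(prompt: str, extended_thinking: bool = False) -> dict:
    """Assemble the keyword arguments for client.messages.create()."""
    payload = {
        "model": "claude-sonnet-4-20250514",  # assumed model ID
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    if extended_thinking:
        # Reserve a token budget for the model's visible reasoning;
        # the budget must stay below max_tokens.
        payload["thinking"] = {"type": "enabled", "budget_tokens": 1024}
    return payload

request = build_request("Refactor this function for readability.",
                        extended_thinking=True)
```

With the official `anthropic` Python SDK you would then pass this to `client.messages.create(**request)`; when extended thinking is enabled, the response's content blocks carry the summarized reasoning alongside the final answer.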
Anthropic’s hybrid-reasoning design isn’t just for enterprise workflows. WIRED reports that the same memory tricks Opus 4 uses to stay on track in engineering tasks also let it master long-term games of Pokémon Red—retaining team composition, item inventories, and battle strategies over dozens of turns. It’s a playful demo, but it underscores how memory and planning features translate directly into real-world problem-solving.
Claude Code, Anthropic’s agentic CLI for coding workflows, has also exited preview and is generally available, letting developers automate end-to-end pipelines with natural-language commands. Meanwhile, Anthropic promises a faster cadence of model updates to keep pace with rivals like OpenAI, Google, and Meta. The AI landscape is shifting fast—but with Claude 4, Anthropic is staking a claim at the frontier of coding and reasoning intelligence.