On April 23, 2026, OpenAI quietly did what it has been doing at a relentless pace over the past year — it dropped a new model. But GPT-5.5 isn’t the kind of release you skim past. This one feels different, and if the early feedback from engineers, researchers, and enterprise teams is anything to go by, it might be the model that starts changing how people actually think about getting work done on a computer.
OpenAI is calling it “a new class of intelligence for real work,” which sounds like the kind of tagline marketing teams dream up — except in this case, the benchmarks and the real-world testimonials seem to back it up. GPT-5.5 is described as the company’s smartest and most intuitive model yet, designed not just to answer questions but to actually do things: write and debug code, research online, analyze data, build documents and spreadsheets, operate software, and keep working across tools until a task is genuinely finished.
That last part is what sets GPT-5.5 apart in the conversation. Previous models, even the very capable ones, often needed babysitting. You’d hand off a multi-step task and then monitor every move, nudge it back on track, and double-check outputs. GPT-5.5, according to OpenAI, is built to handle ambiguity. You give it a messy, multi-part job, and it plans, uses tools, checks its own work, and pushes through the fuzzy bits without needing your hand on its shoulder the whole time.
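The plan, act, check loop described above can be sketched in miniature. Nothing below is OpenAI's actual agent harness; the scripted "model" and toy tools are stand-ins purely to show the control flow a model like this runs through:

```python
# Toy, self-contained sketch of a plan -> act -> check agent loop.
# The scripted "model" below is a deterministic stand-in, not a real LLM call.

def run_agent(task, model, tools, max_steps=10):
    history = [("task", task)]
    for _ in range(max_steps):
        action = model(history)                  # decide the next step
        if action["type"] == "finish":
            return action["result"]
        obs = tools[action["tool"]](action["args"])
        history.append((action["tool"], obs))    # record what happened
    raise RuntimeError("step budget exhausted before the task finished")

# Scripted stand-in model: search first, then summarize, then finish.
def scripted_model(history):
    steps = len(history)
    if steps == 1:
        return {"type": "tool", "tool": "search", "args": "GPT-5.5 pricing"}
    if steps == 2:
        return {"type": "tool", "tool": "summarize", "args": history[-1][1]}
    return {"type": "finish", "result": history[-1][1]}

tools = {
    "search": lambda q: f"results for {q!r}",
    "summarize": lambda text: f"summary of {text}",
}

print(run_agent("find GPT-5.5 pricing", scripted_model, tools))
```

The point of the loop structure is the article's claim in a nutshell: the model, not the user, decides when the task is actually finished.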
The gains are most pronounced in agentic coding, computer use, knowledge work, and early scientific research — areas where the model needs to reason across long stretches of context and take real action over time. Importantly, OpenAI says GPT-5.5 delivers this intelligence upgrade without sacrificing speed. That’s a genuinely tricky engineering challenge. Larger, more capable models are usually slower to serve, but GPT-5.5 reportedly matches GPT-5.4’s per-token latency in real-world serving, while operating at a meaningfully higher intelligence level. It also uses significantly fewer tokens to complete the same tasks in Codex, which translates to cost savings that matter at scale.
For developers wondering about pricing, OpenAI’s API page already lists GPT-5.5 at $5 per million input tokens and $30 per million output tokens. That puts it in the premium tier but still competitive given the capability jump, especially considering that on Artificial Analysis’s Coding Index, GPT-5.5 delivers state-of-the-art intelligence at roughly half the cost of competing frontier coding models.
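At those listed rates, estimating per-request cost is simple arithmetic. The token counts in the usage example below are invented for illustration:

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m=5.00, output_price_per_m=30.00):
    """Estimate the cost of one API call at the listed GPT-5.5 rates
    ($5 per million input tokens, $30 per million output tokens)."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# e.g. a 20k-token prompt with an 8k-token response
print(round(request_cost_usd(20_000, 8_000), 2))  # 0.34
```

Because output tokens cost 6x input tokens at these rates, the reported drop in tokens used per Codex task compounds directly into the savings mentioned above.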
The coding story is probably the most compelling right now. On Terminal-Bench 2.0 — a benchmark that tests complex command-line workflows requiring real planning, tool coordination, and iteration — GPT-5.5 hit 82.7% accuracy, beating GPT-5.4’s 75.1% and outpacing Claude Opus 4.7 and Gemini 3.1 Pro. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, it reached 58.6%, solving more tasks end-to-end in a single pass than any previous OpenAI model. And on Expert-SWE, an internal eval where tasks have a median estimated human completion time of 20 hours, GPT-5.5 again outperformed its predecessor.
These aren’t just numbers. The engineers who actually used the model before launch had a lot to say. Dan Shipper, founder and CEO of Every, called it “the first coding model I’ve used that has serious conceptual clarity.” He ran a real test: he gave GPT-5.5 access to a broken codebase that had stumped him for days, the kind of thing that eventually required one of his best engineers to do a substantial rewrite. GPT-5.4 couldn’t replicate what the engineer had done. GPT-5.5 could.
Pietro Schirano, CEO of MagicPath, had a similar story. He watched GPT-5.5 merge a branch with hundreds of frontend and refactor changes into a main branch that had also diverged significantly — and it resolved the whole thing in one shot, in about 20 minutes. Senior engineers testing the model reported that it was noticeably better than GPT-5.4 and even Claude Opus 4.7 at catching issues before they became problems, reasoning about ambiguous failures, and predicting what testing and code review would need — without being explicitly told to do any of that.
One engineer at NVIDIA put it more bluntly: “Losing access to GPT-5.5 feels like I’ve had a limb amputated.” That’s the kind of quote that gets passed around in developer Slack channels, and it’s probably going to stick.
Michael Truell, co-founder and CEO of Cursor — one of the most prominent AI coding tools on the market — described GPT-5.5 as “noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use,” adding that it “stays on task for significantly longer without stopping early,” which matters most for the complex, long-running work Cursor’s users delegate to the model.
Beyond coding, the knowledge work story is just as interesting. OpenAI says GPT-5.5 is better at understanding intent and moving naturally through the full loop of knowledge work — finding information, understanding what matters, using tools, checking outputs, and turning raw inputs into something actually useful. Internally, more than 85% of OpenAI employees are already using Codex with GPT-5.5 every week, across software engineering, finance, communications, marketing, data science, and product management.
The examples OpenAI shares from its own teams are striking. The finance team used Codex to review 24,771 K-1 tax forms — totaling 71,637 pages — in a workflow that helped them accelerate the task by two weeks compared to the prior year. The communications team built an automated Slack agent that scores and routes speaking requests based on risk level. A go-to-market employee automated weekly business report generation, saving between 5 and 10 hours every week. These are real internal productivity gains, not hypothetical scenarios, and they suggest the model is genuinely useful across functions that have nothing to do with software engineering.
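OpenAI hasn't published how the comms team's Slack agent works. As a rough illustration of what "scores and routes speaking requests based on risk level" could mean, here is a toy heuristic in which every field name, threshold, and channel name is invented:

```python
# Hypothetical sketch of a score-and-route triage step. Real systems would
# likely use an LLM judgment call rather than hand-written rules like these.

def score_request(req):
    """Assign a rough risk score to a speaking request (toy heuristic)."""
    score = 0
    if req.get("audience_size", 0) > 1000:
        score += 2            # large audiences raise the stakes
    if req.get("press_present"):
        score += 3            # media presence means quotes can travel
    if req.get("topic") in {"policy", "safety", "litigation"}:
        score += 3            # sensitive topics need senior review
    return score

def route(req):
    """Map a risk score to a (hypothetical) Slack destination."""
    s = score_request(req)
    if s >= 5:
        return "comms-leadership"   # high risk: escalate
    if s >= 2:
        return "comms-review"       # medium risk: needs a reviewer
    return "auto-approve"           # low risk: route straight through

print(route({"audience_size": 5000, "press_present": True, "topic": "safety"}))
```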
On benchmarks that measure this kind of work, GPT-5.5 scores 84.9% on GDPval — an eval that tests agents across 44 different occupations producing real knowledge work output — putting it ahead of GPT-5.4, Claude Opus 4.7, and Gemini 3.1 Pro. On OSWorld-Verified, which measures whether a model can operate real computer environments autonomously, it hits 78.7%. On Tau2-bench Telecom, testing complex customer service workflows, it reaches 98.0% without any prompt tuning.
Then there’s the science angle, which might be the most surprising part of this announcement. GPT-5.5 isn’t just better at writing code and summarizing documents — it’s showing real capability in research workflows that require the kind of persistent, multi-stage thinking that researchers do across days or weeks, not just minutes. On GeneBench, a new eval focused on multi-stage scientific data analysis in genetics and quantitative biology, GPT-5.5 shows a clear improvement over GPT-5.4. These tasks often correspond to multi-day projects for scientific experts.
An immunology professor at the Jackson Laboratory for Genomic Medicine used GPT-5.5 Pro to analyze a gene-expression dataset with 62 samples and nearly 28,000 genes, producing a detailed research report that surfaced key questions and insights — work he said would have taken his team months. That’s not a marginal improvement. That’s potentially months of a research team’s time compressed into a session.
Perhaps most surprisingly, an internal version of GPT-5.5 with a custom harness helped discover a new proof about Ramsey numbers — a core area in combinatorics. The result was later verified in Lean, the formal proof assistant, confirming that the math was sound. This is the kind of thing that tends to stop people in their tracks. We’ve gotten used to AI generating text, writing code, and summarizing documents. Producing a verified mathematical proof in a nontrivial research area is something else.
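The new proof itself isn't public, but the flavor of the area is easy to demonstrate: the classical fact R(3,3) = 6 says every 2-coloring of the edges of K6 contains a monochromatic triangle, while K5 admits a coloring that avoids one. Both halves are small enough to brute-force. This toy check is unrelated to the model's result:

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """True if some triangle in K_n has all three edges the same color."""
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def every_coloring_has_mono(n):
    """Exhaustively check all 2-colorings of the edges of K_n."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False        # found a coloring with no mono triangle
    return True

print(every_coloring_has_mono(6))  # True  -> R(3,3) <= 6
print(every_coloring_has_mono(5))  # False -> R(3,3) > 5
```

Research-level Ramsey results live far beyond exhaustive search, which is exactly why having the new proof machine-checked in Lean matters.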
The safety side of this release is also worth taking seriously. OpenAI has rated GPT-5.5 as “High” under its Preparedness Framework for both cybersecurity and biological/chemical capabilities — a step up from GPT-5.4, though the company says it did not reach the “Critical” threshold for cybersecurity. To manage that, OpenAI is deploying stricter classifiers for potential cyber risk and expanding its “Trusted Access for Cyber” program, which gives verified defenders — including organizations responsible for critical infrastructure — expanded access to GPT-5.5’s advanced cybersecurity capabilities with fewer restrictions. The company is also working with government partners on protecting critical infrastructure.
On the infrastructure side, GPT-5.5 was co-designed with, trained using, and is served on NVIDIA GB200 and GB300 NVL72 systems. In a somewhat recursive development, Codex and GPT-5.5 itself were used to help build the infrastructure that serves the model. The team used Codex to analyze weeks of production traffic patterns and write custom heuristic algorithms to better partition and balance workloads — an effort that resulted in a more than 20% increase in token generation speeds. NVIDIA’s VP of Enterprise AI, Justin Boitano, described GPT-5.5 as enabling teams to “ship end-to-end features from natural language prompts, cut debug time from days to hours, and turn weeks of experimentation into overnight progress in complex codebases.”
As of April 23, GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro — the higher-compute version — is rolling out to Pro, Business, and Enterprise users. API access is coming soon, with OpenAI saying it’s working closely with partners and customers on the safety and security requirements for serving the model at scale. The API pricing for developers will be $5 per million input tokens and $30 per million output tokens when it lands.
What makes this moment interesting is the broader context. OpenAI has been releasing models at a pace that would have seemed implausible even two years ago. GPT-5.3 came in February, GPT-5.4 in March, and now GPT-5.5 barely six weeks later. Competitors are moving just as fast. Anthropic, Google, and xAI are all shipping at a similar cadence. The race is genuinely fast, and the gap between what AI could do a year ago and what it can do now is significant.
GPT-5.5 feels like a model that wants to be less of a tool you use and more of a collaborator you work alongside. Whether that framing holds up in the day-to-day reality of how people actually use it is something that will shake out over the coming weeks. But if the early reactions from engineers, researchers, and enterprise teams are any guide, OpenAI has shipped something that a lot of people are going to find very hard to put down.
Discover more from GadgetBond