There’s a certain rhythm to how OpenAI drops its biggest updates — usually a blog post, a splash of benchmarks, a few partner quotes, and then the slow real-world realization that something genuinely different has landed. GPT‑5.4, released on March 5, 2026, follows that playbook, but this time the gap between the marketing and the actual capability feels unusually narrow. This is a meaningful release, and if you’ve been paying attention to how AI tools are slowly creeping into professional workflows, GPT‑5.4 is arguably the most coherent step in that direction OpenAI has taken yet.
The headline is straightforward: OpenAI is releasing GPT‑5.4 in ChatGPT — where it shows up as GPT‑5.4 Thinking — as well as in the API and in Codex, the company’s AI-powered coding environment. Alongside it comes GPT‑5.4 Pro, a higher-capability tier designed for people who need maximum performance on genuinely complex tasks. In other words, there’s now a tiered offering inside the same model generation, with different performance ceilings depending on what you’re willing to pay and what you actually need to do.
What makes GPT‑5.4 interesting is that it’s not a narrow specialist. OpenAI built it by fusing together the strengths of its recent coding-focused model, GPT-5.3-Codex, with improvements in reasoning, agentic workflows, and professional knowledge work. The result is a model that OpenAI describes as its most capable and efficient frontier model for professional work — a claim that lands differently when you see what “professional work” actually means in this context. We’re talking about spreadsheets, presentations, legal documents, financial models, and real multi-step workflows that previously would have required significant human intervention to course-correct.
One of the most immediately useful changes in ChatGPT is how GPT‑5.4 Thinking now handles its reasoning process. Instead of quietly thinking through a problem and presenting you with a finished output, the model now provides an upfront plan of its thinking before it starts working. That might sound like a small UX tweak, but it’s actually quite significant in practice. You can intercept the model mid-response, redirect it, adjust its assumptions, and arrive at an output that’s closer to what you actually needed — without burning extra turns going back and forth. For anyone who’s spent time nudging AI outputs through successive prompts, this is a real quality-of-life improvement.
Deep research within ChatGPT has also been improved, particularly for highly specific queries that require pulling together information from many sources. On BrowseComp, a benchmark that measures how well AI agents can persistently browse the web to find hard-to-locate information, GPT‑5.4 jumps 17 percentage points over GPT‑5.2, scoring 82.7%. GPT‑5.4 Pro takes it even further, hitting 89.3% — a new state of the art at the time of release. If your work involves research that goes beyond the first page of search results, this matters.
The knowledge work story is where things get especially compelling. OpenAI tested GPT‑5.4 on GDPval, a benchmark that evaluates agents on real-world tasks spanning 44 occupations across the top nine industries contributing to U.S. GDP. The tasks aren’t abstract puzzles — they include things like sales presentations, accounting spreadsheets, manufacturing diagrams, and urgent care schedules. GPT‑5.4 matches or exceeds industry professionals in 83.0% of comparisons, compared to 70.9% for its predecessor GPT‑5.2. That’s a significant jump, and it speaks to how quickly the gap between AI-assisted work and traditional human professional output is narrowing.
The spreadsheet performance is particularly worth dwelling on. OpenAI ran GPT‑5.4 against an internal benchmark of spreadsheet modeling tasks that a junior investment banking analyst might encounter, and the model scored 87.3%, compared to 68.4% for GPT‑5.2. That’s an almost 20-point improvement, and in a domain where accuracy is non-negotiable. On presentations, human raters preferred GPT‑5.4’s outputs 68% of the time over GPT‑5.2’s, citing stronger aesthetics, greater visual variety, and better use of image generation. For those who use ChatGPT for document and slide creation at work, the improvement should be noticeable pretty quickly. And for Enterprise users specifically, OpenAI is also launching a ChatGPT for Excel add-in on the same day, which extends the spreadsheet capabilities directly into Microsoft’s environment.
Factual accuracy has also taken a meaningful step forward. GPT‑5.4’s individual claims are 33% less likely to be false and full responses are 18% less likely to contain any errors relative to GPT‑5.2, based on a set of de-identified prompts where users previously flagged factual errors. Hallucination has always been the sore point in enterprise AI adoption, and OpenAI’s decision to call this out explicitly suggests they know it’s still the key credibility test with professional users.
The computer use capabilities are, arguably, the most forward-looking part of this release. GPT‑5.4 is OpenAI’s first general-purpose model with native computer-use capabilities, meaning it can actually control a computer — interpreting screenshots, clicking on UI elements, and executing keyboard and mouse commands — to complete real tasks across websites and software applications. On OSWorld-Verified, which measures a model’s ability to navigate a desktop environment through screenshots and keyboard and mouse input, GPT‑5.4 achieves a 75.0% success rate, surpassing human performance at 72.4% and massively outpacing GPT‑5.2’s 47.3%. Crossing the human performance line on a computer-use benchmark isn’t a trivial thing.
Mainstay, a company that uses AI to navigate property tax and HOA portals, shared that GPT‑5.4 achieved a 95% success rate on the first attempt and 100% within three attempts across roughly 30,000 portals — completing sessions around three times faster while using 70% fewer tokens than previous computer-use AI models. That kind of real-world performance data is more grounded than any benchmark table.
For developers, GPT‑5.4 introduces tool search in the API — a feature that changes how models work with external tools and MCP servers in a significant way. Previously, all tool definitions had to be loaded into the model’s prompt upfront, which could mean tens of thousands of tokens consumed on every request, even for tools the model would never actually use. With tool search, GPT‑5.4 instead gets a lightweight list of available tools and can look up specific tool definitions only when it needs them. In testing, this reduced total token usage by 47% while maintaining the same accuracy. For developers building complex agent workflows with large tool ecosystems, this is the kind of efficiency gain that directly impacts cost and response speed.
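OpenAI hasn’t published the exact API shape in this announcement, but the idea behind tool search can be sketched with a toy registry: the model initially sees only tool names and one-line summaries, and full definitions are resolved on demand. Everything below — the tool names, schemas, and the rough token estimate — is illustrative, not OpenAI’s actual API:

```python
import json

# Toy registry standing in for a large tool/MCP ecosystem. In the old approach,
# all of these full definitions would be serialized into every prompt.
FULL_DEFINITIONS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}}},
    },
    "query_crm": {
        "description": "Search customer records by name or account id.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}}},
    },
    "send_invoice": {
        "description": "Create and email an invoice to a customer.",
        "parameters": {"type": "object",
                       "properties": {"account": {"type": "string"}}},
    },
    # ...imagine hundreds more entries here...
}

def lightweight_index(defs):
    """What the model sees upfront: just names and one-line summaries."""
    return {name: d["description"] for name, d in defs.items()}

def lookup_tool(name):
    """Resolved only when the model decides it actually needs a tool."""
    return {name: FULL_DEFINITIONS[name]}

def approx_tokens(obj):
    """Crude token estimate: roughly one token per four characters of JSON."""
    return len(json.dumps(obj)) // 4

# Old approach: every definition in every request.
upfront_cost = approx_tokens(FULL_DEFINITIONS)

# Tool-search approach: the index, plus one on-demand lookup.
lazy_cost = (approx_tokens(lightweight_index(FULL_DEFINITIONS))
             + approx_tokens(lookup_tool("query_crm")))

print(f"upfront: ~{upfront_cost} tokens, lazy: ~{lazy_cost} tokens")
```

The gap widens as the registry grows: the index cost scales with a one-line summary per tool, while the old approach scales with full schemas per tool, which is why the savings show up most in large agent toolsets.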
The coding side of the release sees GPT‑5.4 stepping in as OpenAI’s unified model that brings together top-tier coding with everything else. It matches or outperforms GPT-5.3-Codex on SWE-Bench Pro — a benchmark for real-world software engineering tasks — while running at lower latency. In Codex specifically, users can toggle on a fast mode that delivers up to 1.5x faster token velocity without changing the underlying model or its intelligence level. Developers building through the API can access the same speed boost via priority processing.
One technically notable detail is context window support. GPT‑5.4 in Codex includes experimental support for a one million token context window, which allows agents to plan, execute, and verify tasks across extremely long horizons — something that becomes essential when you’re working with large codebases or multi-document workflows. Standard ChatGPT context windows for GPT‑5.4 Thinking remain the same as they were for GPT‑5.2 Thinking.
On safety, OpenAI is treating GPT‑5.4 as a High cyber capability model under its Preparedness Framework, the same classification it gave GPT-5.3-Codex. The company has expanded its cyber safety stack, including monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests on Zero Data Retention surfaces. OpenAI also released new research on Chain-of-Thought monitorability, including an open-source evaluation measuring whether models can deliberately hide their reasoning to evade safety monitoring. The finding that GPT‑5.4 Thinking’s ability to control its chain of thought is low is being framed as a positive safety property — essentially, the model can’t effectively hide its reasoning, which means monitoring remains a viable safety tool.
As for availability, GPT‑5.4 Thinking is rolling out to ChatGPT Plus, Team, and Pro users, replacing GPT‑5.2 Thinking as the default. GPT‑5.2 Thinking will remain accessible in the model picker under Legacy Models for three months before being retired on June 5, 2026. Enterprise and Edu users can enable early access through admin settings, and GPT‑5.4 Pro is available to Pro and Enterprise plan holders.
Pricing has moved up slightly from GPT‑5.2. In the API, GPT‑5.4 is priced at $2.50 per million input tokens and $15 per million output tokens, compared to GPT‑5.2’s $1.75 and $14, respectively. GPT‑5.4 Pro goes for $30 per million input tokens and $180 per million output tokens. OpenAI’s argument is that greater token efficiency means you’ll use fewer tokens to get the same results, softening the effective price increase for many use cases. Batch and Flex pricing are available at half the standard rate, while Priority processing doubles it.
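OpenAI’s token-efficiency argument is easy to sanity-check with back-of-envelope arithmetic using the listed per-million-token rates. The workload size and the 30% efficiency figure below are purely illustrative assumptions, not OpenAI claims:

```python
def cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Prices are USD per million tokens for input and output respectively."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Hypothetical daily workload: 2M input tokens, 400k output tokens.
gpt52 = cost_usd(2_000_000, 400_000, 1.75, 14.00)

# Same job on GPT-5.4 at the higher rates, assuming (illustratively)
# it needs 30% fewer tokens to produce the same result.
gpt54 = cost_usd(2_000_000 * 0.7, 400_000 * 0.7, 2.50, 15.00)

# Batch and Flex halve the standard rate; Priority processing doubles it.
gpt54_batch = gpt54 / 2
gpt54_priority = gpt54 * 2

print(f"GPT-5.2: ${gpt52:.2f}  GPT-5.4: ${gpt54:.2f}  "
      f"batch: ${gpt54_batch:.2f}  priority: ${gpt54_priority:.2f}")
```

Under those assumed numbers the effective cost comes out lower despite the higher sticker price — but the break-even point depends entirely on how much efficiency a given workload actually realizes.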
The broader context here is worth keeping in mind. GPT‑5.4 lands against a backdrop of intensifying competition — Google has been steadily improving Gemini across both consumer and API tiers, Anthropic’s Claude models have made real inroads with enterprise users, and a wave of open-source models continues to close the gap on capabilities that once required frontier-level compute. OpenAI’s response is a model that’s designed to be genuinely useful for work that actually matters — not just impressive in benchmarks, but meaningfully better at the kinds of tasks that professionals encounter day to day. Whether that’s enough to maintain its position at the top of the market is a question that will play out over the coming months, but GPT‑5.4 is, at minimum, a serious and coherent answer to the question of what AI models should be doing next.
Discover more from GadgetBond