
GadgetBond


OpenAI’s GPT-5.4 can click, type, and work your PC for you

OpenAI's GPT-5.4 is the first general-purpose AI model that can control a computer — clicking, typing, and navigating apps entirely on its own.

By Shubham Sawarkar, Editor-in-Chief
Mar 6, 2026, 6:55 AM EST
Image: OpenAI

There’s a version of the future that technologists have been drawing on whiteboards for years — one where you don’t open apps, you just tell an AI what you need done, and it goes and does it. You wake up, ask it to pull together the latest sales numbers from three different spreadsheets, draft a client summary, book the meeting, and send the follow-up email. You don’t click anything. You don’t navigate anything. You just get the result. On March 5, 2026, OpenAI moved that idea a significant step closer to reality.

The company released GPT-5.4, and the headline feature — at least for developers and enterprises — is something called native computer-use capability. This is the first time OpenAI has built direct computer control into a general-purpose model, not as a bolted-on plugin or an experimental side project, but as a core, baked-in capability. The model can look at a screenshot of your screen, understand what it sees, and then take action — moving a mouse cursor, clicking buttons, typing into fields — all on its own. It’s the kind of thing that sounds almost mundane when described that way, but the implications of it, when you sit with them for a moment, are genuinely staggering.

To be clear about what this actually means in practice: GPT-5.4 can operate computers by interpreting visual information the way a person would. It takes a screenshot, reasons about what’s on the screen, decides what to do next, and executes that action. It can send emails, schedule calendar events, fill out forms, navigate websites, open applications, and run through long multi-step workflows that would typically require a human to stay in the loop at each stage. OpenAI demonstrated this with a video showing GPT-5.4 interpreting a browser interface and interacting with UI elements through coordinate-based clicking to handle both email and calendar tasks in real time, without any human guidance.
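That perceive, reason, act loop can be sketched in a few lines. Everything below is an illustrative stand-in: the function names, action types, and fake screen state are invented for this sketch, not taken from OpenAI's actual computer-use API.

```python
# Minimal sketch of a screenshot -> reason -> act agent loop.
# All names and the simulated "screen" are hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def capture_screenshot(state):
    """Stand-in for grabbing the current screen as pixels."""
    return {"step": state["step"], "form_filled": state["form_filled"]}

def decide_next_action(screenshot):
    """Stand-in for the model: look at the screen, pick one action."""
    if not screenshot["form_filled"]:
        return Action(kind="type", x=200, y=140, text="quarterly summary")
    return Action(kind="done")

def execute(action, state):
    """Stand-in for the OS layer that moves the cursor and types."""
    if action.kind == "type":
        state["form_filled"] = True
    state["step"] += 1

def run_agent(max_steps=10):
    """Loop until the model decides the task is complete."""
    state = {"step": 0, "form_filled": False}
    for _ in range(max_steps):
        shot = capture_screenshot(state)
        action = decide_next_action(shot)
        if action.kind == "done":
            return state
        execute(action, state)
    return state
```

The real system replaces each stub with something far harder: pixel-level vision, coordinate-based clicking, and recovery when the screen doesn't change the way the model expected.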

What separates this from the various “computer use” experiments we’ve seen before — from Anthropic’s Computer Use demo with Claude to OpenAI’s early Operator experiments — is the breadth and depth of what GPT-5.4 can do, and how reliably it performs. The benchmark that matters most here is OSWorld-Verified, an industry-standard test that throws an AI agent into a real desktop environment with real apps and asks it to complete tasks using only screenshots and mouse-and-keyboard inputs. It’s essentially a measure of how well AI can do the kind of computer work a human does every day. GPT-5.4 scored a 75.0% success rate on that test. For reference, GPT-5.2 — the previous generation model — scored 47.3% on the same test, while human performance sits at 72.4%. GPT-5.4, in other words, has officially crossed the human threshold on this particular measure of computer-use ability.

That number needs some context, because it’s easy to throw benchmarks at people and have them bounce off. OSWorld is not a toy test. It involves 369 real-world computer tasks across web browsers, desktop applications, file management, and multi-app workflows. It was designed specifically to test whether AI agents can handle the messy, unpredictable reality of actual computer environments — not clean simulations. Crossing the human threshold on that benchmark is a genuine milestone, and one that the research community has been watching closely for well over a year.

The practical real-world validation is even more compelling than the benchmark. Dod Fraser, CEO at Mainstay, a company that uses AI to process property tax and HOA portals at scale, described testing GPT-5.4 across roughly 30,000 different portals. The model achieved a 95% success rate on the first attempt and a 100% success rate within three attempts. Previous computer-use models were completing those same tasks at a 73–79% rate. It also ran those sessions roughly three times faster while using about 70% fewer tokens. For a company operating at that scale, those numbers translate directly into meaningful cost savings and reliability improvements.​

It’s worth pausing on why this is technically hard to get right. When a person uses a computer, they bring a lifetime of learned context to every interaction. They know that a greyed-out button can’t be clicked. They know that when a loading spinner appears, they should wait. They understand that a warning dialog needs to be dismissed before anything else can happen. They can read a dense interface and immediately understand which elements are interactive and which are decorative. Getting an AI model to do all of that — reliably, across thousands of different websites and applications that were never designed with AI interaction in mind — is an enormous challenge. The fact that GPT-5.4 now does this better than the average human test taker, using nothing but screenshots and cursor commands, says something meaningful about how far the underlying vision and reasoning capabilities have come.

A key part of what enables this is the model’s improved visual perception. OpenAI upgraded GPT-5.4’s ability to process high-resolution images with finer fidelity than its predecessors. Starting with this model, there’s now an “original” image input detail level that supports full-fidelity perception up to 10.24 million total pixels. In early testing with API users, OpenAI found strong gains in click accuracy and localization — meaning the model was better at identifying exactly where on a screen to click, even in dense, complicated interfaces. On MMMU-Pro, a benchmark measuring visual understanding and reasoning, GPT-5.4 scored 81.2% without any tool use. It also improved significantly on document parsing benchmarks, which is relevant for the computer-use case because so much of real-world computer work involves reading and acting on documents.​
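To make that full-fidelity budget concrete, a trivial helper can check whether a given screenshot resolution fits under 10.24 million total pixels. The resolutions in the comments are common display sizes used as examples, not documented limits.

```python
# The "original" detail level described above supports up to
# 10.24 million total pixels of full-fidelity input.
PIXEL_BUDGET = 10_240_000

def fits_full_fidelity(width: int, height: int) -> bool:
    """True if a width x height screenshot fits the stated pixel budget."""
    return width * height <= PIXEL_BUDGET

# A 4K screen (3840 x 2160 = ~8.3M pixels) fits comfortably;
# a 5K screen (5120 x 2880 = ~14.7M pixels) would exceed it.
```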

On browser-based tasks specifically, the story is similarly strong. GPT-5.4 scored 67.3% on WebArena-Verified, which tests an AI agent’s ability to complete real-world browser tasks, compared to GPT-5.2’s 65.4%. On Online-Mind2Web, another browser-use benchmark, it achieved a 92.8% success rate using only screenshot-based observations — no DOM access, no special browser hooks, just looking at the screen like a person would. That 92.8% figure is a significant jump over the 70.9% achieved by ChatGPT Atlas‘s Agent Mode, which previously held the leading position on that benchmark.​

The model’s computer-use behavior is also designed to be steerable in ways that earlier systems weren’t. Developers building agents on top of GPT-5.4 can configure how the model behaves — adjusting it to suit different risk tolerances, specifying when it should pause and ask for confirmation before taking an action, and setting custom confirmation policies for more sensitive workflows. This matters a lot for enterprise deployments, where you might want an agent to breeze through data entry tasks autonomously but pause before sending any outbound communication or making any purchase. That kind of fine-grained control over agent behavior has been one of the missing pieces in making computer-use AI actually deployable in serious business environments.
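A confirmation policy of the sort described above reduces to a small decision function. The policy names and action categories below are invented for illustration; they are not OpenAI's actual configuration surface.

```python
# Hypothetical sketch of a per-action confirmation policy: autonomous
# for low-risk work, a pause-and-ask for actions with side effects.

SENSITIVE_ACTIONS = {"send_email", "purchase", "delete"}

def requires_confirmation(action: str, policy: str) -> bool:
    """Return True when the agent should pause and ask a human first."""
    if policy == "autonomous":
        return False        # never pause (high risk tolerance)
    if policy == "cautious":
        return True         # always pause (low risk tolerance)
    # default "balanced": pause only on externally visible side effects
    return action in SENSITIVE_ACTIONS
```

Under the "balanced" policy, the agent would breeze through form-filling but stop before sending an email, which is exactly the enterprise behavior the article describes.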

Beyond just clicking around a screen, GPT-5.4 also introduced a new capability called tool search, which solves a problem that has been quietly undermining the practicality of AI agents in large systems. When an AI agent is connected to many tools — say, dozens of APIs, MCP servers, database connectors, and workflow integrations — all of those tool definitions have historically been loaded into the model’s context upfront. That could mean tens of thousands of tokens just sitting there in every single request, even for tasks that only need one or two of those tools. Tool search flips this around: the model gets a lightweight directory of what’s available and looks up specific tool definitions only when it actually needs them. OpenAI tested this on 250 tasks from Scale’s MCP Atlas benchmark with all 36 MCP servers enabled and found that the tool search approach reduced total token usage by 47% while maintaining the same accuracy. For companies running agents at any significant scale, that efficiency gain compounds into real money very quickly.
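The token math behind that directory-plus-lookup pattern is easy to sketch. The tool names, schemas, and token counts below are made up; only the general idea (a tiny directory entry per tool, full definitions loaded on demand) comes from the description above.

```python
# Hypothetical tool registry: each full definition costs some number of
# context tokens if loaded upfront. Counts are invented for illustration.
FULL_DEFINITIONS = {
    "send_email":   {"schema": {"to": "str", "subject": "str"}, "tokens": 900},
    "query_db":     {"schema": {"sql": "str"},                  "tokens": 1200},
    "create_event": {"schema": {"title": "str", "start": "str"}, "tokens": 800},
}

def upfront_context_tokens() -> int:
    """Old approach: every definition sits in context on every request."""
    return sum(d["tokens"] for d in FULL_DEFINITIONS.values())

def tool_search_context_tokens(tools_used) -> int:
    """Tool-search approach: a ~20-token directory line per tool, plus the
    full definition only for tools the task actually invokes."""
    directory = 20 * len(FULL_DEFINITIONS)
    loaded = sum(FULL_DEFINITIONS[t]["tokens"] for t in tools_used)
    return directory + loaded
```

With three tools registered but only one used, the sketch drops context cost from 2,900 tokens to 960; across dozens of MCP servers that gap is where the reported 47% saving comes from.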

The model also brings a major leap in the Toolathlon benchmark — a test designed to measure how well AI agents work with real-world tools and APIs to complete multi-step tasks, things like reading emails, extracting attachments, uploading files, grading submissions, and recording results in a spreadsheet. GPT-5.4 hits 54.6% on that benchmark at xhigh reasoning effort, compared to 46.3% for GPT-5.2, and it does it in fewer tool yields — meaning fewer round-trips where the model has to pause and wait for external responses. Wade, the CEO at Zapier, described GPT-5.4 as the “most persistent model to date,” finishing jobs where previous models gave up.​

All of this computer-use capability sits alongside genuine improvements in coding and knowledge work that make GPT-5.4 a more holistic model than anything OpenAI has released before. The coding capabilities are absorbed from GPT-5.3-Codex, and the combination of those strengths with computer-use is where things get particularly interesting for developers. OpenAI released an experimental Codex skill called “Playwright (Interactive)” alongside GPT-5.4, which lets the model visually debug web and desktop applications — and even test apps it’s currently building, while it’s building them, using visual feedback from the browser. The demo OpenAI showed for this was a fully playable isometric theme park simulation game, generated from a single lightly specified prompt, complete with pathfinding guests, ride queues, dynamic park metrics, and polished visual assets — all built and iteratively playtested by the AI itself.

For knowledge work, GPT-5.4 scored 83% on GDPval — a benchmark that tests agents across 44 occupations spanning the top nine industries contributing to U.S. GDP — matching or exceeding the performance of industry professionals in that proportion of comparisons. The model also scored 87.3% on an internal benchmark of investment banking spreadsheet modeling tasks, up from 68.4% for GPT-5.2. Niko Grupen, Head of Applied Research at Harvey, the legal AI platform, noted that GPT-5.4 scored 91% on their BigLaw Bench evaluation and called it particularly strong at maintaining accuracy across lengthy contracts.​

On the factual accuracy front, OpenAI claims GPT-5.4’s individual claims are 33% less likely to be false and full responses are 18% less likely to contain any errors compared to GPT-5.2, based on a set of de-identified user-flagged prompts. For computer-use applications, that reduction in hallucination is arguably more important than it is for a chat interface — because when an AI agent is executing actions on a computer, a wrong assumption doesn’t just generate a slightly off answer, it can trigger an irreversible action.​
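Because both figures are relative reductions rather than absolute rates, a tiny worked example helps. The baseline rates below are hypothetical; only the 33% and 18% reductions come from OpenAI's claim.

```python
# Relative-reduction arithmetic: a "33% less likely" claim multiplies a
# baseline rate by (1 - 0.33). The baselines here are invented.

baseline_claim_error = 0.06                     # hypothetical per-claim rate
gpt54_claim_error = baseline_claim_error * (1 - 0.33)

baseline_response_error = 0.20                  # hypothetical per-response rate
gpt54_response_error = baseline_response_error * (1 - 0.18)
```

So a hypothetical 6% per-claim error rate would fall to about 4%, and a 20% per-response rate to about 16.4%; the absolute numbers depend entirely on baselines OpenAI did not publish.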

Availability-wise, GPT-5.4 is rolling out in ChatGPT as GPT-5.4 Thinking for Plus, Team, and Pro users, replacing GPT-5.2 Thinking as the default. It’s also available in the API right now as gpt-5.4, and the Pro variant is available for those on Pro and Enterprise plans. Pricing in the API comes in at $2.50 per million input tokens and $15 per million output tokens — slightly higher than GPT-5.2’s $1.75 and $14 respectively, though OpenAI notes the model’s improved token efficiency means total token consumption should be lower for many tasks. The model also supports up to a 1 million token context window in Codex, which makes it capable of reasoning across entire codebases or long chains of prior actions without losing the thread.
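The quoted per-token prices make the cost trade-off easy to sketch. The token counts for a "typical request" below are invented; only the per-million-token prices come from the article.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given per-million-token rates."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Per the article: GPT-5.4 is $2.50 in / $15 out; GPT-5.2 is $1.75 in / $14 out.
gpt54 = request_cost(10_000, 2_000, 2.50, 15.0)   # hypothetical request size
gpt52 = request_cost(10_000, 2_000, 1.75, 14.0)   # same size, older model

# If improved token efficiency trims usage by ~30% (an illustrative figure),
# the newer model can still come out cheaper per task.
gpt54_efficient = request_cost(7_000, 1_400, 2.50, 15.0)
```

At identical token counts GPT-5.4 costs more per request, but shave the token usage and the total flips in its favor, which is the efficiency argument OpenAI is making.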

OpenAI is treating GPT-5.4 as a “High cyber capability” model under its Preparedness Framework, meaning it comes with an expanded safety stack that includes monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests. The company also released a new open-source evaluation for chain-of-thought controllability — essentially a test of whether the model can deliberately hide its reasoning to evade safety monitoring — and reported that GPT-5.4’s ability to do that is low, which they describe as a positive safety indicator.​

The broader picture here is one of consolidation. For much of the past year, OpenAI’s model lineup had become a bit fractured — you had reasoning models for one thing, coding models for another, and general-purpose chat models somewhere in the middle. GPT-5.4 is an attempt to pull all of that together into a single model that can handle the full stack of professional work, including — and this is the part that really marks a turning point — the physical act of operating a computer to get that work done. Whether that vision fully delivers in real-world deployments, at scale, across the wildly varied software environments that enterprises actually use, is something that will take months to assess properly. But on the numbers and the demonstrations available right now, GPT-5.4 represents the clearest signal yet that AI agents that can genuinely operate computers are no longer a future thing. They’re here, and they work.

