GadgetBond

OpenAI’s GPT-5.4 can click, type, and work your PC for you

OpenAI's GPT-5.4 is the company's first general-purpose model with built-in computer control: it can click, type, and navigate apps on its own.

By Shubham Sawarkar, Editor-in-Chief
Mar 6, 2026, 6:55 AM EST
ChatGPT logo and wordmark in white on a soft blue and orange gradient background, representing OpenAI’s ChatGPT platform.
Image: OpenAI

There’s a version of the future that technologists have been drawing on whiteboards for years, one where you don’t open apps; you just tell an AI what you need done, and it goes and does it. You wake up, ask it to pull together the latest sales numbers from three different spreadsheets, draft a client summary, book the meeting, and send the follow-up email. You don’t click anything. You don’t navigate anything. You just get the result. On March 5, 2026, OpenAI moved that idea a significant step closer to reality.

The company released GPT-5.4, and the headline feature — at least for developers and enterprises — is something called native computer-use capability. This is the first time OpenAI has built direct computer control into a general-purpose model, not as a bolted-on plugin or an experimental side project, but as a core, baked-in capability. The model can look at a screenshot of your screen, understand what it sees, and then take action — moving a mouse cursor, clicking buttons, typing into fields — all on its own. It’s the kind of thing that sounds almost mundane when described that way, but the implications of it, when you sit with them for a moment, are genuinely staggering.

To be clear about what this actually means in practice: GPT-5.4 can operate computers by interpreting visual information the way a person would. It takes a screenshot, reasons about what’s on the screen, decides what to do next, and executes that action. It can send emails, schedule calendar events, fill out forms, navigate websites, open applications, and run through long multi-step workflows that would typically require a human to stay in the loop at each stage. OpenAI demonstrated this with a video showing GPT-5.4 interpreting a browser interface and interacting with UI elements through coordinate-based clicking to handle both email and calendar tasks in real time, without any human guidance.
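The screenshot, reason, act cycle described above can be sketched as a simple loop. This is only an illustration of the general pattern, not OpenAI's implementation: the `Action` schema, the `run_agent` function, and the stubbed `decide` callback are all hypothetical names invented here, and a real agent would call a model where the script is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action. The kinds 'click', 'type', and 'done' are a hypothetical schema."""
    kind: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


def run_agent(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, str, List[Action]], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> decide -> execute until 'done' or the step limit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()               # observe the current screen
        action = decide(screen, goal, history)   # a model call in a real agent
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                          # move the mouse, click, or type
    return history


# Stubs so the sketch runs without a real model or desktop.
def fake_screenshot() -> bytes:
    return b"fake-png-bytes"


def scripted_decide(screen: bytes, goal: str, history: List[Action]) -> Action:
    script = [
        Action("click", x=640, y=120),
        Action("type", text="Quarterly summary"),
        Action("done"),
    ]
    return script[len(history)]


executed: List[Action] = []
trace = run_agent(fake_screenshot, scripted_decide, executed.append, "draft an email")
print([a.kind for a in trace])
```

The step limit matters in practice: without it, a model that never emits "done" would loop forever.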

What separates this from the various “computer use” experiments we’ve seen before, from Anthropic’s Computer Use demo with Claude to early OpenAI Operator experiments, is the breadth and depth of what GPT-5.4 can do, and how well it performs. The benchmark that matters most here is OSWorld-Verified, an industry-standard test that drops an AI agent into a real desktop environment with real apps and asks it to complete tasks using only screenshots and mouse-and-keyboard inputs. It’s essentially a measure of how well AI can do the kind of computer work a human does every day. GPT-5.4 scored a 75.0% success rate on that test. For reference, GPT-5.2, the previous-generation model, scored 47.3%, and human performance on the benchmark sits at 72.4%. GPT-5.4, in other words, has edged past the human baseline on this particular measure of computer-use ability.

That number needs some context, because it’s easy to throw benchmarks at people and have them bounce off. OSWorld is not a toy test. It involves 369 real-world computer tasks across web browsers, desktop applications, file management, and multi-app workflows. It was designed specifically to test whether AI agents can handle the messy, unpredictable reality of actual computer environments — not clean simulations. Crossing the human threshold on that benchmark is a genuine milestone, and one that the research community has been watching closely for well over a year.

The practical real-world validation is even more compelling than the benchmark. Dod Fraser, CEO of Mainstay, a company that uses AI to process property tax and HOA portals at scale, described testing GPT-5.4 across roughly 30,000 different portals. The model achieved a 95% success rate on the first attempt and a 100% success rate within three attempts. Previous computer-use models completed those same tasks at a 73–79% rate. GPT-5.4 also ran those sessions roughly three times faster while using about 70% fewer tokens. For a company operating at that scale, those numbers translate directly into meaningful cost savings and reliability improvements.
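The "first attempt" versus "within three attempts" distinction maps onto a plain retry wrapper. The article does not say how Mainstay structures its retries, so the sketch below is an assumption throughout: `run_with_retries` and the scripted `flaky_task` are hypothetical names used only to show how the two success figures would be counted.

```python
from typing import Callable, Iterator, Tuple


def run_with_retries(task: Callable[[], bool], max_attempts: int = 3) -> Tuple[bool, int]:
    """Run `task` until it succeeds or attempts run out.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if task():
            return True, attempt
    return False, max_attempts


# Hypothetical portal session that fails once, then succeeds.
outcomes: Iterator[bool] = iter([False, True])


def flaky_task() -> bool:
    return next(outcomes)


ok, attempts = run_with_retries(flaky_task)
print(ok, attempts)
```

A run counts toward the first-attempt rate only when `attempts` is 1, and toward the within-three rate whenever `ok` is true.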

It’s worth pausing on why this is technically hard to get right. When a person uses a computer, they bring a lifetime of learned context to every interaction. They know that a greyed-out button can’t be clicked. They know that when a loading spinner appears, they should wait. They understand that a warning dialog needs to be dismissed before anything else can happen. They can read a dense interface and immediately understand which elements are interactive and which are decorative. Getting an AI model to do all of that — reliably, across thousands of different websites and applications that were never designed with AI interaction in mind — is an enormous challenge. The fact that GPT-5.4 now does this better than the average human test taker, using nothing but screenshots and cursor commands, says something meaningful about how far the underlying vision and reasoning capabilities have come.

A key part of what enables this is the model’s improved visual perception. OpenAI upgraded GPT-5.4’s ability to process high-resolution images with finer fidelity than its predecessors. Starting with this model, there’s now an “original” image input detail level that supports full-fidelity perception up to 10.24 million total pixels. In early testing with API users, OpenAI found strong gains in click accuracy and localization, meaning the model was better at identifying exactly where on a screen to click, even in dense, complicated interfaces. On MMMU-Pro, a benchmark measuring visual understanding and reasoning, GPT-5.4 scored 81.2% without any tool use. It also improved significantly on document parsing benchmarks, which is relevant for the computer-use case because so much of real-world computer work involves reading and acting on documents.
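To get a feel for the 10.24 million pixel figure, here is a small sketch that checks a screenshot against that budget and, if it is too large, computes a uniformly downscaled size. The article does not describe what the API does with oversized images, so the downscaling step and the `fit_to_budget` helper are assumptions made for illustration only.

```python
import math
from typing import Tuple

MAX_PIXELS = 10_240_000  # "10.24 million total pixels", as quoted in the article


def fit_to_budget(width: int, height: int, max_pixels: int = MAX_PIXELS) -> Tuple[int, int]:
    """Return (width, height), scaled down uniformly if width*height exceeds max_pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    pixels = width * height
    if pixels <= max_pixels:
        return width, height  # already within budget, keep full fidelity
    scale = math.sqrt(max_pixels / pixels)  # same factor on both axes keeps the aspect ratio
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


for size in [(1920, 1080), (3840, 2160), (5120, 2880)]:
    print(size, "->", fit_to_budget(*size))
```

Under this reading, a 4K screenshot (3840 by 2160, about 8.3 million pixels) fits inside the budget unchanged, while a 5K capture would not.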

On browser-based tasks specifically, the story is similarly strong. GPT-5.4 scored 67.3% on WebArena-Verified, which tests an AI agent’s ability to complete real-world browser tasks, compared to GPT-5.2’s 65.4%. On Online-Mind2Web, another browser-use benchmark, it achieved a 92.8% success rate using only screenshot-based observations: no DOM access, no special browser hooks, just looking at the screen like a person would. That 92.8% figure is a significant jump over the 70.9% achieved by ChatGPT Atlas’s Agent Mode, which previously held the leading position on that benchmark.

The model’s computer-use behavior is also designed to be steerable in ways that earlier systems weren’t. Developers building agents on top of GPT-5.4 can configure how the model behaves — adjusting it to suit different risk tolerances, specifying when it should pause and ask for confirmation before taking an action, and setting custom confirmation policies for more sensitive workflows. This matters a lot for enterprise deployments, where you might want an agent to breeze through data entry tasks autonomously but pause before sending any outbound communication or making any purchase. That kind of fine-grained control over agent behavior has been one of the missing pieces in making computer-use AI actually deployable in serious business environments.
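A confirmation policy of the kind described here could look something like the sketch below. The article does not show OpenAI's actual configuration surface, so every name in this example (`ConfirmationPolicy`, `Decision`, and the action kinds) is hypothetical; it only illustrates the idea of letting routine actions through while pausing on sensitive ones.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Decision(Enum):
    PROCEED = "proceed"
    ASK_USER = "ask_user"


@dataclass
class ConfirmationPolicy:
    """Hypothetical policy: pause before any action kind listed as sensitive."""
    sensitive_kinds: Set[str] = field(
        default_factory=lambda: {"send_email", "purchase"}
    )

    def check(self, action_kind: str) -> Decision:
        if action_kind in self.sensitive_kinds:
            return Decision.ASK_USER
        return Decision.PROCEED


policy = ConfirmationPolicy()
for kind in ["fill_form", "send_email", "purchase"]:
    print(kind, "->", policy.check(kind).value)
```

A stricter deployment would simply pass a larger `sensitive_kinds` set, which is the sort of per-workflow tuning the paragraph above has in mind.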

Beyond just clicking around a screen, GPT-5.4 also introduced a new capability called tool search, which solves a problem that’s been quietly undermining the practicality of AI agents in large systems. When an AI agent is connected to many tools (say, dozens of APIs, MCP servers, database connectors, and workflow integrations), all of those tool definitions have historically been loaded into the model’s context upfront. That could mean tens of thousands of tokens just sitting there in every single request, even for tasks that only need one or two of those tools. Tool search flips this around: the model gets a lightweight directory of what’s available and looks up specific tool definitions only when it actually needs them. OpenAI tested this on 250 tasks from Scale’s MCP Atlas benchmark with all 36 MCP servers enabled and found that the tool search approach reduced total token usage by 47% while maintaining the same accuracy. For companies running agents at any significant scale, that efficiency gain compounds into real money very quickly.
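The directory-then-lookup idea can be sketched in a few lines. This is not OpenAI's tool search implementation: the tool names, the keyword-overlap matching, and the helpers `build_directory`, `search_tools`, and `load_definitions` are all hypothetical, chosen only to show why deferring full definitions keeps the upfront context small.

```python
from typing import Dict, List

# Hypothetical tool registry. Real definitions would carry large JSON schemas.
TOOL_DEFINITIONS: Dict[str, dict] = {
    "calendar_create_event": {
        "summary": "create a calendar event",
        "schema": {"title": "string", "start": "datetime", "end": "datetime"},
    },
    "email_send": {
        "summary": "send an email message",
        "schema": {"to": "string", "subject": "string", "body": "string"},
    },
    "crm_lookup": {
        "summary": "look up a customer record",
        "schema": {"customer_id": "string"},
    },
}


def build_directory(tools: Dict[str, dict]) -> List[dict]:
    """Lightweight listing given to the model upfront: names and one-line summaries only."""
    return [{"name": name, "summary": d["summary"]} for name, d in tools.items()]


def search_tools(query: str, tools: Dict[str, dict], limit: int = 2) -> List[str]:
    """Rank tools by naive keyword overlap between the query and each summary."""
    words = set(query.lower().split())
    scored = []
    for name, d in tools.items():
        overlap = len(words & set(d["summary"].lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def load_definitions(names: List[str], tools: Dict[str, dict]) -> Dict[str, dict]:
    """Fetch full definitions only for the tools the task actually needs."""
    return {name: tools[name] for name in names}


directory = build_directory(TOOL_DEFINITIONS)
needed = search_tools("send an email to the client", TOOL_DEFINITIONS)
print(needed, load_definitions(needed, TOOL_DEFINITIONS))
```

Only the directory travels with every request; the schemas are pulled in on demand, which is where a token saving like the one reported above would come from.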

The model also brings a major leap in the Toolathlon benchmark, a test designed to measure how well AI agents work with real-world tools and APIs to complete multi-step tasks, things like reading emails, extracting attachments, uploading files, grading submissions, and recording results in a spreadsheet. GPT-5.4 hits 54.6% on that benchmark at xhigh reasoning effort, compared to 46.3% for GPT-5.2, and it does so in fewer tool yields, meaning fewer round-trips where the model has to pause and wait for external responses. Wade Foster, the CEO of Zapier, described GPT-5.4 as the “most persistent model to date,” finishing jobs where previous models gave up.

All of this computer-use capability sits alongside genuine improvements in coding and knowledge work that make GPT-5.4 a more holistic model than anything OpenAI has released before. The coding capabilities are absorbed from GPT-5.3-Codex, and the combination of those strengths with computer-use is where things get particularly interesting for developers. OpenAI released an experimental Codex skill called “Playwright (Interactive)” alongside GPT-5.4, which lets the model visually debug web and desktop applications — and even test apps it’s currently building, while it’s building them, using visual feedback from the browser. The demo OpenAI showed for this was a fully playable isometric theme park simulation game, generated from a single lightly specified prompt, complete with pathfinding guests, ride queues, dynamic park metrics, and polished visual assets — all built and iteratively playtested by the AI itself.

For knowledge work, GPT-5.4 scored 83% on GDPval, a benchmark that tests agents across 44 occupations spanning the top nine industries contributing to U.S. GDP. That figure means its output matched or exceeded that of industry professionals in 83% of comparisons. The model also scored 87.3% on an internal benchmark of investment banking spreadsheet modeling tasks, up from 68.4% for GPT-5.2. Niko Grupen, Head of Applied Research at Harvey, the legal AI platform, noted that GPT-5.4 scored 91% on their BigLaw Bench evaluation and called it particularly strong at maintaining accuracy across lengthy contracts.

On the factual accuracy front, OpenAI says GPT-5.4’s individual claims are 33% less likely to be false and its full responses are 18% less likely to contain any errors compared to GPT-5.2, based on a set of de-identified user-flagged prompts. For computer-use applications, that reduction in hallucination is arguably more important than it is for a chat interface, because when an AI agent is executing actions on a computer, a wrong assumption doesn’t just generate a slightly off answer; it can trigger an irreversible action.

Availability-wise, GPT-5.4 is rolling out in ChatGPT as GPT-5.4 Thinking for Plus, Team, and Pro users, replacing GPT-5.2 Thinking as the default. It’s also available in the API right now as gpt-5.4, and the Pro variant is available for those on Pro and Enterprise plans. Pricing in the API comes in at $2.50 per million input tokens and $15 per million output tokens — slightly higher than GPT-5.2’s $1.75 and $14 respectively, though OpenAI notes the model’s improved token efficiency means total token consumption should be lower for many tasks. The model also supports up to a 1 million token context window in Codex, which makes it capable of reasoning across entire codebases or long chains of prior actions without losing the thread.
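The pricing trade-off is easy to check with a little arithmetic. The per-token prices below are the ones quoted in this article; the workload (one million input tokens and 100,000 output tokens) is a made-up example, and the `cost_usd` helper is just an illustration.

```python
from typing import Dict, Tuple

# (input, output) price in USD per million tokens, as quoted in the article.
PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-5.2": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for a given token count at the quoted prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload, identical token counts on both models.
old = cost_usd("gpt-5.2", 1_000_000, 100_000)
new = cost_usd("gpt-5.4", 1_000_000, 100_000)

# Fraction of the original tokens GPT-5.4 may use before it costs more than GPT-5.2,
# assuming input and output tokens shrink by the same proportion.
break_even = old / new

print(f"GPT-5.2: ${old:.2f}  GPT-5.4: ${new:.2f}  break-even: {break_even:.1%}")
```

For this particular mix, GPT-5.4 would need to use roughly 21% fewer tokens than GPT-5.2 to cost the same, so whether the efficiency claim pays off depends on the workload.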

OpenAI is treating GPT-5.4 as a “High cyber capability” model under its Preparedness Framework, meaning it comes with an expanded safety stack that includes monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests. The company also released a new open-source evaluation for chain-of-thought controllability, essentially a test of whether the model can deliberately hide its reasoning to evade safety monitoring, and reported that GPT-5.4’s ability to do that is low, which it describes as a positive safety indicator.

The broader picture here is one of consolidation. For much of the past year, OpenAI’s model lineup had become a bit fractured — you had reasoning models for one thing, coding models for another, and general-purpose chat models somewhere in the middle. GPT-5.4 is an attempt to pull all of that together into a single model that can handle the full stack of professional work, including — and this is the part that really marks a turning point — the physical act of operating a computer to get that work done. Whether that vision fully delivers in real-world deployments, at scale, across the wildly varied software environments that enterprises actually use, is something that will take months to assess properly. But on the numbers and the demonstrations available right now, GPT-5.4 represents the clearest signal yet that AI agents that can genuinely operate computers are no longer a future thing. They’re here, and they work.

