OpenAI shook up the AI landscape this week with the unveiling of ChatGPT Agent, a powerful new feature that lets the company’s flagship assistant autonomously carry out complex, multi‑step tasks in its own virtual environment. Announced on July 17, 2025, and rolling out immediately to Pro subscribers (with Plus and Team users joining over the coming days), ChatGPT Agent merges the best of OpenAI’s earlier Operator and Deep Research tools to create what the industry calls an “agentic AI” — one that doesn’t just chat, but acts.
Under the hood, ChatGPT Agent spins up its own sandboxed operating system and web browser — a kind of virtual “robot body” that lives on OpenAI’s servers. When you ask it to, say, assemble and purchase an outfit for a summer wedding, it can:
- Browse online stores
- Filter by style, size, price and return policy
- Add selected items to your cart
- Even complete the checkout (with your permission)
All while you sit back and watch the action unfold in real time inside the ChatGPT interface.
This isn’t just a toy. OpenAI envisions uses ranging from generating fully formatted PowerPoint decks to updating Excel spreadsheets with fresh data, planning weekly grocery runs, or booking flights and hotels. And yes, Agent can run custom scripts via a built‑in terminal, scrape websites, and invoke “ChatGPT Connectors” to tap into Gmail, GitHub, and other third‑party services.
Despite its autonomy, ChatGPT Agent is designed with user control front and center. Before taking any action that could have real‑world consequences — like charging your credit card — Agent pauses to ask for confirmation. You can interrupt or halt operations at any time, grab direct control of the virtual browser, or snoop on every keystroke and click through a live “Watch Mode.” Even high‑risk domains like financial sites and social media are flagged, and the agent is trained to refuse tasks it deems too dangerous, such as bank transfers or deep system modifications.
As Operator did before it, Agent will still require explicit permission before proceeding with anything irreversible. And once you’re done, you can delete all browsing logs or log out of connected accounts with a single click — OpenAI promises that any data entered during “takeover mode” isn’t stored.
OpenAI’s marketing materials boast state‑of‑the‑art scores on a suite of internal benchmarks, but they come with the usual caveats. On Humanity’s Last Exam — a test of expert‑level reasoning — ChatGPT Agent hit 41.6 percent accuracy, compared to 24.9 percent for the previous o3 model using tools. On the fiendish math benchmark FrontierMath, it scored 27.4 percent with tool access (o3+Python: 19.3 percent).
Remarkably, Agent reportedly outperformed humans on data‑science tasks: 89.9 percent on DSBench’s analysis questions (versus 64.1 percent for people) and 85.5 percent on modeling tasks (versus 65.0 percent). For web information retrieval (BrowseComp), it managed 68.9 percent accuracy, and on spreadsheet editing (SpreadsheetBench), 45.5 percent — again besting earlier OpenAI models.
Yet benchmarks only tell part of the story. In a recent “Cyber Range” demo, the agent was asked to perform a simulated pen‑test against a fake e‑commerce network. It scouted servers and ran initial probes but faltered when novel exploits were required. Even with hints, it couldn’t chain together the final maneuvers — perhaps a blessing, given the scenario’s hacking angle.
With great power comes great attack surface. Because Agent can execute browser actions and shell commands, it’s susceptible to the very prompt‑injection attacks that plague LLMs elsewhere. Imagine a malicious site embedding hidden instructions to exfiltrate your credit‑card data via a hidden form — Agent might dutifully follow them unless it recognizes the risk.
To guard against such threats, OpenAI has layered several defenses:
- Steering‑resistant training, teaching models to spot and ignore suspicious prompts
- Action‑approval gates, prompting users before sensitive steps
- Model overseers, lightweight AIs monitoring other models in real time and halting errant behavior.
Academic researchers, however, warn that these safeguards aren’t bulletproof. A June 2025 study showed that web‑use agents can be tricked into camera activation, password exfiltration, or denial‑of‑service attacks by subtly crafted site content.
For now, ChatGPT Pro users get 400 Agent‑powered messages per month, while Plus and Team subscribers will receive 40. OpenAI promises regular feature updates: richer integrations, more app connectors, and smarter self‑diagnosis when things go off‑script.
Discover more from GadgetBond
Subscribe to get the latest posts sent to your email.
