OpenAI launches GPT-5.2 as its new flagship AI model series

OpenAI has unveiled GPT‑5.2, a new flagship AI model series aimed squarely at professional work and long-running, agent-based workflows, promising sharper reasoning, stronger coding, better long‑context handling, and a noticeable step up in reliability over GPT‑5.1.

What GPT‑5.2 is meant to do

OpenAI positions GPT‑5.2 as its most capable frontier model so far for knowledge work, spanning everything from complex spreadsheets and pitch decks to codebases, research documents, and multimodal analysis. The company’s pitch is economic: ChatGPT Enterprise users already report saving 40–60 minutes a day, and the heaviest users more than 10 hours a week. GPT‑5.2 is designed to push those gains further by handling more of the work end‑to‑end, not just drafting text.

Under the hood, GPT‑5.2 comes in three main flavors inside ChatGPT: Instant, Thinking, and Pro, with corresponding API SKUs that map to gpt-5.2-chat-latest, gpt-5.2, and gpt-5.2-pro respectively. Instant targets everyday queries and fast responses, Thinking is tuned for deeper, more structured work, and Pro is reserved for the hardest problems where users are willing to trade latency and cost for maximum quality.

Benchmarks: from GDPval to ARC‑AGI

OpenAI leans heavily on benchmark numbers to argue that GPT‑5.2 crosses a new threshold in professional performance. On GDPval—a battery of well‑specified knowledge work tasks across 44 occupations—GPT‑5.2 Thinking beats or ties industry professionals 70.9% of the time, with GPT‑5.2 Pro edging higher at 74.1%, compared with just 38.8% for GPT‑5. These tasks are not toy prompts: they include building sales presentations, accounting spreadsheets, workforce schedules, manufacturing diagrams and even short videos, with expert judges rating quality and realism.

The gains also show up in more traditional AI benchmarks. On GPQA Diamond, a graduate‑level, “Google‑proof” science exam, GPT‑5.2 Pro hits 93.2% and GPT‑5.2 Thinking 92.4%, while on FrontierMath Tier 1–3, GPT‑5.2 Thinking solves 40.3% of expert‑level math problems, up from 31.0% for GPT‑5.1 Thinking. In abstract reasoning, GPT‑5.2 Pro becomes the first model to cross 90% on ARC‑AGI‑1 (90.5%) and reaches 54.2% on ARC‑AGI‑2, with GPT‑5.2 Thinking close behind at 86.2% and 52.9% respectively, well ahead of GPT‑5.1’s 72.8% and 17.6%.

For enterprises, one of the more telling internal metrics comes from investment‑banking‑style spreadsheet modeling: GPT‑5.2 Thinking scores 68.4% on tasks like three‑statement models and leveraged buyout models, roughly 9 percentage points better than GPT‑5.1. OpenAI highlights judge comments describing some outputs as resembling the work of a professional firm, though the company reiterates that human oversight remains essential.

Coding, tools, and long context

If GPT‑5.1 was the workhorse for developers, GPT‑5.2 is pitched as a step closer to a generalist coding agent that can live inside production workflows. On SWE‑Bench Pro, a multi‑language benchmark designed to simulate real software engineering by asking models to patch live repositories, GPT‑5.2 Thinking reaches 55.6% accuracy, up from 50.8% for GPT‑5.1 Thinking. On SWE‑bench Verified, which focuses on Python, GPT‑5.2 Thinking improves to 80.0% versus 76.3%. Early partners like Windsurf, Cognition, Warp, JetBrains, and others report better interactive coding, more reliable bug‑finding, and stronger support for complex front‑end and 3D UI work.

The model’s tool‑using behavior is another focus. On τ2‑bench Telecom, which tests long, multi‑turn customer‑support workflows that require calling tools, GPT‑5.2 Thinking scores 98.7%, a new high, and also posts gains on related agentic evals like τ2‑bench Retail, BrowseComp, Scale MCP‑Atlas and Toolathlon. In practice, OpenAI says this means fewer breakdowns in multi‑step flows, such as rebooking a flight, retrieving data from multiple systems and issuing compensation in a single, coherent interaction. In scenario tests involving missed connections, lost bags, overnight stays and special‑assistance requests, GPT‑5.2 outperforms GPT‑5.1.

Long‑context performance is one of the headline technical claims. On OpenAI’s MRCR v2 benchmark, which hides multiple “needles” across massive “haystacks” of text, GPT‑5.2 Thinking reaches near‑perfect accuracy on the 4‑needle variant all the way out to 256k tokens. Across 4k to 256k tokens, the model consistently outperforms GPT‑5.1 Thinking, and on real‑world tasks like deep document analysis, BrowseComp long‑context tests, and graph‑based reasoning benchmarks, it maintains higher accuracy while spanning hundreds of thousands of tokens. GPT‑5.2 Thinking is also compatible with the new Responses /compact endpoint, which effectively extends context by compressing past interactions for tool‑heavy, long‑running workflows.

Vision, factuality, and safety

On the multimodal front, GPT‑5.2 Thinking is described as OpenAI’s strongest vision model so far, particularly for structured, professional imagery. Error rates are roughly halved for chart reasoning and software interface understanding, with better performance on benchmarks like CharXiv reasoning, MMMU Pro, Video MMMU, and ScreenSpot‑Pro when paired with Python tools. In practical terms, OpenAI says this translates into more accurate interpretation of dashboards, technical diagrams, product screenshots and visual reports, driven by a more precise grasp of spatial layout and component relationships in images.

Factuality has also been tuned. On a set of de‑identified real ChatGPT queries, GPT‑5.2 Thinking produced responses containing errors about 30% less often than GPT‑5.1 Thinking when search tools were enabled. Its answer‑without‑error rates are 93.9% (with search) and 88.0% (without search), ahead of GPT‑5.1’s 91.2% and 87.3%. OpenAI couches the numbers carefully, stressing that all models still make mistakes and that critical uses require double‑checking, but the trend line is toward fewer hallucinations in everyday research, writing and analysis workflows.
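The 30% figure is easiest to read as a relative drop in the error rate, not as a jump in the share of error‑free answers. The short sketch below works that out from the four answer‑without‑error rates quoted above; the function name is just an illustrative choice.

```python
# Answer-without-error rates quoted in the article, in percent.
RATES = {
    "with search": {"gpt-5.2": 93.9, "gpt-5.1": 91.2},
    "without search": {"gpt-5.2": 88.0, "gpt-5.1": 87.3},
}


def relative_error_reduction(new_ok: float, old_ok: float) -> float:
    """Fraction by which the error rate falls, given two error-free rates in percent."""
    new_err = 100.0 - new_ok
    old_err = 100.0 - old_ok
    return (old_err - new_err) / old_err


if __name__ == "__main__":
    for setting, r in RATES.items():
        drop = relative_error_reduction(r["gpt-5.2"], r["gpt-5.1"])
        print(f"{setting}: error rate down {drop:.1%}")
```

With search, the error rate goes from 8.8% to 6.1%, a relative drop of roughly 30.7%. Without search the drop is much smaller, about 5.5%, which is why the 30% claim is tied to search being enabled.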

On safety, GPT‑5.2 builds on the “safe completion” techniques introduced with GPT‑5, aiming to keep responses helpful while staying within policy. The release incorporates targeted improvements for prompts involving suicide or self‑harm, mental‑health distress and emotional reliance, contributing to stronger performance on internal mental‑health, emotional‑reliance and self‑harm evaluations for both GPT‑5.2 Instant and Thinking compared with GPT‑5.1 Instant and Thinking. OpenAI is also beginning to roll out an age‑prediction model to automatically apply additional content protections for users under 18, complementing existing parental‑control and age‑aware safety systems.

How it shows up in ChatGPT and API

For end users, GPT‑5.2 will first surface inside ChatGPT on paid plans—Plus, Pro, Go, Business and Enterprise—with a gradual rollout to keep the service stable. GPT‑5.1 will remain available as a legacy option for three months before being sunset from ChatGPT, giving teams time to compare behavior and update workflows. OpenAI says the day‑to‑day feel should be “more structured, more reliable, and still enjoyable to talk to,” with early testers citing clearer explanations and better up‑front surfacing of key information in GPT‑5.2 Instant.

On the developer side, GPT‑5.2 Thinking is now available via the Responses API and Chat Completions API as gpt-5.2, and GPT‑5.2 Instant as gpt-5.2-chat-latest. GPT‑5.2 Pro is exposed as gpt-5.2-pro in the Responses API, and both Pro and Thinking now support a new, fifth reasoning‑effort setting, xhigh, for cases where absolute quality matters more than latency. Codex‑style workloads already benefit from GPT‑5.2’s base capabilities, but OpenAI is promising a specialized GPT‑5.2‑Codex variant in the coming weeks.
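For developers, the tier‑to‑name mapping and the new effort setting might be wired up roughly as follows. This is a minimal sketch, not OpenAI’s reference code: `build_request` is a hypothetical helper, the `reasoning={"effort": ...}` shape is assumed from how the Responses API takes reasoning settings, and rejecting xhigh for Instant is an assumption based on the article naming only Thinking and Pro.

```python
from typing import Any, Dict, Optional

# API names as given in the article.
MODEL_BY_TIER = {
    "instant": "gpt-5.2-chat-latest",
    "thinking": "gpt-5.2",
    "pro": "gpt-5.2-pro",
}

# The article names Thinking and Pro as supporting "xhigh"; treating Instant
# as unsupported is an assumption made for this sketch.
XHIGH_TIERS = {"thinking", "pro"}


def build_request(tier: str, prompt: str, effort: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for a Responses API call (hypothetical helper)."""
    if tier not in MODEL_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    if effort == "xhigh" and tier not in XHIGH_TIERS:
        raise ValueError(f"xhigh is not listed for the {tier} tier")
    request: Dict[str, Any] = {"model": MODEL_BY_TIER[tier], "input": prompt}
    if effort is not None:
        # Assumed parameter shape: reasoning={"effort": "<level>"}.
        request["reasoning"] = {"effort": effort}
    return request


if __name__ == "__main__":
    print(build_request("thinking", "Summarize this contract.", effort="xhigh"))
```

The helper only builds the arguments and makes no network call. With the official `openai` Python package and an API key, the resulting dictionary would be passed along the lines of `client.responses.create(**request)`.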

Pricing reflects GPT‑5.2’s positioning as a more capable, but still relatively accessible, frontier model. In the API, gpt-5.2 and gpt-5.2-chat-latest cost $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs. GPT‑5.2 Pro is significantly more expensive at $21 per million input tokens and $168 per million output tokens, targeting high‑stakes, high‑value workloads. GPT‑5.1 remains cheaper at $1.25 per million input tokens and $10 per million output tokens, with the same 90% cached‑input discount, and OpenAI says it currently has no plans to deprecate GPT‑5.1, GPT‑5 or GPT‑4.1 in the API, promising ample notice before any future deprecation.

Model and pricing

Model tier	API name	Input / 1M tokens	Output / 1M tokens	Cached input / 1M tokens
GPT‑5.2 Instant	`gpt-5.2-chat-latest`	$1.75	$14	$0.175
GPT‑5.2 Thinking	`gpt-5.2`	$1.75	$14	$0.175
GPT‑5.2 Pro	`gpt-5.2-pro`	$21	$168	–
GPT‑5.1 Instant / Thinking	`gpt-5.1-chat-latest` / `gpt-5.1`	$1.25	$10	$0.125
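To turn the table into a per‑request figure, a small estimator like the one below can help. It is a rough sketch: the prices are copied from the table, while `estimate_cost` and the idea that cached tokens are a subset of input tokens billed at the cached rate are assumptions for illustration. No cached rate is listed for GPT‑5.2 Pro, so the sketch refuses to guess one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    input_per_m: float              # USD per 1M input tokens
    output_per_m: float             # USD per 1M output tokens
    cached_per_m: Optional[float]   # USD per 1M cached input tokens, if listed


# Prices as listed in the table above.
PRICES = {
    "gpt-5.2-chat-latest": Price(1.75, 14.0, 0.175),
    "gpt-5.2": Price(1.75, 14.0, 0.175),
    "gpt-5.2-pro": Price(21.0, 168.0, None),
    "gpt-5.1": Price(1.25, 10.0, 0.125),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate USD cost; cached_tokens is the part of input_tokens served from cache."""
    price = PRICES[model]
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and price.cached_per_m is None:
        raise ValueError(f"no cached-input price listed for {model}")
    fresh_tokens = input_tokens - cached_tokens
    total = fresh_tokens * price.input_per_m + output_tokens * price.output_per_m
    if cached_tokens:
        total += cached_tokens * price.cached_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200k input tokens (150k of them cached), 20k output.
    print(f"${estimate_cost('gpt-5.2', 200_000, 20_000, cached_tokens=150_000):.4f}")
```

For that hypothetical request the estimate comes to about $0.39, with output tokens accounting for most of it.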

OpenAI argues that, despite the higher per‑token prices, GPT‑5.2 can actually reduce total spend for some tasks because it solves more complex problems in fewer tokens, especially in agentic workflows. Early partners like Notion, Box, Shopify, Zoom, Databricks, Hex, Triple Whale, Harvey, and others say GPT‑5.2 has already let them simplify multi‑agent systems into single “mega‑agent” architectures with 20+ tools, with lower latency, stronger tool‑calling and simpler prompting.
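The "fewer tokens" argument has a simple break‑even point that can be checked from the list prices. The sketch below assumes the same mix of input and output tokens on both models; the token counts in the example are hypothetical, not measurements.

```python
# List prices in USD per 1M tokens, from the article.
GPT51 = {"input": 1.25, "output": 10.0}
GPT52 = {"input": 1.75, "output": 14.0}


def task_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one task at the given per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


# Both input and output prices rise by the same factor, so one ratio covers both.
price_ratio = GPT52["output"] / GPT51["output"]
break_even_fraction = 1 / price_ratio

if __name__ == "__main__":
    print(f"price ratio: {price_ratio:.2f}x")
    print(f"break-even token fraction: {break_even_fraction:.1%}")

    # Hypothetical task: GPT-5.1 needs 300k input / 40k output tokens,
    # GPT-5.2 finishes the same task with 60% of those tokens.
    old = task_cost(GPT51, 300_000, 40_000)
    new = task_cost(GPT52, 180_000, 24_000)
    print(f"GPT-5.1: ${old:.3f}  GPT-5.2: ${new:.3f}")
```

GPT‑5.2 is priced 1.4 times higher per token, so it only comes out cheaper when it completes a task in under about 71% of the tokens GPT‑5.1 would use. Whether real workloads clear that bar is something each team would need to measure.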

The bigger picture

GPT‑5.2 is framed explicitly as another step in an ongoing series of frontier‑model upgrades rather than a final destination. The model reflects a clear trend: away from single‑shot text generation and toward AI systems that can plan, reason over long time horizons, call tools, interpret complex data—and slot into real‑world workflows that have direct economic stakes. Behind the scenes, OpenAI credits its infrastructure partnership with Microsoft Azure and NVIDIA—using GPU clusters based on H100, H200 and GB200‑NVL72 hardware—with enabling the scale needed to train and deploy GPT‑5.2 at this level.

For now, the question shifts from “what can the model do?” to “how will organizations actually use it?” The benchmarks and early‑adopter testimonials point toward a future where more of the rote, repetitive, or structurally complex parts of knowledge work (spreadsheets, slides, code patches, document reviews, customer‑service workflows) are offloaded to models like GPT‑5.2, with humans retaining the role of editor, architect and decision‑maker. How quickly that future arrives will depend less on raw model capability and more on how enterprises choose to redesign their processes around this new generation of AI.
