OpenAI just pulled off something straight out of a sci‑fi novel: an unreleased experimental reasoning LLM scored at the “gold medal level” on the 2025 International Mathematical Olympiad (IMO), tackling the kind of proof‑based problems that have stumped the world’s brightest young minds for decades.
In July 2025, the world’s brightest high‑school mathematicians gathered on Australia’s Sunshine Coast to face six grueling proof‑based problems over two four‑and‑a‑half‑hour exams held on consecutive days. Gold medals go to roughly the top 10 percent of contestants each year; in 2025, that meant 67 of the 630 students. This summer, an experimental OpenAI model joined their ranks, solving five of the six problems and racking up 35 of 42 possible points under the same conditions as the human participants.
On July 19, Alexander Wei, a research scientist at OpenAI, took to X (formerly Twitter) to drop the bombshell:
I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Wei and his team stressed that the model had no internet access, no external calculators or symbolic engines—just pure, in‑model reasoning. Noam Brown, another OpenAI researcher, noted the model’s “new level of sustained creative thinking,” a key ingredient for cracking IMO‑style proofs.
Traditionally, AI has excelled at rote tasks: sorting data, spotting patterns, churning through repetitive computations. But proof‑style math problems demand insight, creativity, and multi‑step logical rigor. DeepMind’s AlphaGeometry, for example, tackled IMO geometry problems by combining a symbolic engine with an LLM; it solved 25 of 30 geometry questions drawn from past Olympiads but relied on handcrafted inference rules. OpenAI’s model, by contrast, is a general‑purpose reasoning engine, not a specialized math system, marking a big leap toward “general intelligence” instead of domain‑specific skillsets.
Sam Altman, OpenAI’s CEO, chimed in on X that although GPT‑5 is on the horizon, it won’t ship with this gold‑level math capability—that’s slated for “many months” down the road. In other words, the public-facing models we’ll see soon are stepping stones toward this breakthrough, but the IMO‑crushing version remains under wraps for now.
No triumph is without its skeptics. Fields Medalist Terence Tao, speaking on Lex Fridman’s podcast in June, had pegged IMO‑level AI performance as still several years away and urged researchers to aim for more modest benchmarks first. AI critic Gary Marcus, meanwhile, called the result “genuinely impressive” but urged transparency on training data, cost per problem, and real‑world utility, noting that independent verification of the model’s scores is still pending.