The AI chemist that had a better idea than the humans

The machine had an idea. Not a suggestion pulled from a database, not a pattern match from training data — an actual hypothesis. It looked at a reaction that had frustrated medicinal chemists for decades, spotted something everyone else had missed, and proposed a fix.

Then it helped run ten thousand experiments to prove it worked.

This happened in a lab outside Warsaw, over three months starting in March. The collaboration between OpenAI and a Polish startup called Molecule.one produced what they’re calling the first near-autonomous AI chemist: a system that took an open-ended goal — “improve an important reaction in drug synthesis” — and delivered a genuine scientific result that independent experts say is novel and publication-worthy.

The reaction is called Chan-Lam coupling. If you’ve never heard of it, you’ve almost certainly benefited from it. It’s a workhorse method for stitching carbon to nitrogen, a bond that shows up in medicines across oncology, infectious disease, and countless other therapeutic areas. But there’s a catch. When you try to couple primary sulfonamides — a critical building block found in everything from antibiotics to diuretics — with boronic acids, the yields are abysmal. We’re talking 16.6 percent on average. Most of the starting material ends up as waste.

Chemists have lived with this problem for years. They’ve tried different catalysts, different conditions, different oxidants. The literature is full of attempts. None of them cracked it broadly.

Then GPT-5.4 took a look.

The setup

Molecule.one has been building something called Maria — an “agentic chemistry AI” hooked up to a high-throughput automated wet lab. Think of it as a robotic chemistry platform that can design, execute, and analyze experiments at a scale no human team could match. OpenAI connected GPT-5.4 to Maria and gave the combined system a deliberately vague prompt: pick one of several important reaction classes in medicinal chemistry and make it better.

That was March 4.

The model didn’t just retrieve known solutions. It generated and ranked thousands of research proposals. Human chemists reviewed the top-ranked ones and selected four to send to the lab. One of them — internally labeled OAI-M1-03 — zeroed in on Chan-Lam coupling with primary sulfonamides. The model’s proposal: try mild oxidants. Specifically, TEMPO.

TEMPO (2,2,6,6-tetramethylpiperidine-1-oxyl, if you’re nasty) is a stable radical compound. It’s been around for decades. Chemists use it as a catalyst, an oxidant, a structural probe. But nobody had seriously proposed it as a general additive for this particular reaction on this particular substrate class.

The human chemists reviewing the proposal found it “surprising and interesting.” That’s scientist-speak for “we wouldn’t have thought of that, but it might actually work.”

Ten thousand eighty reactions

Here’s where the scale changes everything.

Maria Lab ran 10,080 reactions across two experimental cycles. To put that in perspective: a chemist running three reactions a day, every day, would need about nine years to match that throughput. The first cycle tested ten different oxidants across diverse substrate combinations. TEMPO won decisively. Mean yield jumped from 16.6 percent to 25.2 percent. The share of reactions clearing the practically useful 30-percent threshold more than doubled, from 15.6 to 37.5 percent.
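For readers who want to check the back-of-the-envelope math, here is a minimal sketch in Python. It uses only the figures quoted above; the helper names and the 365-day year are illustrative assumptions, not anything from OpenAI or Molecule.one.

```python
# Figures quoted in the article; helper names are illustrative only.
TOTAL_REACTIONS = 10_080
REACTIONS_PER_DAY = 3  # the article's hypothetical solo chemist


def years_to_match(total: int, per_day: int, days_per_year: float = 365.0) -> float:
    """Years one chemist would need, working every day with no breaks."""
    return total / per_day / days_per_year


def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


years = years_to_match(TOTAL_REACTIONS, REACTIONS_PER_DAY)
mean_yield_gain = relative_gain(16.6, 25.2)  # mean yield, in percent
useful_share_ratio = 37.5 / 15.6  # share of reactions above the 30-percent threshold

print(f"{years:.1f} years of solo bench work")
print(f"{mean_yield_gain:.0%} relative gain in mean yield")
print(f"{useful_share_ratio:.1f}x as many reactions clearing 30 percent")
```

The absolute jump in mean yield is 8.6 percentage points, which works out to roughly a 52 percent relative improvement over the baseline.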

But the real signal was in the breadth. Yields improved for 88 percent of the boronic acids tested and 83 percent of the sulfonamides. This wasn’t a lucky hit on one substrate pair — it was a general improvement across chemical space.

The second cycle went further. The system proposed a follow-up hypothesis: what if we could swap TEMPO for something cheaper? It identified 4-hydroxy-TEMPO, a structural cousin that costs a fraction as much and delivered nearly identical performance. That’s the kind of practical insight that matters when you’re trying to move a method from a screening platform into actual drug discovery workflows.

The bench test

High-throughput screening in microliter-scale wells is one thing. Medicinal chemists work at bench scale, with millimoles and glassware and all the messiness of real laboratory conditions. Artifacts happen. Results that look great in a 384-well plate sometimes evaporate when you scale up.

So human chemists at Molecule.one reproduced fourteen representative reactions by hand. Eleven showed higher yields. Eight more than doubled. The result held.
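Put as proportions, those bench counts look like this. This is a small sketch using only the three numbers in the paragraph above; the variable names are made up for illustration.

```python
# Bench-validation counts quoted in the article; names are illustrative.
VALIDATED = 14  # reactions reproduced by hand
HIGHER = 11     # showed higher yields with the new conditions
DOUBLED = 8     # more than doubled their yield

higher_share = HIGHER / VALIDATED
doubled_share = DOUBLED / VALIDATED

print(f"{higher_share:.0%} of bench reactions improved")
print(f"{doubled_share:.0%} more than doubled")
```

Fourteen reactions is a small sample, so these percentages are best read as a sanity check on the plate data rather than as precise estimates.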

Tim Cernak, an associate professor of medicinal chemistry at the University of Michigan and one of four independent experts who reviewed the preprint, put it plainly: “The merger of high throughput experimentation and modern AI represents a new frontier of scientific discovery. This new reaction is a powerful demonstration, where exceptionally mild conditions and a practical oxidant enable a nicely general substrate scope for one of the more popular reactions in drug synthesis.”

What “near-autonomous” actually means

OpenAI and Molecule.one are careful with the language. They call it “near-autonomous,” not autonomous. The distinction matters.

Human chemists wrote the steering and grading prompts that defined the goal and standards. They reviewed and selected which proposals entered the lab. They made corrections to experimental plans — notably flagging that DMSO solvent might react with stronger oxidants, a practical detail the model missed. They prepared consumables and reagents. They repeated key experiments manually for validation.

The model proposed the key research ideas, designed experiments, interpreted data, and proposed follow-ups. But human judgment remained essential throughout.

“We describe this workflow as near-autonomous, not fully autonomous, because human chemists still made important decisions throughout the process,” the OpenAI blog post reads. It’s a refreshingly honest framing in a field that often overpromises.

The bigger picture

This isn’t the first time AI has contributed to scientific discovery. OpenAI has previously shown models contributing to novel results in mathematics (disproving a discrete geometry conjecture), theoretical physics (a new result on gluon amplitudes), and biology (lowering the cost of cell-free protein synthesis in an automated lab). They’ve also introduced GPT-Rosalind, a purpose-built model for life sciences research.

But chemistry — experimental, wet-lab, get-your-hands-dirty chemistry — has been a harder nut to crack. A hypothesis isn’t enough. It has to work with real molecules, real instruments, real experimental noise. The synthesis bottleneck in drug discovery is brutal: scientists can only test the molecules they can actually make. Every unreliable reaction closes off a region of chemical space that might have contained a drug candidate.

Improving Chan-Lam coupling for primary sulfonamides doesn’t sound like a headline grabber. But it’s exactly the kind of unglamorous, high-leverage advance that compounds over time. Sulfonamides appear in medicines across oncology, infectious disease, cardiovascular disease, and more. Making this reaction reliable gives medicinal chemists a broader, more practical toolkit. That means more molecules made, more hypotheses tested, more shots on goal.

What comes next

The immediate scientific next steps are clear: test a broader range of starting materials, investigate the mechanism (why does TEMPO work?), map the failure modes, and support independent replication. The OpenAI team emphasizes that this doesn’t establish generalization to other coupling reactions, other substrate classes, or manufacturing conditions. The yield estimates came from a high-throughput platform; bench validation covered fourteen substrate pairs. More work is needed.

But the trajectory is unmistakable. Of the three other proposals from the same three-month run, two (OAI-M1-02 and M1-04) were experimentally proven in the Maria Lab, and one (M1-01) was disproven. Analysis is ongoing.

The longer-term goal, stated plainly by both organizations: make AI systems reliable scientific partners that help researchers generate hypotheses, design experiments, interpret results, and decide what to test next — while remaining grounded in expert judgment, reliable measurement, and strong safeguards.

A moment, not a milestone

It’s tempting to frame this as a watershed. The first near-autonomous AI chemist. The first open-ended prompt leading to a lab-validated discovery in organic chemistry. Ten thousand reactions in three months. A result independent experts call novel.

All true. But the researchers themselves would tell you it’s a moment, not a milestone. A proof of concept that the loop — literature review, hypothesis generation, experimental design, execution, analysis, iteration — can be meaningfully accelerated when a frontier model, specialized agents, an automated laboratory, and human experts work together.

The sulfonamide problem isn’t solved. The method needs more validation, more substrate scope, more mechanistic understanding. Independent labs need to reproduce it. Medicinal chemists need to decide if it’s useful in their actual workflows.

But for the first time, an AI system didn’t just suggest a reaction condition from the literature. It looked at a hard problem, had an idea that surprised the experts, helped run the experiments to test it, and produced a finding that human chemists say is worth publishing.

That’s not the future of AI-accelerated scientific discovery arriving. That’s the future showing up, rolling up its sleeves, and doing the work.
