AI Capabilities · Evidence Review · Jun 13, 2026, 2:10 AM

AI Models Hit a Wall on 'First Proof,' a Rigorous New Benchmark for Research Mathematics

A coalition of top mathematicians tested leading AI models on unpublished, research-level math problems. The results reveal that while AI excels at standardized tests, it still struggles with autonomous mathematical discovery.

By Factlen Editorial Team

First Proof Organizers 40% · AI Developers 30% · Skeptical Academics 30%

First Proof Organizers: Advocate for strict, contamination-free testing to measure true autonomous reasoning.
AI Developers: Argue that AI's true value lies in human-machine collaboration rather than pure zero-shot autonomy.
Skeptical Academics: View the results as proof that LLMs lack fundamental reasoning and self-correction capabilities.

What's not represented

· Early-career mathematicians who might rely on AI tools to compete with larger research labs.
· Educators concerned about how AI's math capabilities impact university curricula.

Why this matters

As artificial intelligence rapidly integrates into education, research, and the workforce, understanding its true capabilities is critical. This benchmark cuts through industry hype, providing a clear, objective measure of where machine computation ends and human creativity begins.
