AI Capabilities · Evidence Review · Jun 13, 2026, 2:10 AM

AI Models Hit a Wall on 'First Proof,' a Rigorous New Benchmark for Research Mathematics

A coalition of top mathematicians tested leading AI models on unpublished, research-level math problems. The results reveal that while AI excels at standardized tests, it still struggles with autonomous mathematical discovery.

By Factlen Editorial Team

First Proof Organizers (40%)
Advocate for strict, contamination-free testing to measure true autonomous reasoning.
AI Developers (30%)
Argue that AI's true value lies in human-machine collaboration rather than pure zero-shot autonomy.
Skeptical Academics (30%)
View the results as proof that LLMs lack fundamental reasoning and self-correction capabilities.

What's not represented

  • Early-career mathematicians who might rely on AI tools to compete with larger research labs.
  • Educators concerned about how AI's math capabilities impact university curricula.

Why this matters

As artificial intelligence is rapidly integrated into education, research, and the workforce, understanding its true capabilities is critical. By testing models on unpublished, research-level problems, this benchmark offers a measure of autonomous mathematical reasoning that is less exposed to industry hype and training-data contamination than standardized tests.
