
Best AI Models for Math Reasoning - March 2026
GPT-5.2 and Claude Opus 4.6 both score 100% on AIME 2025, while Gemini 3.1 Pro leads GPQA Diamond at 94.3% for PhD-level scientific reasoning.

Rankings of AI models on competition mathematics benchmarks, including AIME 2025, IMO, MathArena, and FrontierMath, which measure the cutting edge of mathematical reasoning.

Rankings of AI models on the hardest reasoning benchmarks available: GPQA Diamond, AIME competition math, and the notoriously difficult Humanity's Last Exam.