Reasoning Benchmarks: GPQA, AIME, and Humanity's Last Exam

Rankings of AI models on the hardest reasoning benchmarks available: GPQA Diamond, AIME competition math, and the notoriously difficult Humanity's Last Exam.