GPQA

Reasoning Benchmarks: GPQA, AIME, and Humanity's Last Exam

Rankings of AI models on the hardest reasoning benchmarks available: GPQA Diamond, AIME competition math, and the notoriously difficult Humanity's Last Exam.
