Models

GPT-5.3 Codex

OpenAI's most capable agentic coding model pairs frontier code generation with GPT-5-class reasoning. It offers a 400K-token context window and scores 77.3% on Terminal-Bench 2.0.

Gemini 3.1 Pro

Google DeepMind's Gemini 3.1 Pro leads on 13 of 16 benchmarks, scoring 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond, and offers a 1M-token context window at $2 per million input tokens.