
Claude Sonnet 4.6: Mid-Tier Model, Flagship Results
Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and, at $3/$15 per million input/output tokens, costs one-fifth as much as the flagship.

Rankings of the best AI models and agent frameworks on computer use benchmarks - OSWorld, OSWorld-Verified, and ScreenSpot-Pro - updated March 2026.

Gemini 3.1 Pro leads MCP Atlas at 69.2% for tool coordination while GPT-5.4 tops OSWorld at 75% for computer use, making the best agentic model depend on your task type.

GPT-5.4 brings native computer use, a 1M token context window, and serious coding muscle to OpenAI's mainline model - but at a premium price.

GPT-5.4 leads on computer use and enterprise productivity. Gemini 3.1 Pro leads on science reasoning and math at 20% lower cost. A benchmark-by-benchmark comparison.

GPT-5.4 leads on computer use and enterprise productivity at half the price. Claude Opus 4.6 leads on coding, agent teams, and long-context retrieval. Here is where each model wins.