
Claude Sonnet 4.6: Mid-Tier Model, Flagship Results
Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Kimi K2.5 leads every coding benchmark, but Qwen3.5-35B-A3B delivers 87-93% of that performance at 3-4x lower cost and runs on a single consumer GPU. Here is the full breakdown.

A pre-release comparison of DeepSeek V4 and Claude Opus 4.6 - the open-weight challenger that could match Opus on coding at potentially 89x lower output cost.

Google's Gemini 3.1 Pro more than doubles its predecessor's reasoning scores and introduces adjustable thinking modes, but latency issues and preview-status quirks keep it from a clean sweep.

Moonshot AI's Kimi K2.5 is a 1T-parameter MoE model activating 32B per token with native multimodal vision via MoonViT-3D, Agent Swarm coordination of up to 100 sub-agents via PARL, and top-tier math and coding benchmarks under a modified MIT license.

Kimi K2.5 and MiniMax M2.5 compared side by side - two Chinese MoE models where the smaller, cheaper one actually wins on SWE-bench. A detailed analysis of when each model delivers more value.