Coding

Claude Sonnet 4.6: Mid-Tier Model, Flagship Results

Anthropic's mid-tier model matches Opus 4.6 on computer use, leads all models on office productivity tasks, and costs five times less than the flagship at $3/$15 per million tokens.

Qwen3.5 MoE vs Kimi K2.5 for Coding - Price Breakdown

Kimi K2.5 leads every coding benchmark, but Qwen3.5-35B-A3B delivers 87-93% of that performance at 3-4x lower cost and runs on a single consumer GPU. Here is the full breakdown.

DeepSeek V4 vs Claude Opus 4.6 - Open Weight Meets Proprietary

A pre-release comparison of DeepSeek V4 and Claude Opus 4.6 - the open-weight challenger could match Opus on coding at an output cost potentially 89x lower.

Gemini 3.1 Pro Review: Google's Reasoning Leap Is Real - With Caveats

Google's Gemini 3.1 Pro more than doubles its predecessor's reasoning scores and introduces adjustable thinking modes, but latency issues and preview-status quirks keep it from a clean sweep.

Kimi K2.5

Moonshot AI's Kimi K2.5 is a 1T-parameter MoE model that activates 32B parameters per token. It adds native multimodal vision via MoonViT-3D and Agent Swarm coordination of up to 100 sub-agents via PARL, posts top-tier math and coding benchmark scores, and ships under a modified MIT license.

Kimi K2.5 vs MiniMax M2.5: Chinese MoE Rivals Battle for the Coding Crown

Kimi K2.5 and MiniMax M2.5 compared side by side - two Chinese MoE models where the smaller, cheaper one actually wins on SWE-bench. A detailed analysis of when each model delivers more value.