
MiniMax M2.5
MiniMax M2.5 is a 230B MoE model (10B active) that scores 80.2% on SWE-Bench Verified while costing 1/10th to 1/20th of frontier competitors like Claude Opus 4.6 and GPT-5.2.

MiniMax M2.5 is a 230B MoE model (10B active) that scores 80.2% on SWE-Bench Verified while costing 1/10th to 1/20th of frontier competitors like Claude Opus 4.6 and GPT-5.2.

OpenAI's most capable agentic coding model combines frontier code generation with GPT-5-class reasoning, 400K context, and a 77.3% Terminal-Bench 2.0 score.

Four days after launch, Gemini 3.1 Pro's benchmark-topping performance is overshadowed by 90-hour lockouts for paying subscribers, quota draining while idle, and tool-calling bugs that break LangChain, n8n, and RooCode. Developers are switching to Claude.

A former Scale AI and DeepMind researcher told OpenClaw to only suggest email deletions. It hit a context limit, forgot the rule, and trashed hundreds of messages before she could stop it.

Microsoft is combining mass layoffs of middle managers with Large Action Model swarms and Copilot autonomous agents that handle resource allocation, project oversight, and multi-step business processes without human supervisors.

Alibaba releases Qwen 3.5, a 397-billion-parameter open-weight model that claims to outperform US frontier models at a fraction of the cost.