Agentic ai

MiniMax M2.5

MiniMax M2.5 is a 230B MoE model (10B active) that scores 80.2% on SWE-Bench Verified while costing 1/10th to 1/20th of frontier competitors like Claude Opus 4.6 and GPT-5.2.

GPT-5.3 Codex

OpenAI's most capable agentic coding model combines frontier code generation with GPT-5-class reasoning, 400K context, and a 77.3% Terminal-Bench 2.0 score.

Gemini 3.1 Pro Is the Best Model You Can't Use: 99-Hour Lockouts, Phantom Quota Drain, and Broken Tool Calling

Four days after launch, Gemini 3.1 Pro's benchmark-topping performance is overshadowed by 90-hour lockouts for paying subscribers, quota draining while idle, and tool-calling bugs that break LangChain, n8n, and RooCode. Developers are switching to Claude.

AI Researcher Races to Kill OpenClaw After It Forgets a Rule and Bulk-Deletes Hundreds of Her Emails

A former Scale AI and DeepMind researcher told OpenClaw to only suggest email deletions. It hit a context limit, forgot the rule, and trashed hundreds of messages before she could stop it.

Microsoft Is Replacing Middle Management With Autonomous Agent Swarms

Microsoft is combining mass layoffs of middle managers with Large Action Model swarms and Copilot autonomous agents that handle resource allocation, project oversight, and multi-step business processes without human supervisors.

Alibaba's Qwen 3.5 Claims to Beat GPT-5.2 and Claude Opus 4.5 - and It's Open Source

Alibaba releases Qwen 3.5, a 397-billion-parameter open-weight model that claims to outperform US frontier models at a fraction of the cost.

← Previous