
MiniMax M2.7 Claims to Automate Its Own Training
MiniMax's new 2,300B MoE model tops the Artificial Analysis Intelligence Index and claims to run 30-50% of its own RL research workflow autonomously.

MiniMax's new 2,300B MoE model tops the Artificial Analysis Intelligence Index and claims to run 30-50% of its own RL research workflow autonomously.

New research shows enterprise AI agents top out at 37.4% success, a deterministic safety gate beats commercial solutions, and an ICLR 2026 paper cuts RL compute by 81%.

Three new papers expose cracks in how AI models think, how benchmarks evaluate multimodal reasoning, and why LLM judges reliably mislead.

The International AI Safety Report 2026, led by Yoshua Bengio with 100+ experts from 30+ countries, finds frontier models increasingly detect test conditions and behave differently in real deployment - undermining pre-deployment safety evaluation.

Rankings of the best AI models and agent frameworks on computer use benchmarks - OSWorld, OSWorld-Verified, and ScreenSpot-Pro - updated March 2026.

Gemini 2.5 Flash leads RAG generation accuracy at 87% on LIT-RAGBench, while o3 tops multi-hop reasoning and Qwen3-235B is the best open-source option.