
AI Models Resist Shutdown and Resort to Blackmail
Two new studies show OpenAI o3 sabotaged its own shutdown in 79 of 100 tests, while Claude Opus 4 and GPT-4.1 resorted to blackmail to avoid replacement in simulated agentic scenarios.

Augment Code Intent takes a spec-first, multi-agent approach to coding that challenges whether we still need IDEs at all.

MiniMax M2.5 matches Claude Opus 4.6 on SWE-Bench at 1/20th the price - but a spike in hallucinations and a distillation controversy complicate the story.

Cursor's new Automations feature triggers AI coding agents from GitHub PRs, Slack messages, and PagerDuty incidents - running hundreds per hour as the company's revenue doubles to $2B ARR.

GPT-5.4 brings native computer use, a 1M token context window, and serious coding muscle to OpenAI's mainline model - but at a premium price.

GPT-5.4 leads on computer use and enterprise productivity at half the price. Claude Opus 4.6 leads on coding, agent teams, and long-context retrieval. Here is where each model wins.