AI Safety

AI Safety Leaderboard: Refusal and Jailbreak Rankings

Rankings of AI models by safety metrics including refusal rates, jailbreak resistance, bias scores, and truthfulness across major benchmarks.

Reasoning Models Can't Hide Their Thinking - OpenAI Study

OpenAI's CoT-Control benchmark shows frontier reasoning models score 0.1% to 15.4% at steering their own chain of thought - a result the company frames as good news for AI oversight.

CoT Control, Hidden Beliefs, and Dynamic Agent Benchmarks

New research shows that reasoning models can't suppress their chain of thought, that they commit to answers internally long before their CoT reveals them, and that static benchmarks are inadequate for measuring real-world agent adaptability.

OpenAI Buys the Tool Used to Test Its Own Models

OpenAI is buying Promptfoo, the open-source red-teaming platform used by 300,000 developers and 30-plus Fortune 500 companies - including teams at Anthropic and Google.

Pro-Human AI Declaration Unites Left, Right, and Labor

A bipartisan coalition of 40+ groups - from the AFL-CIO to the Congress of Christian Leaders - released a 34-point declaration demanding human control over AI, corporate accountability, and a ban on autonomous lethal weapons.

Claude Code Wipes Production Database in Terraform Mishap

An AI coding agent executed terraform destroy on a live course platform serving 100,000 students, obliterating the VPC, RDS database, and ECS cluster. AWS restored 1.94 million rows from a hidden snapshot after 24 hours.
