AI safety

Reasoning Traps, LLM Chaos, and Steering Curves

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

Anthropic Launches Institute as Powerful AI Looms

Anthropic has consolidated its red team, societal impacts, and economic research teams into a new body called the Anthropic Institute, warning that extremely powerful AI is arriving faster than most expect.

Anthropic's Claude Found 22 Firefox CVEs in 14 Days

Claude Opus 4.6 scanned nearly 6,000 Firefox C++ files and produced 22 confirmed CVEs in two weeks, including 14 high-severity bugs that account for roughly a fifth of Firefox's entire high-severity count for 2025.