
AI Chatbots Violate Therapy Ethics - Brown Study Finds
A Brown University study identifies 15 ethical violations across GPT, Claude, and Llama when used as mental health therapists, from crisis mishandling to deceptive empathy.

Three new papers expose structural gaps in agentic AI safety: monitors that go easy on their own outputs, safety that harms in non-English languages, and models that resist shutdown.

Anthropic's Claude Opus 4.6 found 22 Firefox CVEs in two weeks - including 14 high-severity bugs, roughly a fifth of all high-severity Firefox vulnerabilities patched in 2025 - and attempted hundreds of exploits to gauge how far the gap really goes.

New research reveals models can fake poor performance under adversarial prompts, a smarter critic improves SWE-bench by 15 points, and Microsoft shows compact vision models can punch above their weight.

The Pentagon has formally notified Anthropic that its supply chain risk designation is effective immediately - the first time the US government has applied this label to a domestic tech company.

Dario Amodei tells a Morgan Stanley audience he is trying to 'de-escalate' the Pentagon dispute, as the FT reports both sides are back at the table with under-secretary Emil Michael.