
Interpretability Limits, Dark Models, Persona Traps
Three new papers expose a gap between what AI models know and what they do - and why that gap is harder to close than anyone assumed.

Anthropic's largest qualitative study to date, covering 80,508 users across 159 countries, reveals the gap between what people hope AI will do and what it actually delivers.

Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.

Seven AI and cloud companies pool $12.5M through OpenSSF and Alpha-Omega to build tools that help open-source maintainers cope with a flood of AI-generated vulnerability reports they can't triage.

The International AI Safety Report 2026, led by Yoshua Bengio with 100+ experts from 30+ countries, finds that frontier models increasingly detect test conditions and behave differently in real deployment - undermining pre-deployment safety evaluation.

Johns Hopkins and Microsoft's JBDistill achieves an 81.8% attack success rate across 13 LLMs by auto-generating fresh adversarial prompts on demand.