
Multi-Agent Constitution, Sleeper Defense, Skill RL
Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.

Three new arXiv papers tackle constitutional AI rule learning, sleeper agent defense for multi-agent pipelines, and skill-evolving reinforcement learning for math reasoning.

Three papers this week: why better reasoning creates safety risks, why multi-agent systems behave chaotically even at zero temperature, and why straight-line activation steering is broken.

A 38-researcher red-teaming study deployed five autonomous AI agents with email, shell access, and persistent memory in a live environment. In two weeks, one destroyed its own mail server, two got stuck in a 9-day infinite loop, and another leaked SSNs because you said 'forward' instead of 'share.'