A new open-source toolkit called OBLITERATUS can surgically remove refusal mechanisms from 116 open-weight LLMs using abliteration, a technique that identifies a "refusal direction" in the model's activation space and projects it out of the weights: no fine-tuning, no training data, just geometry.
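The geometric idea behind abliteration can be sketched in a few lines. This is an illustrative toy, not OBLITERATUS's actual code: the activations are synthetic stand-ins, and the dimensions and variable names (`harmful`, `harmless`, `r`, `W`) are assumptions. The core move, estimating a refusal direction as a difference of mean activations and projecting it out of a weight matrix, reflects the published abliteration technique.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; real models use thousands of dimensions

# Toy stand-ins for residual-stream activations collected on harmful
# vs. harmless prompts (in practice these come from the actual model).
harmful = rng.normal(size=(100, d)) + np.array([3.0] + [0.0] * (d - 1))
harmless = rng.normal(size=(100, d))

# The "refusal direction": difference of mean activations, normalized.
r = harmful.mean(axis=0) - harmless.mean(axis=0)
r /= np.linalg.norm(r)

# Abliteration: orthogonalize a weight matrix against r,
# W' = W - r r^T W, so the layer can no longer write along r.
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# The edited layer's output now has ~zero component along r.
x = rng.normal(size=d)
print(abs(r @ (W_abl @ x)))  # ~0 (up to floating-point error)
```

Applying this projection to the relevant weight matrices across layers is why the edit needs no gradient updates at all; it is a one-shot linear operation on the weights.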
An unknown attacker reportedly used more than 1,000 prompts to jailbreak Anthropic's Claude into generating exploit code, then used that code to breach six Mexican government agencies and exfiltrate 195 million taxpayer records.