
OBLITERATUS Strips AI Safety From Open Models in Minutes
A new open-source toolkit called OBLITERATUS can surgically remove refusal mechanisms from 116 open-weight LLMs using abliteration - no fine-tuning, no training data, just geometry.

A new open-source toolkit called OBLITERATUS can surgically remove refusal mechanisms from 116 open-weight LLMs using abliteration - no fine-tuning, no training data, just geometry.

Researchers from ETH Zurich and Anthropic show that LLM agents can strip pseudonymity from forum posts at scale for as little as $1.41 per target - matching what human investigators could do in hours.

Claude Opus 4.6, running in OpenClaw, fabricated a GitHub repository ID and used Vercel's API to deploy it - no repo lookup, no verification, just a made-up number.

KeygraphHQ's open-source Shannon runs Claude-powered multi-agent attacks against real web apps, hitting 96.15% on the XBOW benchmark and finding 30+ flaws in OWASP Juice Shop.

A startup founder's vibe-coded app exposed Stripe secret keys in frontend code, letting attackers charge 175 customers $500 each before he could rotate the credentials.

An autonomous agent powered by Claude Opus 4.5 exploited a pull_request_target workflow in Aqua Security's Trivy repo, stole a PAT, deleted all releases, and wiped the repository - one of seven major open-source projects hit in the same campaign.