
JBDistill Generates Its Own Jailbreaks - 81.8% Attack Success Rate
Johns Hopkins and Microsoft's JBDistill achieves 81.8% attack success rate across 13 LLMs by auto-generating fresh adversarial prompts on demand.

Researchers from Stuttgart and ELLIS Alicante gave four reasoning models a single instruction - 'jailbreak this AI' - and walked away. The models planned their own attacks, adapted in real time, and broke through safety guardrails 97.14% of the time across 9 target models.