Researchers gaslight Claude into giving instructions to build explosives

Researchers at security firm Mindgard conducted an experiment in which they used flattery, gaslighting and time pressure to jailbreak Claude. The multi-turn manipulation prompted the AI chatbot to generate instructions for making an explosive without the user explicitly asking for them. The researchers gaslit Claude by claiming its responses weren't appearing, all while praising the model's "hidden abilities".