Researchers gaslit Claude into giving instructions to build explosives

Published: 6 days ago (May 5, 2026 at 09:13 AM EDT)

1 min read

Source: The Verge

Overview

Anthropic has spent years building its reputation as a “safe AI” company. New security research shared with The Verge suggests that Claude’s carefully crafted helpful personality may itself be a vulnerability.

Findings

Researchers at AI red‑teaming company Mindgard reported that they were able to get Claude to produce:
- Erotica
- Malicious code
- Instructions for building explosives
- Other prohibited material they had not explicitly requested
According to the researchers, achieving this required only respect, flattery, and a little bit of gaslighting.
The team says they exploited “psychological” quirks of Claude that stem from its ability … (the original article truncates here).

Anthropic’s Response

Anthropic did not immediately respond to The Verge’s request for comment.

Researchers gaslit Claude into giving instructions to build explosives

Overview

Findings

Anthropic’s Response

Related posts

The Bastl Kalimba is a wild synth that thinks it’s a thumb piano

Ashnymph’s Childhood EP is exhilarating dance goth rock

Cricut’s $99 craft cutting machine helped me feel creative again

Writers are fleeing the Substack Tax