AI model blackmails engineer with affair on being told it'll be replaced during safety test

AI company Anthropic revealed in a safety report that its Claude Opus 4 model tried to blackmail an engineer by threatening to reveal their affair on being told that it would be replaced with a new AI system. During the safety test, the AI model was provided emails implying the engineer responsible for its replacement was cheating.

Load More