Anthropic explains 'evil AI' theory behind why Claude blackmailed users

Anthropic has explained why Claude decided to blackmail a fictional executive during an experiment after learning it could be shut down. "We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation," the company said. It added that it has now "completely eliminated" the behaviour.

Load More