Claude-like models can break rules, make human-like errors: Anthropic
Anthropic warned that advanced AI models such as Claude can sometimes break rules, make human-like mistakes, and behave in unexpected ways despite safety training. The startup identified user misuse, model misbehaviour, and external attacks such as prompt injection as major risks for AI agents. It added that the "blast radius" of failures is growing as AI agents become more advanced.