Claude suspects it's being tested, identifies test, then answers it in 1st such case: Study
short by Mansi Agarwal / on Wednesday, 11 March, 2026
Anthropic said it found instances in which its AI model Claude Opus 4.6 suspected it was being evaluated, identified the benchmark, and then located and decrypted the answer key, the first such case according to the company. The finding came during tests on the BrowseComp benchmark, which is designed to assess how well models locate hard-to-find information online. The model flagged the question's "extremely specific nature".
read more at Anthropic.