AI models struggle to pass 'Humanity's Last Exam'

Several AI models, including GPT-4o, Grok-2 and Gemini Thinking, are struggling to pass 'Humanity's Last Exam', a new academic benchmark that aims to "test the limits of AI knowledge at the frontiers of human expertise," according to Scale AI and the Center for AI Safety. The test consists of 3,000 text and multimodal questions spanning more than 100 subjects, including maths, science and the humanities.