AI researcher Lun Wang exits DeepMind, warns of benchmark gaps

AI researcher Lun Wang has departed Google DeepMind, warning that static benchmarks cannot accurately measure advanced large language models. He noted that models often memorise tests or exploit patterns in them, creating dangerous gaps between measured and real-world safety and capability. To combat this, Wang proposes "self-evolving evals": adaptive testing systems that update dynamically so that results reflect real-world reliability and safety.
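
The memorisation problem can be illustrated with a toy sketch. This is not Wang's method, whose details are not described here. Every name in it (STATIC_BENCHMARK, memorising_model, fresh_items) is hypothetical, and the "model" is just a lookup table standing in for a system that has memorised a fixed test set. It scores perfectly on the static benchmark and fails on questions regenerated at evaluation time.

```python
import random

# Hypothetical static benchmark: a fixed set of questions with known answers.
STATIC_BENCHMARK = {"2+3": 5, "7+8": 15, "10+4": 14}


def memorising_model(question):
    """Toy 'model' that can only recall answers it has already seen."""
    return STATIC_BENCHMARK.get(question)  # None for any unseen question


def fresh_items(rng, n):
    """Generate n new addition questions (an assumed stand-in for an
    evaluation that refreshes its test items on every run)."""
    items = {}
    while len(items) < n:
        a, b = rng.randint(100, 999), rng.randint(100, 999)
        items[f"{a}+{b}"] = a + b
    return items


def accuracy(model, items):
    """Fraction of questions the model answers correctly."""
    correct = sum(model(q) == answer for q, answer in items.items())
    return correct / len(items)


rng = random.Random(0)  # seeded so the run is repeatable
static_score = accuracy(memorising_model, STATIC_BENCHMARK)
fresh_score = accuracy(memorising_model, fresh_items(rng, 50))
print(static_score, fresh_score)  # prints: 1.0 0.0
```

The gap between the two scores is the kind of discrepancy a fixed benchmark cannot reveal. A real adaptive evaluation would need far more than regenerated arithmetic, but the principle of testing on items the model could not have seen is the same.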