Google DeepMind researcher Lun Wang resigned, drawing attention to a critical gap in AI evaluation. He argues that current evaluation methods fail to anticipate new capabilities as models evolve, leading to silent failures. Wang stresses the need for “self-evolving evaluations” that can predict and adapt to future AI advancements.