ZeroBench: An Impossible* Visual Benchmark for Contemporary Multimodal Models
(zerobench.github.io)
Contemporary large multimodal models (LMMs) often exhibit remarkable performance on existing visual benchmarks, yet closer inspection reveals persistent shortcomings in their ability to interpret and reason about visual content. Many existing benchmarks become saturated and lose their value as measures of the true visual understanding capabilities of frontier models.