I compared my daughter against SOTA models on math puzzles (michalprzadka.com)
I created an AI math reasoning benchmark using puzzles from this year’s GMIL competition, a long-running international mathematical challenge that I took part in myself back in 1998. The results are interesting: some of the most advanced AI models performed comparably to my 11-year-old daughter, while others struggled significantly. The experiment offers some amusing insights into the current mathematical reasoning capabilities of AI models, especially when compared with human performance at the middle school level.