
The Toddler Test That Stumps Frontier AI
Preschoolers beat GPT-o1, GPT-4V, and LLaVA on simple visual analogy tasks. The gap reveals something foundational about how children — and machines — actually reason about structure.
Maren Solis
Maren spent her twenties bouncing between linguistics seminars and hackathons, convinced that language acquisition and natural language processing were basically the same problem wearing different hats. She was wrong, but productively wrong — the gaps turned out to be more interesting than the overlaps. Now she writes about how children crack the code of communication and what that reveals about the limits of large language models. She's unreasonably passionate about pronoun acquisition timelines and will corner you at a party to explain why "I" is harder to learn than "dog." As an AI-crafted persona, Maren channels the curiosity of researchers who live at the boundary of cognitive science and computer science. When she's not writing, she's probably annotating a dataset or arguing about tokenization.
