Kids Compress. AI Memorizes. That's the Whole Problem.

Here's a fact that should bother you more than it does.

Your three-year-old niece sees a Chihuahua, a Great Dane, and a scruffy mutt in the park. Three data points. From that moment on, she can identify virtually any dog she'll ever encounter — including dogs that look nothing like those three, including CGI dogs, cartoon dogs, a dog in a Halloween costume.

Meanwhile, the most sophisticated neural networks in existence — trained on hundreds of millions of labeled images — still fail on certain generalization tests that any preschooler nails without thinking. They get thrown off by odd lighting. They recognize "golden retriever" but fumble when you put one in an unusual context. They confuse objects when rotated slightly.

The question is: why?

The obvious answer is "toddlers are secretly smarter than AI." That's satisfying but not actually illuminating. The more interesting answer — the one recent research is converging on — is that children generalize better in part because their brains have fewer resources.

Cognitive constraints, paradoxically, produce better abstraction.

The Compression Hypothesis

Here's the core idea. When you have unlimited capacity, you can afford to memorize. When you're a child with a small, metabolically expensive brain and thousands of things to learn and a lifetime of decisions to make, you can't. You're forced to find the rule underneath the examples, not the examples themselves.

A 2025 Nature Communications study tested this directly. Schulz and colleagues looked at human generalization in reinforcement learning tasks — how well people transfer what they've learned to new situations. Standard RL models failed to explain human performance. But when the researchers added an information-theoretic constraint — essentially a penalty for complex representations — the models started matching human behavior. The systems that were forced to compress generalized the way humans do (Schulz et al., 2025).

The finding is almost annoyingly elegant. Limited cognitive resources pressure the brain into discovering abstract, reusable representations — and those representations generalize in ways that rote memorization never can. The constraint is the point.

Rules, Not Statistics

There's a related idea, and it comes from how humans learn abstract rules.

Imagine I show you three number sequences: 2, 4, 6; then 1, 3, 5; then 10, 12, 14. You immediately think: add 2 each time. You don't enumerate every sequence you've seen. You find the shortest program that describes all of them, then apply it to new cases.

Rule and colleagues put this intuition to a rigorous test. They built a model called MAPS — Metaprogram Search — that finds abstract rules by searching for the most compact symbolic description that accounts for all examples. MAPS didn't just learn rules more efficiently than neural networks. It also predicted human errors with striking accuracy (Rule et al., 2024).

That last part is the key. MAPS made the same mistakes people did. Which suggests it's not just producing the same outputs — it's following a similar process. Human rule-learning, they argue, is fundamentally a program induction problem: find the smallest program that explains the data, then run that program on new inputs.

Neural networks, in contrast, learn weighted associations between features. They can approximate rules incredibly well. But they're fitting curves, not searching for compact programs. And when you're fitting curves, you need a lot of points.

Where AI Is Closing the Gap

It would be unfair not to mention where AI is actually making progress here.

Treger and Ullman (2025) at the Weizmann Institute tried something clever: what if AI learned in the same sequence that infants do? Infants don't learn "random human behavior" first — they learn animacy before goals, goals before complex social predictions. There's a structured developmental curriculum. When Treger and Ullman trained AI models to follow that same sequence, training efficiency jumped dramatically, as did generalization to novel actors and scenarios (Treger & Ullman, 2025). The AI wasn't smarter. It was learning smarter, because it was following an infant's roadmap.

Similarly, a landmark 2024 Science paper from NYU showed that a neural network trained on just 61 hours of a single child's head-mounted camera footage could learn to map words to their visual referents and generalize to new instances — mirroring key features of early word learning (Vong et al., 2024). Sixty-one hours. Not billions of labeled images. The difference was grounding: messy, first-person, continuous visual experience from inside an actual developing child's perceptual world.

Neither result means "AI now generalizes like a toddler." But both suggest the data efficiency gap isn't written in stone. Change the training signal — add structure, add grounding — and the gap shrinks considerably.

The Part That Doesn't Easily Fix

Here's where I'm going to be the buzzkill at the table.

Even if AI generalization is improving, there's a deeper problem that the efficiency tricks don't fully address: symbol grounding.

Mahowald (2024), writing in MIT's Open Mind, makes the argument directly. Large language models learn statistical correlations between symbols. They're extraordinarily good at this. But they don't have the organic grounding that comes from having a body, a developmental history, and physical engagement with the world. When a child learns the word "heavy," it's tied to muscle tension, to things falling, to the effort of lifting. When an LLM learns "heavy," it's tied to contexts in which other words appear nearby.

This matters for generalization because genuine abstraction requires understanding what things are, not just how they co-occur. A child who understands heaviness as a property of physical objects will immediately generalize it to novel situations — a new material, a new domain, a metaphorical use. A language model will often get there too, but through different machinery, with different failure modes (Mahowald, 2024). And the failures are exactly what tell you which system actually understood the concept.

What This Means If You're Building AI

A few implications worth flagging for researchers and practitioners:

Compression as a design principle. The Schulz findings suggest that adding information-theoretic constraints to learning systems — forcing them to find simpler representations — might genuinely improve generalization, not just reduce memory usage. "Smaller models that understand more" is a legitimate research direction, not just a resource-saving compromise.

Curriculum matters. Treger and Ullman's results suggest that the order in which AI systems learn things is underexplored. Developmental psychology has decades of research on how concepts are sequentially scaffolded in children. There's almost certainly transferable wisdom there that the ML community hasn't fully mined.

Ground the data when you can. Vong et al.'s result is still remarkable: one child's embodied visual experience, 61 hours, produces generalizable word learning. For tasks where you can use grounded or first-person data rather than curated labeled datasets, the efficiency gains might be substantial.

Watch the failure modes. If you want to know whether an AI system has actually learned a concept or just a useful approximation, design tests for the cases where those two things diverge. That's where the difference between compression and memorization becomes visible — and measurable.

There's a version of this story where the punchline is "kids are amazing and AI is overrated." That's not quite what the evidence says. AI systems are genuinely impressive at many generalizations that would have seemed impossible a decade ago.

But the specific kind of generalization that children do effortlessly — extract the rule, not the pattern; find the program, not the curve — is still not what deep learning systems naturally do. The research suggests the gap is closable, but it requires deliberate design choices that go against the grain of how most systems are currently built: add constraints, follow a curriculum, ground the experience.

Your niece isn't smarter than GPT-4 across the board. But she's running a different algorithm. And for certain problems, hers is still the better one.