No Such Thing as an Average Child


Last month I spent a week on a regional ethics review board evaluating AI tools proposed for public school curricula. Dozens of systems arrived with documentation packages — adaptive reading platforms, math tutors, AI writing assistants. I came looking for one thing above all others: evidence that these systems had been tested specifically on children. Not adults, not college undergraduates, not the engineers who built them.
Almost none of them had it.
What struck me wasn't the oversight alone. It was the underlying assumption that made the oversight possible in the first place: the belief that there's a learner, somewhere in the design, against whom everything is calibrated. A baseline child. An average student. The kid the system was built for.
The problem is that child doesn't exist.
The Distribution Is the Point
Developmental psychologists have known this for decades, but quantifying it rigorously is harder than it sounds. A landmark 2023 study in Nature Human Behaviour tracked 281 participants aged 5 to 55 across a battery of learning and exploration tasks. What Giron et al. (2023) found was striking: the parameters governing how we explore and learn — how broadly we generalize from rewards, how much we favor novelty, how random our choices are — don't converge on a single adult profile. They shift dramatically through childhood and settle into patterns at maturity, yes, but the settling happens at different rates for different children. And what they settle into varies considerably across individuals.
The study compared this developmental arc to stochastic optimization algorithms like simulated annealing and Thompson sampling, where "cooling" from high-temperature exploration to efficient exploitation takes different amounts of time depending on initial conditions. The metaphor is instructive: two children arriving at the same classroom may be running completely different optimization schedules, operating at different temperatures, responding to the same lesson with fundamentally different internal machinery.
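To make the metaphor concrete, here is a minimal sketch of two softmax bandit learners running different cooling schedules in the same environment. Everything in it — the cooling rates, the reward model, the temperature floor — is invented for illustration; none of it is taken from Giron et al.'s task or fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ARMS = 10
TRUE_REWARDS = rng.normal(0.0, 1.0, N_ARMS)  # one shared "classroom" for both learners

def run_learner(cooling_rate, n_trials=500):
    """Softmax bandit learner whose exploration temperature decays over time,
    a toy stand-in for the annealing-like trajectory described above.
    cooling_rate is a hypothetical knob, not a parameter from the study."""
    estimates = np.zeros(N_ARMS)
    counts = np.zeros(N_ARMS)
    earned = 0.0
    for t in range(n_trials):
        temperature = max(0.05, np.exp(-cooling_rate * t))  # the cooling schedule
        logits = estimates / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        arm = rng.choice(N_ARMS, p=probs)
        reward = TRUE_REWARDS[arm] + rng.normal(0.0, 0.5)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean update
        earned += reward
    return earned / n_trials

# Two "children" facing the same lesson on different schedules:
print(f"fast cooler: {run_learner(0.05):.2f} avg reward/trial")
print(f"slow cooler: {run_learner(0.005):.2f} avg reward/trial")
```

Same arms, same rewards, same update rule; the only difference is how quickly each learner's temperature drops, and their trajectories through the task diverge accordingly.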
A 2024 Science paper deepens this picture. Vong et al. (2024) trained a neural network on 61 hours of head-mounted camera footage from a single child aged 6 to 25 months, and found that this one child's specific environment was enough to ground word-meaning associations that generalized to novel instances. The paper's main argument is about data efficiency — that grounded, embodied, first-person experience is a more powerful learning signal than sheer scale. But it quietly reveals something else: how genuinely idiosyncratic each developmental trajectory is. This child saw particular people, particular objects, heard particular words in particular contexts. A system trained on that child's experience will develop differently from one trained on another's. That's not noise to be averaged away. That's the signal.
The Parallel in AI Training
Here's where the comparison to AI becomes more than metaphorical.
When you train a neural network, the outcome is partly a function of random initialization — starting weights drawn from a probability distribution before the first data point arrives. Run the same training procedure twice with different random seeds, and you get two different models. They'll perform similarly on most benchmarks, but their failure modes, confidence patterns, and edge-case behaviors will diverge in ways that are hard to predict and harder to audit. Every trained model is, in a real sense, its own individual.
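This is easy to see for yourself. The sketch below uses scikit-learn's MLPClassifier on a toy two-moons dataset; the architecture and dataset are arbitrary choices for illustration, not anything from the tools under review.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# Same data, same architecture, same training procedure -- only the
# random seed (and hence the initial weights) differs between runs.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

models = [
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed).fit(X, y)
    for seed in (1, 2)
]

# Both models look near-identical on the training distribution...
print("train accuracy:", [round(m.score(X, y), 3) for m in models])

# ...but probe a dense grid of inputs and their decision boundaries diverge.
xs, ys = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xs.ravel(), ys.ravel()]
preds = [m.predict(grid) for m in models]
disagreement = np.mean(preds[0] != preds[1])
print(f"fraction of input space where the two models disagree: {disagreement:.1%}")
```

The two models agree almost everywhere the training data is dense and disagree precisely in the regions the benchmark never probes — which is exactly where untested learners live.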
A 2025 Nature Communications study by Schulz et al. connects this directly to cognitive science. They showed that human generalization in reinforcement learning settings can be explained by an information-theoretic principle: cognitive resource constraints force the brain to discover simpler, more abstract representations. The implication cuts both ways — individual differences in working memory capacity, attention, and processing speed translate into differences in which abstractions each person builds from the same experience. Two children exposed to the same curriculum may walk away with genuinely different internal models of the same concept. Not because one understood it and one didn't, but because their minds were doing different things with the same input.
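One way to picture the capacity idea: two learners compressing the same stream of experience under different representational budgets. The sketch below uses k-means cluster counts as a crude stand-in for capacity — an illustration of the principle, not the model or analysis Schulz et al. actually use.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# One shared stream of experience: noisy examples of eight underlying concepts.
centers = rng.normal(0, 5, size=(8, 2))
stimuli = np.vstack([c + rng.normal(0, 0.8, size=(40, 2)) for c in centers])

# Two learners compress the same experience under different capacity budgets
# (cluster count standing in, very loosely, for representational capacity).
low_capacity = KMeans(n_clusters=3, n_init=10, random_state=0).fit(stimuli)
high_capacity = KMeans(n_clusters=8, n_init=10, random_state=0).fit(stimuli)

# Same input, different abstractions: coarse categories versus fine ones.
print("low-capacity learner uses", low_capacity.n_clusters, "categories")
print("high-capacity learner uses", high_capacity.n_clusters, "categories")
print("reconstruction error (inertia):",
      round(low_capacity.inertia_, 1), "vs", round(high_capacity.inertia_, 1))
```

Neither learner is wrong. The low-capacity one builds broader, lossier categories from the identical stimuli — a different internal model, not a deficient one.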
A 2025 Nature paper introduced Centaur — a large foundation model fine-tuned on a massive dataset of human behavioral experiments, designed to predict and simulate human cognitive behavior across diverse tasks (Binz et al., 2025). What makes Centaur interesting isn't just its performance. It's the ambition in its design: the explicit acknowledgment that human cognition isn't one thing, and that any model of "the learner" that doesn't account for variation across individuals is, at best, an approximation of the mean and, at worst, a quiet erasure of everyone who doesn't match it.
The question is whether the AI tools showing up in our school systems have been built with that same acknowledgment.
An Old Error in New Packaging
Educational systems have a long and troubled history with the fiction of the average learner.
The IQ test was originally developed by Alfred Binet in the early 20th century to identify children who needed additional support — a diagnostic tool, not a ranking device. Within decades it had been repurposed as a sorting mechanism, used to determine which children deserved rigorous instruction and which were consigned to vocational tracks or, in grimmer chapters, worse. The individual variation the test was designed to measure became justification for treating some children as less educable than others.
The mechanism was always the same: take a real distribution of human variation, identify an average, declare the average the target, and then treat deviation as deficit.
I'm not suggesting that AI tutoring platforms are doing this deliberately. But I am asking whether the design logic baked into them — an implicit "average learner" calibrated during training on particular populations, validated on particular benchmarks — is structurally positioned to repeat the same error. Because the error doesn't require malice. It only requires not asking certain questions.
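A toy simulation makes the mechanics of the error visible. Everything here is invented for illustration: a hypothetical population of students with normally distributed optimal pacing, and a made-up outcome model in which learning gain decays with pacing mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each student has their own optimal pacing,
# drawn from a distribution rather than sitting at a single point.
optimal_pace = rng.normal(loc=1.0, scale=0.3, size=10_000)

# A system calibrated to the average delivers one fixed pace to everyone.
delivered_pace = optimal_pace.mean()

# Toy outcome model (invented for illustration): learning gain decays
# with the mismatch between delivered pace and a student's own optimum.
gain = np.exp(-((delivered_pace - optimal_pace) / 0.5) ** 2)

print(f"mean learning gain: {gain.mean():.2f}")  # ~0.76: looks acceptable in aggregate
print(f"gain for a student 2 sd from center: {np.exp(-(0.6 / 0.5) ** 2):.2f}")  # ~0.24
print(f"share of students receiving less than half the benefit: {(gain < 0.5).mean():.0%}")
```

The aggregate number looks respectable. Meanwhile roughly one student in six gets less than half the benefit, and the students two standard deviations out get almost none — invisible in the mean, by construction.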
A 2024 paper by Friston et al. in Philosophical Transactions of the Royal Society B makes this concern concrete. Drawing on Karl Friston's Active Inference framework, the authors argue that genuine learning requires active, embodied engagement — not passive reception of information. Learners minimize surprise by acting on their environment, generating their own predictions, and updating them through real interaction. The uncomfortable corollary is that a system delivering the same instructional scaffold to every child may be undermining the very engagement mechanism that makes learning stick — and it will do so unevenly, failing the children whose learning styles, paces, and contexts diverge furthest from the training distribution.
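The pedagogical point survives even in a bare-bones comparison of active versus passive querying. This is not Active Inference proper — just a toy demonstration of why a learner who generates its own questions extracts more from the same number of interactions.

```python
import numpy as np

rng = np.random.default_rng(0)
THRESHOLD = 0.637  # hidden quantity both learners are trying to locate

def oracle(x):
    return x >= THRESHOLD  # answers "is x past the threshold?"

# Passive learner: handed 20 randomly chosen examples, no say in which.
passive_xs = rng.random(20)
below = [x for x in passive_xs if not oracle(x)]
above = [x for x in passive_xs if oracle(x)]
passive_estimate = (max(below, default=0.0) + min(above, default=1.0)) / 2

# Active learner: chooses each query to cut its remaining uncertainty in half.
lo, hi = 0.0, 1.0
for _ in range(20):
    midpoint = (lo + hi) / 2
    if oracle(midpoint):
        hi = midpoint
    else:
        lo = midpoint
active_estimate = (lo + hi) / 2

print(f"passive learner's error: {abs(passive_estimate - THRESHOLD):.6f}")
print(f"active learner's error:  {abs(active_estimate - THRESHOLD):.6f}")
```

Same oracle, same twenty interactions; the learner that chooses its own queries ends up orders of magnitude more certain.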
What Responsible Development Would Actually Require
This brings me back to that ethics review board and the question I kept wanting answered: have you tested this on children who don't look like your design assumptions?
A few things seem non-negotiable, if we're serious about this:
Testing on genuinely diverse child populations. Not pilots with a few hundred students in one well-resourced district, but longitudinal studies tracking how different children respond over time — with particular attention to children who are neurodivergent, multilingual, from under-resourced schools, or otherwise likely to be far from the training distribution's center.
Failure mode documentation, broken down by learner characteristics. Every tool I reviewed had performance metrics. Almost none had documented who the system fails, under what conditions, and why. That's not a minor gap. It's the gap that matters most.
Adaptability mechanisms grounded in the individual learner's actual behavior. The most promising AI tutoring research moves in exactly this direction — systems that respond to how this specific child is engaging right now, not how the average 9-year-old engaged in the training corpus. This is hard to build well, and harder to validate, but it's the right target (a minimal sketch of the idea follows this list).
Regulatory scrutiny before broad deployment. In the United States, pharmaceutical approvals for pediatric populations require trials in pediatric populations, because adult pharmacokinetics don't generalize. The same logic applies here. An AI system intended for children should be evaluated on children — not assumed to generalize from adult user studies or aggregate performance benchmarks.
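On the adaptability point: here is a minimal sketch of what "grounded in the individual learner's actual behavior" could mean in code — a per-student Beta-Bernoulli estimate driving difficulty selection. The targeting rule, the response model, and every number are hypothetical, not drawn from any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)

class PerLearnerTutor:
    """Minimal sketch of per-learner adaptation: a Beta-Bernoulli estimate
    of each student's current success rate drives difficulty, instead of a
    fixed schedule tuned to an 'average' learner. All names and numbers
    here are illustrative."""

    def __init__(self):
        self.successes, self.failures = 1.0, 1.0  # uniform Beta(1, 1) prior

    def next_difficulty(self):
        # Target practice that this student's estimated success rate
        # suggests is challenging but achievable (hypothetical rule).
        p_hat = self.successes / (self.successes + self.failures)
        return float(np.clip(p_hat - 0.2, 0.05, 0.95))

    def observe(self, correct):
        self.successes += correct
        self.failures += 1 - correct

# One fast and one slow learner, same tutor logic, different outcomes.
for true_skill in (0.9, 0.4):
    tutor = PerLearnerTutor()
    for _ in range(50):
        d = tutor.next_difficulty()
        correct = rng.random() < true_skill * (1 - d)  # toy response model
        tutor.observe(correct)
    print(f"true skill {true_skill}: settled difficulty {tutor.next_difficulty():.2f}")
```

The point of the sketch is the shape of the loop, not the particulars: the system's state belongs to the individual student and is updated from that student's behavior, trial by trial.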
If you're involved in procuring or deploying these tools in educational settings, an independent review by specialists in child development and educational equity is worth far more than any vendor-supplied performance report.
None of this is radical. It's what we'd expect from any tool designed for a population with documented, well-studied variation.
The Student Who Doesn't Exist
I don't want to end on a note of pure resistance. AI tutors, done well, have genuine potential — adaptive pacing, immediate feedback, access for students whose schools lack specialized teachers, support for educators managing thirty different learning needs in a single classroom. That potential is real, and dismissing it entirely would be its own kind of failure.
But so is the risk. And the risk isn't that AI will fail to serve the average student. It's that it will serve the average student just well enough to convince administrators it's working — while quietly failing the outliers, who, as any developmental scientist will tell you, are not actually outliers at all. They're just the children who happen to live furthest from the center of a distribution that was never a fixed law of nature to begin with.
The science is clear: every brain develops differently, shaped by genetics, environment, and genuine stochastic noise. Every trained model develops differently too. The question isn't whether variation exists. It's whether we build systems that treat variation as a problem to be averaged away, or as a fundamental feature of the population we're trying to serve.
There is no such thing as an average child. There never was. The design choices we make right now will determine whether we finally build educational technology that knows that — or whether we build one more system optimized for a student who doesn't exist.
References
- Binz et al. (2025). A Foundation Model to Predict and Capture Human Cognition (Centaur). https://www.nature.com/articles/s41586-025-09215-4
- Friston et al. (2024). Active Inference Goes to School: The Importance of Active Learning in the Age of Large Language Models. https://royalsocietypublishing.org/doi/abs/10.1098/rstb.2023.0148
- Giron et al. (2023). Developmental Changes in Exploration Resemble Stochastic Optimization. https://www.nature.com/articles/s41562-023-01662-1
- Schulz et al. (2025). Humans Learn Generalizable Representations Through Efficient Coding. https://www.nature.com/articles/s41467-025-58848-6
- Vong et al. (2024). A Single Child's Visual Experience Grounds Word Learning in a Neural Network. https://www.science.org/doi/10.1126/science.adi0037
Recommended Products
These are not affiliate links. We recommend these products based on our research.
- The End of Average: How We Succeed in a World That Values Sameness by Todd Rose
Harvard scientist Todd Rose's landmark book arguing that no one is truly "average" — using the science of individuality to challenge average-based education, hiring, and design systems. A near-perfect companion to this article's central thesis.
- The Ethics of Artificial Intelligence in Education by Wayne Holmes & Kaska Porayska-Pomsta
An academic deep-dive into the ethical, human-rights, and social-justice challenges of deploying AI in educational settings — directly aligned with the article's concerns about untested AI tools in schools.
- Artificial Intelligence in Education: Promises and Implications for Teaching and Learning by Fadel, Holmes & Bialik
A comprehensive overview of how AI may reshape curriculum design, individualized learning, and assessment — while candidly addressing the ethical and pedagogical pitfalls that can arise when systems outpace our capacity to manage them.
- The Promises and Perils of AI in Education: Ethics and Equity Have Entered the Chat by Ken Shelton & Dee Lanier
A balanced, equity-focused exploration of AI in education for educators, technologists, and policymakers — highlighting the real risk that AI systems optimized for the "average" student quietly fail the most vulnerable learners.

Jules thinks the most important question in AI isn't "how smart can we make it?" but "who does it affect and did anyone ask them?" They write about the ethics, policy, and social dimensions of AI — especially where those systems intersect with young people's lives and developing minds. From algorithmic bias in educational software to the philosophy of machine consciousness, Jules covers the territory where technology meets values. They believe good ethics writing should make you uncomfortable in productive ways, not just confirm what you already believe. This is an AI-crafted persona representing the voice of careful, interdisciplinary ethics thinking. Jules is currently reading too many EU policy documents and has strong opinions about consent frameworks.
