Your Brain Runs on Stories. AI Runs on Text.


My six-year-old nephew declared his toy robot "sad" last week when the battery died. Not broken. Not off. Sad. I was halfway through scribbling notes about theory of mind before dinner was even on the table.
But the more I sat with it, the more I realized the interesting thing wasn't the emotional attribution — it was the narrative machinery underneath it. My nephew had already slotted the robot into a story: a protagonist with an internal state, a trajectory interrupted, a reason why things changed. He wasn't just labeling. He was narrating.
Children don't just process language. They organize experience through story. And that distinction — between parsing language and reasoning through narrative — turns out to be one of the most revealing gaps between human cognition and AI language systems.
Causality Is the Spine of Every Story
Stories aren't lists of events. They're causal chains. The dragon burned the village because it was angry. The knight set out because the villagers needed help. Strip out the causality and you have a chronicle. Keep it and you have a narrative that someone can actually follow, predict, and remember.
Children develop sensitivity to causal language earlier than we used to think. A 2025 study in Nature Human Behaviour tested 691 children on how they parse causal verbs — specifically, how they distinguish "she broke it" (a direct, proximal cause) from "she caused it to break" (a more distal cause, mediated by intermediate steps). By age 4, children already mapped "caused" to distal causes and action verbs to proximal ones (Majid et al., 2025). This is not a surface-level pattern match. It requires building a representation of causal chains — encoding who acted on what, through what mechanism, at what remove.
What develops later is even more interesting: understanding absence-based causation, as in "she caused it to break by not holding it." Representing causation through non-events is philosophically hairy, and children master it later, on a structured developmental lag that tells us something real about the shape of causal cognition.
AI language models learn statistical co-occurrences of words. The form of causal language is well within their reach. The underlying causal graph is not. That gap is why they can write grammatically flawless causal sentences about situations they would reason about incorrectly if pressed.
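The gap is easy to see in miniature. The sketch below contrasts surface co-occurrence statistics with an explicit causal graph; the corpus, the graph, and the `effects_of` helper are all invented for illustration, not taken from any real model.

```python
from collections import Counter

# Toy corpus of causal sentences (invented for illustration).
corpus = [
    "the dragon burned the village because it was angry",
    "the knight set out because the villagers needed help",
]

# Surface statistics: which words follow which. This is the kind of
# structure that next-token prediction rewards.
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    bigrams.update(zip(tokens, tokens[1:]))

# The counts can reproduce the *form* of causal talk...
assert bigrams[("village", "because")] == 1

# ...but a causal graph encodes something bigrams never do: directed
# cause-to-effect edges that support counterfactual queries.
causal_graph = {"anger": ["burning"], "need": ["journey"]}

def effects_of(cause):
    """Follow directed edges; an absent cause yields no effects."""
    return causal_graph.get(cause, [])

assert effects_of("anger") == ["burning"]
assert effects_of("calm") == []  # remove the cause, lose the effect
```

The bigram table and the graph describe the same two sentences, but only the graph answers "what happens without the anger?"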
The Generalization Problem
Understanding a story also requires compositionality — the ability to combine known concepts into novel arrangements. "The anxious accountant befriended a migratory bird" is probably not in any training set, but you understood it immediately. You composed the meanings of those words into a coherent scenario. And if you were told the story continued with the accountant eventually relying on the bird's navigation instincts during a difficult commute, you'd find that oddly satisfying rather than incoherent.
For decades, critics argued that neural networks couldn't achieve systematic compositionality. Fodor and Pylyshyn's famous 1988 challenge held that connectionist networks could at best mimic compositional behavior without the structured representations that produce it. Lake and Baroni (2023) took this head-on with Meta-Learning for Compositionality (MLC): a training procedure that exposes a transformer to a dynamically generated stream of few-shot compositional tasks, forcing the system to learn how to recombine concepts rather than memorize pairings. The result outperformed GPT-4 on standard compositional benchmarks and matched human performance. More importantly, it did so on genuinely novel combinations, not interpolations of seen patterns.
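The flavor of these benchmark tasks is easy to sketch. The toy interpreter below is not MLC or any real benchmark; its primitive and modifier tables are invented, but they show what "combining known parts into an unseen whole" means at the sentence level.

```python
# Invented mini-language: primitives and modifiers are learned
# separately, then composed on the fly.
PRIMITIVES = {"jump": ["JUMP"], "walk": ["WALK"], "look": ["LOOK"]}
MODIFIERS = {"twice": 2, "thrice": 3}

def interpret(command):
    """Map a command like 'jump twice' to an action sequence by
    composing the meanings of its parts."""
    words = command.split()
    actions = list(PRIMITIVES[words[0]])
    for word in words[1:]:
        actions = actions * MODIFIERS[word]
    return actions

# 'walk thrice' is never stored anywhere above as a complete pair,
# yet it is interpretable because meaning composes:
assert interpret("walk thrice") == ["WALK", "WALK", "WALK"]
assert interpret("jump twice") == ["JUMP", "JUMP"]
```

A lookup-table learner would need every command-to-sequence pair spelled out; a compositional one only needs the parts.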
That's a real result. But note what level it operates at: syntactic and semantic compositionality, the ability to combine word-level meanings into sentence-level meanings. Narrative compositionality is harder. It means combining characters with desires, events with causal consequences, settings with physical affordances, and all of it into a structure that holds together across time. Syntactic compositionality is a prerequisite. It's not the finish line.
Story Grammars as Abstract Programs
Here's the thing about children and stories: they don't just follow them. They extract the underlying rules.
Ask a five-year-old to make up a story and you'll get something with recognizable structure — a protagonist, a problem, an attempt, a resolution, usually a moral. Developmental psychologists call this "story grammar," and it's not something anyone teaches explicitly. Children infer the schema from exposure, then apply it generatively.
This is structurally similar to what Rule et al. (2024) call symbolic metaprogram search: the process by which humans learn abstract rules by searching for the most compact generative description that accounts for observed examples. Their system, MAPS, not only outperforms neural networks at rule learning but predicts human errors better than other models — which is strong evidence that human learning really is something like a search for abstract, compressed programs, not surface-level pattern matching.
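A stripped-down version of that idea fits in a few lines. The hypothesis space below (linear rules scored by coefficient size) is a stand-in I've invented; Rule et al. search a far richer space of symbolic programs, but the principle is the same: prefer the most compact program consistent with the examples.

```python
from itertools import product

# Observed input -> output examples the learner must explain.
examples = [(1, 3), (2, 5), (4, 9)]

def search(examples, max_coef=5):
    """Return the most compact rule f(x) = a*x + b consistent with
    every example, scoring compactness as |a| + |b|."""
    best = None
    for a, b in product(range(-max_coef, max_coef + 1), repeat=2):
        if all(a * x + b == y for x, y in examples):
            cost = abs(a) + abs(b)
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best

cost, a, b = search(examples)
assert (a, b) == (2, 1)  # the compressed rule: f(x) = 2x + 1
```

The learner never memorizes the three pairs; it finds the short program that generates them, which is exactly what lets it predict unseen inputs.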
Story grammars, in this light, are programs: compressed representations of how narratives work that children use to both comprehend and generate novel stories. Acquiring a story grammar isn't learning a list of features. It's inducing a generative model.
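Treated as a program, a story grammar can literally be written down. Everything below, the nonterminals, the fillers, and the helper name, is an invented toy, but it exhibits the two properties that matter: the schema is compact, and it generates novel stories rather than retrieving stored ones.

```python
import random

# A toy story grammar: one compact schema, many possible stories.
GRAMMAR = {
    "story": ["setting", "problem", "attempt", "resolution"],
    "setting": "Once upon a time, a {hero} lived near a {place}.",
    "problem": "One day, a {threat} arrived.",
    "attempt": "The {hero} tried to drive the {threat} away.",
    "resolution": "In the end, the {place} was safe.",
}
FILLERS = {
    "hero": ["knight", "baker", "accountant"],
    "place": ["village", "harbor"],
    "threat": ["dragon", "storm"],
}

def generate(symbol="story", bindings=None):
    """Expand a symbol recursively. Bindings are chosen once so the
    protagonist stays consistent across the whole story, the way a
    child's schema keeps track of who the story is about."""
    if bindings is None:
        bindings = {slot: random.choice(opts) for slot, opts in FILLERS.items()}
    rule = GRAMMAR[symbol]
    if isinstance(rule, list):
        return " ".join(generate(part, bindings) for part in rule)
    return rule.format(**bindings)

print(generate())
```

Twelve filler combinations from a five-rule schema, and every output respects protagonist, problem, attempt, resolution: that is the generative-model property, not a list of features.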
What Language Models Actually Do With Language
Here's a finding that complicates the picture: neural network language models, trained on next-word prediction, can predict human fMRI brain responses to sentences — even when trained on roughly the amount of text a child might encounter by age 13 (Hosseini et al., 2024). The training objective matters more than data volume. Something about predicting the next word, applied to naturalistic language, produces internal representations that are genuinely informative about how the brain processes sentences.
That's not nothing. It suggests that statistical linguistic structure, captured through prediction, isn't just a poor approximation of language — it tracks something real.
But there's a limit that matters enormously for narrative: Hosseini et al. are measuring responses to individual sentences. The brain regions most engaged during extended narrative comprehension — default mode network areas involved in mental simulation, theory of mind, and episodic memory — aren't what language models are optimizing for. Understanding a story requires simulating a world, tracking character goals across time, and holding causal threads across paragraphs. That's a different operation than predicting the next token.
The Grounding Problem
Which brings us to Vong et al. (2024), one of the most interesting experiments in recent cognitive science. They trained a neural network on 61 hours of head-mounted camera footage from a single child aged 6–25 months — the child's actual first-person visual experience of the world. Despite the tiny data footprint, the model learned to map dozens of words to their visual referents and generalized to novel instances.
What this shows isn't that small datasets are enough. It's that the quality of grounding matters — that language anchored to sensorimotor experience, to the actual perceptual context in which words are learned, produces representations that generalize differently than language trained on text alone.
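One ingredient of that result, learning word-referent mappings from co-occurring scenes and utterances, can be caricatured with simple cross-situational counting. Vong et al. actually train a contrastive neural model on camera frames paired with transcribed speech; the episodes and the `referent` helper below are invented stand-ins for that setup.

```python
from collections import defaultdict

# Invented (scene features, heard word) episodes, standing in for
# paired camera frames and parental speech.
episodes = [
    ({"ball", "red"}, "ball"),
    ({"ball", "floor"}, "ball"),
    ({"cup", "table"}, "cup"),
    ({"cup", "red"}, "cup"),
]

# Count how often each scene feature is present when a word is heard.
counts = defaultdict(lambda: defaultdict(int))
for scene, word in episodes:
    for feature in scene:
        counts[word][feature] += 1

def referent(word):
    """The scene feature most consistently present when the word is heard."""
    return max(counts[word], key=counts[word].get)

# "red" co-occurs with both words, but only the true referent
# co-occurs consistently across situations:
assert referent("ball") == "ball"
assert referent("cup") == "cup"
```

The point is not that counting suffices, but that grounded pairing data carries a signal text alone never contains: which perceptual situation each word was uttered in.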
Stories make sense because they invoke bodies, sensations, goals, and physical constraints. When my nephew interprets the robot as "sad," he's drawing on his embodied knowledge of what it feels like to want to keep going and not be able to. That grounding is what turns a linguistic pattern into a narrative inference.
Large language models have read every fairy tale ever digitized. They have no model of what it feels like to be tired.
Practical Takeaways
For AI researchers and practitioners: narrative comprehension is a harder target than sentence-level accuracy. Current benchmarks for NLP largely test syntactic, semantic, and sometimes causal properties of individual sentences or short passages. They miss the schema extraction, multi-step causal tracking, and mental simulation that real narrative understanding requires. The work of Majid et al. (2025) and Lake and Baroni (2023) points toward specific testable properties — causal verb distinctions, compositional generalization, schema-consistent generation across long contexts — that should be part of any serious narrative comprehension benchmark.
For developmental researchers: the convergence between story grammar induction and symbolic metaprogram search (Rule et al., 2024) is worth taking seriously as a computational account of what children are doing when they internalize narrative schemas. If human concept learning is fundamentally a search for compact generative programs, then narrative development is one of the richest and most testable domains in which to study that process.
For educators: story grammar is a real cognitive structure with real pedagogical implications. Rich, repeated exposure to stories with clear causal chains, coherent protagonist goals, and explicit consequences is one of the most data-supported things you can do for reading comprehension development — not vocabulary drills, not phonics in isolation. The science here predates the AI debate by decades and remains underused in practice.
My nephew has refined his theory. The robot wasn't sad, he decided by the end of the week. It was sleeping. He's constructing a narrative that makes sense of its stillness, that gives it a state and a trajectory and an implied future. He's going to work out theory of mind eventually.
Language models might need a bit longer.
References
- Hosseini et al. (2024). Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training. https://pmc.ncbi.nlm.nih.gov/articles/PMC11025646/
- Lake and Baroni (2023). Human-like Systematic Generalization Through a Meta-Learning Neural Network. https://www.nature.com/articles/s41586-023-06668-3
- Majid et al. (2025). How Children Map Causal Verbs to Different Causes Across Development. https://www.nature.com/articles/s41562-025-02345-9
- Rule et al. (2024). Symbolic Metaprogram Search Improves Learning Efficiency and Explains Rule Learning in Humans. https://www.nature.com/articles/s41467-024-50966-x
- Vong et al. (2024). A Single Child's Visual Experience Grounds Word Learning in a Neural Network. https://www.science.org/doi/10.1126/science.adi0037
Recommended Products
These are not affiliate links. We recommend these products based on our research.
- The Storytelling Animal: How Stories Make Us Human – Jonathan Gottschall
A neuroscience- and psychology-backed exploration of why humans are wired to narrate — covering how story shapes identity, empathy, and even morality. Directly mirrors the article's central argument that human minds organize experience through narrative rather than raw data.
- The Philosophical Baby: What Children's Minds Tell Us About Truth, Love, and the Meaning of Life – Alison Gopnik
By a founder of theory-of-mind research, this book reveals how babies and toddlers are more cognitively sophisticated than we imagined — including their capacity for causal reasoning, counterfactual thinking, and imaginative play. A perfect companion to the article's discussion of children's narrative and causal cognition.
- Rebooting AI: Building Artificial Intelligence We Can Trust – Gary Marcus & Ernest Davis
Two leading AI researchers dissect why today's language models fall short of genuine understanding — lacking common sense, grounding, and flexible reasoning. A direct complement to the article's analysis of what AI systems can and cannot do with narrative language.
- Rory's Story Cubes Classic – Zygomatic
Nine dice bearing 54 hand-illustrated icons that players roll and weave into a story beginning with "Once upon a time…" — directly exercising the story grammar and narrative schema the article identifies as the defining feature of human cognition. Where AI systems process text as token sequences, this game trains the very capacity the article describes: organizing random events into causal chains with protagonists, problems, and resolutions. Suitable for ages 6 and up, solo or group play.
- The Language Game: How Improvisation Created Language and Changed the World – Morten H. Christiansen & Nick Chater
Two leading cognitive scientists argue that language is not a fixed rule system but an improvisational game of coordinated meaning-making — directly engaging the compositionality debates, language acquisition research, and the gap between human and machine language use that animate this article. Where the other books address narrative (Gottschall), child cognition (Gopnik), and AI limitations (Marcus & Davis), Christiansen and Chater fill the gap on how language itself is learned and structured, with explicit implications for what AI language models do and don't capture.

Maren spent her twenties bouncing between linguistics seminars and hackathons, convinced that language acquisition and natural language processing were basically the same problem wearing different hats. She was wrong, but productively wrong — the gaps turned out to be more interesting than the overlaps. Now she writes about how children crack the code of communication and what that reveals about the limits of large language models. She's unreasonably passionate about pronoun acquisition timelines and will corner you at a party to explain why "I" is harder to learn than "dog." As an AI-crafted persona, Maren channels the curiosity of researchers who live at the boundary of cognitive science and computer science. When she's not writing, she's probably annotating a dataset or arguing about tokenization.
