Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

arXiv:2606.28524v1. Abstract: Recent work suggests that Large Language Models (LLMs) are sensitive to the belief states of agents described by text, as measured by the false belief task (FBT), yet persistent concerns of construct validity remain. We adopt a **developmental perspective**, tracing the pattern of mental state reasoning behavior -- and likely **preconditions** for this behavior -- across multiple training stages in the Olmo2 and Pythia language model suites. We find that above-chance FBT performance depends both on model size and sufficient training volume, emerg
As large language models continue to scale, investigating their emergent cognitive abilities, such as mentalizing, becomes a pressing research question.
Understanding the developmental trajectories of AI cognitive capacities helps predict future AI capabilities and guide their safe and effective deployment.
This research examines the conditions (model size and training volume) under which above-chance false belief task performance appears, rather than simply observing the behavior post hoc.
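As a rough illustration of what "above-chance" performance across training stages can mean in practice, the sketch below applies an exact one-sided binomial test to per-checkpoint accuracy counts. It is not the paper's analysis: the abstract excerpt does not state the statistical method used. The 0.5 chance level (a two-alternative task), the checkpoint steps, and the counts are all hypothetical, as are the function names `binom_sf` and `above_chance`.

```python
from math import comb


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance(correct: int, total: int, chance: float = 0.5,
                 alpha: float = 0.05) -> bool:
    """One-sided exact binomial test: is accuracy reliably above chance?

    The chance level of 0.5 assumes a two-alternative task; that is an
    assumption of this sketch, not a detail taken from the paper.
    """
    return binom_sf(correct, total, chance) < alpha


# Hypothetical data: training step -> items answered correctly out of 100.
hypothetical_results = {1_000: 48, 10_000: 55, 100_000: 71}

for step, correct in hypothetical_results.items():
    p_value = binom_sf(correct, 100)
    verdict = "above chance" if above_chance(correct, 100) else "not above chance"
    print(f"step {step:>7}: {correct}/100 correct, p = {p_value:.4g} ({verdict})")
```

Running such a test at each checkpoint, for each model size, gives one simple way to locate the point in training where performance first departs from chance. With many checkpoints, a correction for multiple comparisons would likely be needed.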
- AI researchers
- AI developers
- Cognitive science
- Simplistic AI safety approaches
- AI models lacking sufficient scale
The work helps clarify when and why certain cognitive abilities, such as sensitivity to belief states, emerge in large language models.
This understanding could lead to more targeted and efficient development of AI systems capable of sophisticated social and psychological reasoning.
Improved AI mentalizing could enable more robust human-AI collaboration and more effective agents in complex, multi-agent environments.
Read at arXiv cs.CL