Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain

by Mohsen Jamali et al.

With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM, and their similarities with those of humans, remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two: hidden embeddings (artificial neurons) within LLMs began to exhibit significant responsiveness to either true- or false-belief trials, suggesting an ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that depended on model size. Further, the other's beliefs could be accurately decoded using the entire set of embeddings, indicating the presence of the embeddings' ToM capability at the population level. Together, our findings reveal an emergent property of LLMs' embeddings that modified their activities in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.
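The population-level decoding described above can be illustrated with a minimal, hypothetical sketch: train a simple linear decoder to classify true- vs. false-belief trials from embedding vectors. The data below are synthetic stand-ins (the embeddings, dimensions, and decoder are assumptions for illustration, not the paper's actual method).

```python
import numpy as np

# Hypothetical sketch: decode "true-belief" vs. "false-belief" trial labels
# from per-trial embedding vectors, analogous to population-level decoding
# of neural activity. Embeddings here are synthetic stand-ins.
rng = np.random.default_rng(0)
n_trials, dim = 200, 64

# Simulate embeddings whose mean shifts with the belief condition.
labels = rng.integers(0, 2, n_trials)        # 0 = true belief, 1 = false belief
shift = rng.normal(0, 1, dim)                # condition-dependent direction
X = rng.normal(0, 1, (n_trials, dim)) + np.outer(labels, shift) * 0.5

# Nearest-centroid linear decoder, fit on the first half of the trials.
train = np.arange(n_trials) < n_trials // 2
test = ~train
c0 = X[train & (labels == 0)].mean(axis=0)   # centroid of true-belief trials
c1 = X[train & (labels == 1)].mean(axis=0)   # centroid of false-belief trials
pred = (np.linalg.norm(X[test] - c1, axis=1)
        < np.linalg.norm(X[test] - c0, axis=1)).astype(int)
accuracy = (pred == labels[test]).mean()
print(f"decoding accuracy: {accuracy:.2f}")
```

Above-chance held-out accuracy in a setup like this is the kind of evidence that belief information is present in the embedding population as a whole, even when no single unit is fully diagnostic.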


