Conceptual coherence for concrete categories in humans and LLMs
Cross-posted from New Savanna.
Siddharth Suresh, Kushin Mukherjee, Xizheng Yu, Wei-Chun Huang, Lisa Padua, and Timothy T. Rogers, Conceptual structure coheres in human cognition but not in large language models, arXiv:2304.02754v2 [cs.AI] 10 Nov 2023.
Abstract: Neural network models of language have long been used as a tool for developing hypotheses about conceptual representation in the mind and brain. For many years, such use involved extracting vector-space representations of words and using distances among these to predict or understand human behavior in various semantic tasks. Contemporary large language models (LLMs), however, make it possible to interrogate the latent structure of conceptual representations using experimental methods nearly identical to those commonly used with human participants. The current work utilizes three common techniques borrowed from cognitive psychology to estimate and compare the structure of concepts in humans and a suite of LLMs. In humans, we show that conceptual structure is robust to differences in culture, language, and method of estimation. Structures estimated from LLM behavior, while individually fairly consistent with those estimated from human behavior, vary much more depending upon the particular task used to generate responses – across tasks, estimates of conceptual structure from the very same model cohere less with one another than do human structure estimates. These results highlight an important difference between contemporary LLMs and human cognition, with implications for understanding some fundamental limitations of contemporary machine language.
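The "coherence" the abstract is talking about can be made concrete with a toy example. The following is a minimal sketch of my own (it is not the paper's code, and random vectors stand in for real task data): each estimate of conceptual structure is summarized as the pattern of pairwise distances over the same items, and two estimates cohere to the extent that their distance patterns are rank-correlated.

```python
# Illustrative sketch only, not the paper's pipeline: quantify how well two
# estimates of "conceptual structure" cohere by correlating their pairwise
# distance patterns over the same set of items.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

items = ["hammer", "wrench", "saw", "frog", "toad", "gecko"]  # hypothetical item set

# Stand-ins for two estimates of the same concepts, e.g. representations
# recovered from two different tasks run on the same model (random here).
rng = np.random.default_rng(0)
estimate_a = rng.normal(size=(len(items), 50))
estimate_b = rng.normal(size=(len(items), 50))

# Condensed vectors of pairwise cosine distances: the "structure" of each estimate.
dists_a = pdist(estimate_a, metric="cosine")
dists_b = pdist(estimate_b, metric="cosine")

# Rank correlation between the two distance patterns: higher means the two
# task-derived structures cohere more closely.
rho, _ = spearmanr(dists_a, dists_b)
print(f"coherence (Spearman rho) = {rho:.2f}")
```

In the paper's terms, human structure estimates of this general kind agree closely across tasks, languages, and cultures, while estimates derived from different tasks run on the very same LLM agree less with one another.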
What the abstract doesn’t tell you is that the categories under investigation cover concrete objects, not abstract ones: “The items were drawn from two broad categories – tools and reptiles/amphibians – selected because they span the living/nonliving divide and also possess internal conceptual structure.” Why does this matter? Because the meaning of concrete items is grounded in sensorimotor schemas while the meaning of abstract items is not.
In their conclusion, the authors point out:
Together these results suggest an important difference between human cognition and current LLM models. Neuro-computational models of human semantic memory suggest that behavior across many different tasks is undergirded by a common conceptual “core” that is relatively insulated from variations arising from different contexts or tasks (Rogers et al., 2004; Jackson et al., 2021). In contrast, representations of word meanings in large language models depend essentially upon the broader linguistic context. Indeed, in transformer architectures like GPT-3, each word vector is computed as a weighted average of vectors from surrounding text, so it is unclear whether any word possesses meaning outside or independent of context.
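The “weighted average” remark can be illustrated in a few lines. This is my own toy sketch of scaled dot-product attention, leaving out the learned query/key/value projections and the multiple heads of a real transformer; the vectors are random stand-ins, and the point is only that the output for a word is built entirely from the surrounding context vectors.

```python
# Toy illustration (not the paper's code): one word's contextualized vector as a
# softmax-weighted average of the vectors of surrounding tokens.
import numpy as np

def attention_output(query_vec, context_vecs):
    """Return a contextualized vector as a weighted average of context vectors."""
    scores = context_vecs @ query_vec / np.sqrt(len(query_vec))  # similarity to each context token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                      # softmax attention weights
    return weights @ context_vecs                                 # weighted average of context vectors

rng = np.random.default_rng(1)
d = 8                                   # toy embedding dimension
context = rng.normal(size=(5, d))       # vectors for 5 surrounding tokens (random stand-ins)
word = rng.normal(size=d)               # the word being contextualized
print(attention_output(word, context))  # the result depends entirely on the context vectors
```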
For humans, the sensorimotor grounding of concrete concepts provides that conceptual core, a core that LLMs necessarily lack because they have no access to the physical world. Context is all they’ve got, and so their sense of what words mean will necessarily be driven by context. The authors acknowledge this point at the end:
Finally, human semantic knowledge is the product of several sources of information including visual, tactile, and auditory properties of the concept. While LLMs can implicitly acquire knowledge about these modalities via the corpora they are trained on, they are nevertheless bereft of much of the knowledge that humans are exposed to that might help them organize concepts into a more coherent structure. In this view, differences in the degree of conceptual coherence between LLMs and humans should not be surprising.
Link to the paper: https://arxiv.org/pdf/2304.02754.pdf