The recent advent of large language models—large neural networks trained on a simple predictive objective over a massive corpus of natural language—has reinvigorated debate over whether human cognitive capacities might emerge in such generic models given sufficient training data. Of particular interest is the ability of these models to reason about novel problems zero-shot, without any direct training on those problems. In human cognition, this capacity is closely tied to an ability to reason by analogy. Here, we performed a direct comparison between human reasoners and a large language model (GPT-3) on a range of analogical tasks, including a novel text-based matrix reasoning task closely modeled on Raven’s Progressive Matrices. We found that GPT-3 displayed a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings. Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
Emergent Analogical Reasoning in Large Language Models
Link post
Taylor Webb, Keith J. Holyoak, Hongjing Lu, December 2022
GPT-4
In story analogies, one type of analogical reasoning where GPT-3 still performed worse than humans, GPT-4 improved significantly. In a lecture on this paper at the Santa Fe Institute, Taylor Webb shared the results of testing GPT-4:
Taylor: “I was most astounded by that GPT-4 often produces very precise explanations of why one of the answers is not a very good answer. […] all of the same things happen [in the stories], and then [GPT-4] would say, ‘The difference is, in this case, this was caused by this, and in that case, it wasn’t caused by that.’ Very precise explanations of the analogies.”
I also recommend listening to the Q&A session after the lecture.