Abstract: Noam Chomsky’s idea of linguistic competence suggests a new approach to understanding how LLMs work. This approach requires careful analysis of text. Such analysis indicates that ChatGPT has explicit control over sophisticated discourse skills: 1) It possesses the capacity to specify high-level structures that regulate the organization of language strings into specific patterns: e.g. conversational turn-taking, story frames, film interpretation, and metalingual definition of abstract concepts. 2) It is capable of analogical reasoning in the interpretation of films and stories, such as Spielberg’s Jaws and A.I., and Tezuka’s Astro Boy stories. It must establish an analogy between some abstract interpretive theory (e.g. the ideas of Rene Girard) and people and events in a story. 3) It has some understanding of abstract concepts such as justice and charity. Such concepts can be defined over stories that exhibit them (metalingual definition). ChatGPT recognizes suitable stories and can revise them. 4) ChatGPT can adjust its level of discourse to accommodate children of various ages. Finally, much of ChatGPT’s discourse seems formulaic in a way similar to what Parry/Lord found in oral epic.
Contents
Introduction: Walking among dragons
What is in the rest of this document?
Calibration: Understanding a Seinfeld Bit
Conversing with ChatGPT about Jaws, Mimetic Desire, and Sacrifice
ChatGPT on Spielberg’s A.I. Artificial Intelligence and AI Alignment
Extra! Extra! In a discussion about Astro Boy, ChatGPT defends the rights of robots and advanced AI
Pumpkins, the Falcon Heavy, and Groucho Marx: High level discourse structure in ChatGPT
High level discourse structure in ChatGPT: Part 2 [Quasi-symbolic?]
Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?
Does ChatGPT’s performance warrant working on a tutor for children?
To the future and beyond
Coda: What’s going to be in Part 2: A Framework for Description and Analysis?
Appendix: ChatGPT gets confused about Sonnet 129
Introduction: Walking among dragons
“Your brain, your job, and your most fundamental beliefs will be challenged by AI like nothing ever before. Make sure you understand it, how it works, and where and how it is being used.” – David Ferrucci
When I first heard about ChatGPT on November 30, 2022, I figured I’d pass on it. After all, it was ultimately based on GPT-3 and I’d already had a little bit of fun with that, albeit through an intermediary. What more could there be? The next day, however, I thought, Why not, it’s free, no? I signed up for an account. I had no particular intentions. I just wanted to test the water.
Total Immersion
I’ve been swimming in it since then. I’ve copied every “conversation” into a text document that is 178 pages long. I can’t tell you how many times I’ve laughed out loud and danced in my seat in reaction to ChatGPT’s response to a prompt.
ChatGPT is more fun than a barrel of monkeys. But it is also work. When I started playing with it I had no specific intentions; I certainly did not intend to write about ChatGPT extensively. I just wanted to poke around. I became, if not hooked, perhaps entranced. I began systematically exploring it, not to find its faults, its weaknesses, as many are doing, but to test its strengths.
In the process I have changed, though it is difficult to characterize that change. I wouldn’t say that it has given me any new ideas. Nor has it changed my views on the strengths and weaknesses of so-called deep learning (DL) technology. It is not an INTELLECTUAL change.
Like many others I have believed DL needs to be augmented by “classical” symbolic processing. I still believe that. I have also believed that DL systems need direct interaction with the world if they are to exhibit real “intelligence.” That belief remains rock-solid. Some enthusiasts have been saying that DL will take us all the way to full Artificial General Intelligence (AGI) simply by scaling up: more parameters, more data, and more compute. I disagree. Deeper changes are required.
What has changed is my ORIENTATION, my outlook. I have had a glimpse, however limited and provisional, into a new world, a world where we will be working with these “miracles of rare device” (to borrow a phrase from Coleridge) in ways we had not previously imagined. If you want to know what it’s like to drive a car, you can only do it from the driver’s seat. I have taken the driver’s seat and have been systematically exploring ChatGPT.
It is one thing to be amazed by this or that output from ChatGPT. There’s lots of that going around. I am going beyond that to analyze some of the mechanisms of discursive competence, to borrow a term from Noam Chomsky, that enable ChatGPT to function so well. This working paper is a preliminary report on these explorations. I believe that through the careful analysis of ChatGPT’s discursive output we can gain insight into its inner operations, allowing us to improve future technology and to develop benchmarks more tailored to the capacities of emerging LLM technology.
I responded to GPT-3 with a report entitled, GPT-3: Waterloo or Rubicon? Here be Dragons. I have crossed the Rubicon and have been walking among dragons. It is time we get to know them better, to talk with them.
To learn about dragons, describe and analyze them
When I started playing with ChatGPT on December 1, 2022, I had no specific intentions. I wanted to poke around, see what I could see, and then... As I said, I had no specific intentions. I certainly did not intend to spend hours interacting with it to produce a Microsoft Word document currently (1.5.23) containing 61,580 words of transcription – the vast majority from ChatGPT – on 178 pages.
One of the earliest things I did with ChatGPT – not THE first, it was my third session, on December 1, 2022 – was to dialog about Steven Spielberg’s Jaws and the ideas of Rene Girard. I took that and wrote it up for 3 Quarks Daily. Then I had some fun with “Kubla Khan,” quizzed it about trumpets, had a long session about Gojira/Godzilla, and then returned to Spielberg, this time to A.I. Artificial Intelligence. By this time I was developing a feel for how ChatGPT responded. Both the Jaws and the A.I. posts are included in this paper.
I became more systematic, looking for specific things, testing them out. That led to a post with a rather baroque title, “Of pumpkins, the Falcon Heavy, and Groucho Marx: High level discourse structure in ChatGPT,” which I’ve also included in this paper. In that post I advanced the argument that there are parameters in the language model that govern higher-level discourse structures independently of the specific words and strings that realize them.
The alternation pattern is something like this:
A, B, A, B....
That can be repeated as often as one will. The text in the A sections is always drawn from one body of material while the text in the B sections is drawn from a different body of material. That’s the pattern ChatGPT has learned. Where is it in the net? How’s it encoded?
The frame structure is a bit more complicated:
A (B, C, B, C....) A’
The embedded alternation draws on two bodies of material, any two bodies. The second part of the frame, A’, must complement the first, A.
Again, it’s not a complex structure. But it’s not defined directly over particular words. It’s defined over groups of words, placing the groups, not the individual words, into specified relationships in the discourse string.
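The two patterns can be sketched in a few lines of Python. This is only an illustration of what it means for a pattern to be defined over groups of words rather than individual words; it says nothing about how ChatGPT actually encodes such patterns. The function names (alternate, frame, shape) and the placeholder passages are hypothetical, invented for this sketch.

```python
from itertools import chain
from typing import List, Sequence, Tuple

# A segment pairs a source label with a whole group of words.
Segment = Tuple[str, str]


def alternate(a_texts: Sequence[str], b_texts: Sequence[str],
              a_label: str = "A", b_label: str = "B") -> List[Segment]:
    """Interleave two bodies of material as A, B, A, B, ..."""
    pairs = zip(a_texts, b_texts)  # stops at the shorter body
    return list(chain.from_iterable(
        ((a_label, a), (b_label, b)) for a, b in pairs))


def frame(opening: str, b_texts: Sequence[str], c_texts: Sequence[str],
          closing: str) -> List[Segment]:
    """Wrap an embedded alternation in a frame: A (B, C, B, C, ...) A'."""
    return [("A", opening),
            *alternate(b_texts, c_texts, "B", "C"),
            ("A'", closing)]


def shape(segments: Sequence[Segment]) -> str:
    """Return the pattern alone, with the words stripped away."""
    return ", ".join(label for label, _ in segments)


# Placeholder passages (made up); any two bodies of material would do.
pumpkins = ["pumpkin passage 1", "pumpkin passage 2"]
rockets = ["rocket passage 1", "rocket passage 2"]

print(shape(alternate(pumpkins, rockets)))               # A, B, A, B
print(shape(frame("opening", pumpkins, rockets, "closing")))  # A, B, C, B, C, A'
```

Swapping in different passages changes every word of the output while leaving the shape untouched, which is the sense in which the pattern is independent of the strings that realize it.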
I then suggested that the patterns I had identified in Jaws and A.I. were similar, but, if anything, more complex.
I had become all but convinced that ChatGPT had explicit control over high-level discourse properties. When humans make statements like those, we take it as obvious that they have some “grammar” of high-level discourse structures. Narratologists, linguists, and psycholinguists study them. But ChatGPT is not a human. It is, shall we say, a machine, a machine that was trained to guess the next word, word after word after word... and so forth, for jillions of texts. All that’s in the resulting model is statistics about those texts. It seems to be a “stochastic parrot”, as one well-known paper argued.
Perhaps, in a sense, that is true. But that is a terribly reductive characterization, and, I have come to believe, all but beside the point. Large language models issue one word at a time for the same reason that humans do: That’s the nature of the communication channel, and tells us relatively little about the device that is pushing words through the channel. LLMs develop rich and complicated structures of parameter weights during the training process. Yes, those structures are statistical in nature, but they are also structures. Perhaps there are aspects of those structures that we can investigate without having to “open the hood” and examine parameter weights.
I made that suggestion in a post, “Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?”, also included in this paper. Chomsky famously distinguished between competence and performance, where the study of linguistic performance is about the mechanism that produces and understands texts while the study of linguistic competence is about the structure of the texts independent of underlying mechanisms. When I analyze ChatGPT’s output I am investigating its competence. When researchers pop the hood and examine parameter weights, they are investigating performance mechanisms. I further suggest that a better understanding of an LLM’s competence will aid in studying those performance mechanisms by giving us clues about what they are doing.
Nor am I the only one who believes in the value of studying the output of these engines. Others have come to that conclusion as well, though perhaps not quite in those terms. Here is the abstract of a recent preprint from Marcel Binz and Eric Schulz from the Max Planck Institute:
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3’s decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3’s behavior is impressive: it solves vignette-based tasks similarly or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multi-armed bandit task, and shows signatures of model-based reinforcement learning. Yet we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. These results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.[1]
My methods are different, but my purpose is the same, “to study increasingly capable and opaque artificial agents” and thus to render them less opaque. The insights we gain thereby will help us improve the capabilities of the next generation of artificial agents.
* * * * *
[1] Marcel Binz and Eric Schulz, 2022. “Using Cognitive Psychology to Understand GPT-3,” PsyArXiv, June 21, 2022. doi:10.31234/osf.io/6dfgk.
Discursive Competence in ChatGPT, Part 1: Talking with Dragons
Version 1, January 5, 2023
Academia.edu: https://www.academia.edu/94409729/Discursive_Competence_in_ChatGPT_Part_1_Talking_with_Dragons
SSRN: https://ssrn.com/abstract=4318832
Research Gate: https://www.researchgate.net/publication/366897197_Discursive_Competence_in_ChatGPT_Part_1_Talking_with_Dragons_Discursive_Competence_in_ChatGPT_Part_1_Talking_with_Dragons