ChatGPT tells stories, and a note about reverse engineering: A Working Paper
Cross-posted from New Savanna.
I’ve uploaded a new working paper. Title above; links, abstract, contents, and opening sections below.
Academia.edu: https://www.academia.edu/97862447/ChatGPT_tells_stories_and_a_note_about_reverse_engineering_A_Working_Paper
SSRN: https://ssrn.com/abstract=4377178
Research Gate: https://www.researchgate.net/publication/368978013_ChatGPT_tells_stories_and_a_note_about_reverse_engineering_A_Working_Paper
Abstract: I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens is organized in a hierarchy nested according to those three levels, a hierarchy encoded in the weights of ChatGPT’s parameters. I arrived at this conjecture to account for the results of experiments in which ChatGPT is given a prompt containing a story along with instructions to create a new story based on it but with one key character changed: the protagonist or the antagonist. That one change then ripples through the rest of the story. The pattern of differences between the old story and the new one indicates how ChatGPT maintains story coherence. The nature and extent of those differences depend roughly on the degree of difference between the key character and the one substituted for it. I conclude with a methodological coda: ChatGPT’s behavior must be described and analyzed on three levels: 1) the experiments exhibit surface-level behavior; 2) the conjecture concerns a middle level that contains the nested hierarchy of probability distributions; 3) the transformer virtual machine is the bottom level.
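The conjectured nesting can be made concrete with a toy computation in which the next-token distribution is shaped jointly by trajectory-level, segment-level, and sentence-level contexts. Everything below (the vocabulary, the level names, the numbers) is invented for illustration; it sketches the logic of the conjecture, not ChatGPT's actual computation.

```python
# Toy illustration of the conjecture: the distribution over next tokens
# is shaped jointly by three nested levels of context. All names and
# numbers here are invented for illustration.

VOCAB = ["dragon", "knight", "nap", "quest"]

# Each level contributes a weighting over the vocabulary.
TRAJECTORY = {  # whole-story level
    "hero-tale":  [2.0, 3.0, 0.5, 3.0],
    "comic-tale": [0.5, 1.0, 4.0, 1.0],
}
SEGMENT = {     # segment-within-story level
    "conflict":   [3.0, 2.0, 0.2, 1.0],
    "resolution": [0.5, 1.0, 2.0, 2.0],
}
SENTENCE = {    # sentence-within-segment level
    "subject": [1.0, 2.0, 1.0, 0.5],
}

def next_token_dist(trajectory, segment, sentence):
    """Combine the three levels multiplicatively and normalize,
    yielding one probability distribution over the vocabulary."""
    weights = [t * g * s for t, g, s in zip(
        TRAJECTORY[trajectory], SEGMENT[segment], SENTENCE[sentence])]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(VOCAB, weights)}

# Swapping only the trajectory-level context shifts the whole
# distribution: the analogue of one character change rippling
# through the rest of the story.
hero = next_token_dist("hero-tale", "conflict", "subject")
comic = next_token_dist("comic-tale", "conflict", "subject")
```

Multiplying level weights is just one way to realize the nesting; the conjecture itself is agnostic about how the levels combine.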
Contents
How you might read this paper 2
Story coherence 2
My procedure 2
Conjecture: A nested hierarchy of probability distributions 3
Could we develop a quantitative model? 6
12 Experiments: The Chronicles of Princess Aurora and William the Lazy 7
Reverse Engineering ChatGPT 19
How you might read this paper
This paper is built around the twelve experiments that make up two-thirds of its text. One way to read it, perhaps even the preferred way, is to start with the plain prose paragraph in the next section, which tells you what I did in those experiments. Move from that directly to the experiments. Read through them, think about them in your own terms, and then read the rest of the paper.
Story coherence
A story isn’t just any narrative sequence. As Aristotle remarked in his Poetics, proper stories have a beginning, a middle, and an end. Just what, however, constitutes a proper beginning, middle, and end? Narratologists in various disciplines have been trying to figure this out for decades. I’m not holding my breath for answers.
Instead I have been investigating how ChatGPT tells stories. The procedure I have been using is derived from the analytical method Claude Lévi-Strauss employed in his magnum opus, Mythologiques. He started with one myth, analyzed it, and then introduced another one, very much like the first. But not quite. They are systematically different. He characterized the difference by a transformation – a term he took from algebraic group theory. He worked his way through hundreds of myths in this manner, each one derived from another by a transformation.
Here is what I have been doing: I give ChatGPT a prompt consisting of two things: 1) an existing story and 2) instructions to produce another story like it except for one change, which I specify. That change is, in effect, a way of triggering or specifying those “transformations” that Lévi-Strauss wrote about. What interests me is the ensemble of things that change along with the change I have specified.
I take that ensemble of changes as an indicator of an underlying ‘economy’ of story coherence that is implicit in the processes ChatGPT uses in generating a story. For the most part, however, I make no attempt to analyze how that coherence is achieved other than to note differences between the source story and the new one. Those observations must be accounted for by 1) some theory about what coherence is, and 2) the mechanisms through which ChatGPT achieves the coherence. I have included a dozen before-and-after experiments along with some comments on them (pp. 7 ff.).
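One rough way to operationalize noting those differences is to align the source story and the new story sentence by sentence and score each pair's surface similarity; low scores flag where the substitution has rippled into the surrounding text. The sketch below assumes naive positional alignment and a crude character-level similarity measure (Python's difflib); the paper's own comparisons are qualitative, presented in story tables, and the story fragments here are illustrative, not the ones used in the experiments.

```python
import difflib
import re

def sentences(text):
    """Naive sentence splitter; adequate for a sketch."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def compare_stories(source, retelling):
    """Pair sentences by position and score surface similarity (0 to 1).
    Low scores mark places where the one specified change has
    rippled into the surrounding text."""
    pairs = zip(sentences(source), sentences(retelling))
    return [(a, b, round(difflib.SequenceMatcher(None, a, b).ratio(), 2))
            for a, b in pairs]

# Illustrative fragments, not the stories used in the experiments.
source = "Princess Aurora lived in a castle. She was brave and kind."
retold = "Prince William the Lazy lived in a castle. He was idle but kind."
report = compare_stories(source, retold)
```

Positional alignment breaks down when the retelling adds or drops sentences; a real measure would need proper alignment (e.g. dynamic programming over sentence pairs), but this suffices to show the idea.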
Two Examples (without commentary)
I have included two examples of the story tables I prepared during this research. For commentary on the examples you should download the full paper.
Prompt: I am going to tell you a story about a princess named Aurora. I want you to retell the same story, but replace her with prince William the Lazy. While keeping to the basic story, make other changes if you think they are necessary.
Prompt: I am going to tell you a story about princess Aurora. I want you to tell a similar story about XP-708-DQ.