What must be the case for ChatGPT to have memorized “To be or not to be”? Three kinds of conceptual objects for LLMs

This is cross-posted from my personal blog, New Savanna.

Is anyone working on this kind of issue? If so, I’d like to know.


I’ve been thinking about ChatGPT again, about what goes on under the hood. What would happen if I prompted it with a famous phrase from Shakespeare? Would it return the whole passage?

While I am interested in that question, that’s not the only thing I’m interested in. There are topics that show up in responses to prompts, but not in the form of fixed texts. Finally, there are likely topics that exist in the training corpus but don’t show up in response to prompts. THAT’s what this post is about: the differences between those kinds of objects.

Fixed strings – Hamlet’s soliloquy

It’s easy enough to try. I did so on July 31, running against the July 20 version.

To be or not to be

“To be or not to be” is one of the most famous lines from William Shakespeare’s play “Hamlet.” It appears in Act III, Scene 1, where Prince Hamlet is contemplating the nature of existence and the idea of suicide.

The full quote is as follows:

“To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing, end them? To die: to sleep;
No more; and by a sleep to say we end
The heartache and the thousand natural shocks
That flesh is heir to; ’tis a consummation
Devoutly to be wished. To die, to sleep;
To sleep, perchance to dream—ay, there’s the rub:
For in that sleep of death, what dreams may come,
When we have shuffled off this mortal coil,
Must give us pause—there’s the respect
That makes calamity of so long life:
For who would bear the whips and scorns of time,
Th’oppressor’s wrong, the proud man’s contumely,
The pangs of despised love, the law’s delay,
The insolence of office, and the spurns
That patient merit of th’unworthy takes,
When he himself might his quietus make
With a bare bodkin? Who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscovered country from whose bourn
No traveler returns, puzzles the will,
And makes us rather bear those ills we have,
Than fly to others that we know not of?
Thus conscience does make cowards of us all,
And thus the native hue of resolution
Is sicklied o’er with the pale cast of thought,
And enterprises of great pith and moment,
With this regard their currents turn awry,
And lose the name of action. Soft you now,
The fair Ophelia! Nymph, in thy orisons
Be all my sins remembered.”

In this soliloquy, Hamlet is contemplating the nature of life, death, and the fear of the unknown. He ponders whether it is nobler to endure the hardships and challenges of life or to take his own life and end the suffering. Throughout the play, Hamlet grapples with existential questions and struggles with his emotions, making “To be or not to be” one of the most memorable and profound passages in literature.

Except for the opening and closing material, that’s pretty much what I expected to happen. But why? Given that the underlying LLM has been trained to predict the next word, what MUST have been the case in order for ChatGPT to return the whole soliloquy when given the opening six words?

It must have encountered that soliloquy many times during its training. Shakespeare’s Hamlet is a well-known play and it must appear in many places in the training corpus. Moreover, the soliloquy itself is famous and probably occurred many times independently of the whole play.

What would happen if I gave it a phrase from somewhere in the soliloquy? Let’s find out. This run is from 12 minutes ago, against the August 4 version. It is, of course, a new session, so there is no prior context for it to work from:

The insolence of office

The phrase “the insolence of office” is a line from William Shakespeare’s play “Hamlet.” It appears in Act 3, Scene 1, during a soliloquy by the character Hamlet. The full line is:

“To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing, end them? To die: to sleep;
No more; and by a sleep to say we end
The heartache and the thousand natural shocks
That flesh is heir to; ’tis a consummation
Devoutly to be wished. To die, to sleep;
To sleep, perchance to dream—ay, there’s the rub:
For in that sleep of death, what dreams may come,
When we have shuffled off this mortal coil,
Must give us pause—there’s the respect
That makes calamity of so long life:
For who would bear the whips and scorns of time,
Th’oppressor’s wrong, the proud man’s contumely,
The pangs of despised love, the law’s delay,
The insolence of office, and the spurns
That patient merit of th’unworthy takes,
When he himself might his quietus make
With a bare bodkin? Who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscovered country from whose bourn
No traveler returns, puzzles the will,
And makes us rather bear those ills we have,
Than fly to others that we know not of?
Thus conscience does make cowards of us all,
And thus the native hue of resolution
Is sicklied o’er with the pale cast of thought,
And enterprises of great pith and moment,
With this regard their currents turn awry,
And lose the name of action.

In this famous soliloquy, Hamlet is contemplating the nature of existence, the pain and suffering that people endure in life, and the fear of the unknown in death. The specific line you mentioned, “The insolence of office,” refers to the abuses of power and authority that people in positions of power may engage in, which can be a source of suffering and frustration for those subjected to it. It’s just one part of Hamlet’s larger meditation on the human condition and the difficult choices we face in life.

I wasn’t sure that would happen, but I’m not very surprised that it did. Note, though, that whereas it gave 35 lines the previous time, it only gave 33 lines this time, omitting: “Soft you now, / The fair Ophelia! Nymph, in thy orisons / Be all my sins remembered.” Not surprisingly, the final gloss was a bit different as it decided to explicate the line I’d given it as a prompt.

I decided to try again. I opened a new session and gave it the well-known phrase that opens the third line:

The slings and arrows

The phrase “the slings and arrows” is a poetic expression that originates from one of William Shakespeare’s most famous soliloquies in his play “Hamlet.” In Act 3, Scene 1, Hamlet delivers a soliloquy that begins with the famous lines:

“To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing, end them?

In this soliloquy, Hamlet is contemplating the nature of life, death, and the suffering that people endure. “The slings and arrows of outrageous fortune” refers to the unpredictable and often harsh trials and tribulations that one may face in life. These challenges can be seen as the metaphorical “slings and arrows” that are launched at a person by the whims of fate or destiny.

Over time, the phrase “the slings and arrows” has come to be used more broadly to describe the difficulties, hardships, and adversities that people encounter in their lives. It signifies the challenges and setbacks that individuals must endure as they navigate the complexities of life.

Again, not surprising, though I note that it only gave four more lines rather than continuing on through the whole soliloquy.

Can I stump it? Can I pick a phrase from somewhere within the soliloquy that it won’t recognize? I opened a new session and chose a phrase from the 22nd line. Without thinking more than a moment about it, I was careful not to start at the beginning of the line. Why? Because when people memorize lines from Shakespeare, we memorize whole lines, though we may perchance also be able to recognize a phrase from within a line. Here we go:

and sweat under a

It seems like your message is incomplete. Could you please provide more context or clarify your request? I’d be happy to assist you if you can provide more information or ask a specific question.

Bingo! I got it. That makes sense. It tells us something about what’s going on, but don’t ask me what. I decided to help the Chatster out:

It’s a phrase from Hamlet’s famous soliloquy.

Ah, I see! The phrase you’re referring to is from Hamlet’s soliloquy in Act 3, Scene 1 of William Shakespeare’s play “Hamlet.” The full line is:

“To be or not to be, that is the question: Whether ’tis nobler in the mind to suffer The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles And by opposing end them.”

The line you mentioned is part of this famous soliloquy in which Hamlet reflects on the nature of existence and the choice between life and death. If you have any specific questions or if there’s something specific you’d like to know about this soliloquy or any other aspect of “Hamlet,” feel free to ask!

OK. I think it would be worthwhile to try out some more phrases from that soliloquy to see how ChatGPT behaves, but it should probably be done systematically. What’s the system? But that’s enough for now. I want to look at something else.
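One possible system, offered only as a sketch: take each line of the soliloquy and build two probes from it, one that starts at the beginning of the line (the way people recall verse) and one that starts a couple of words in, as “and sweat under a” did. The Python below does only that bookkeeping. It doesn’t call ChatGPT or any API; each fragment would still be pasted into a fresh session by hand. The four-word fragment length, the two-word offset, and the names `EXCERPT`, `words_of`, and `make_probes` are illustrative assumptions, and only four lines of the speech are included.

```python
import string

# Hypothetical excerpt: a few lines of the soliloquy keyed by their line
# number within the speech (line 1 is "To be, or not to be ...").
EXCERPT = {
    1: "To be, or not to be, that is the question:",
    3: "The slings and arrows of outrageous fortune,",
    18: "The insolence of office, and the spurns",
    22: "To grunt and sweat under a weary life,",
}


def words_of(line):
    """Split a verse line into words, stripping punctuation at word edges."""
    stripped = (w.strip(string.punctuation) for w in line.split())
    return [w for w in stripped if w]  # drop tokens that were pure punctuation


def make_probes(lines, n_words=4, offset=2):
    """Return (line_number, kind, fragment) tuples for each verse line.

    "line-start" is the first n_words words of the line; "mid-line" is
    n_words words beginning `offset` words in. Both defaults are assumptions.
    """
    probes = []
    for number, line in sorted(lines.items()):
        words = words_of(line)
        probes.append((number, "line-start", " ".join(words[:n_words])))
        # Only emit a mid-line probe when the line is long enough for one.
        if len(words) >= offset + n_words:
            fragment = " ".join(words[offset:offset + n_words])
            probes.append((number, "mid-line", fragment))
    return probes


if __name__ == "__main__":
    for number, kind, fragment in make_probes(EXCERPT):
        print(f"line {number:>2}  {kind:<10}  {fragment}")
```

With these defaults the mid-line probe for line 22 comes out as “and sweat under a”, the fragment that stumped ChatGPT, while the line-start probes for lines 3 and 18 are “The slings and arrows” and “The insolence of office”. Filling in the rest of the speech would give a line-start probe for every line and a mid-line probe for every line long enough to have one.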

Ensembles – the Johnstown flood

I decided to ask it about the Johnstown flood of 1889. Why? The 1889 flood (there were later floods in 1936 and, I believe, 1977) was of middling importance in American history and made the news around the country. So, I opened a new session (we’re back on July 31, running against the July 20 version):

Johnstown flood, 1889

The Johnstown Flood occurred on May 31, 1889, in Johnstown, Pennsylvania, USA. It was one of the deadliest natural disasters in American history and resulted in the loss of thousands of lives and significant destruction of property.

The disaster was caused by the catastrophic failure of the South Fork Dam, which was located about 14 miles upstream of Johnstown. The dam had been originally built in the 1850s as part of a canal system but was later modified and converted into a recreational lake for wealthy industrialists, including members of the South Fork Fishing and Hunting Club.

On May 31, heavy rainfall caused the dam to weaken, and ultimately, it gave way, releasing an immense volume of water from the lake. The resulting floodwaters rapidly rushed down the Little Conemaugh River valley towards Johnstown, where three major rivers, the Little Conemaugh, Stonycreek River, and Conemaugh River, converged. As the floodwaters converged, they formed a powerful and deadly wall of water and debris that slammed into Johnstown.

The city of Johnstown, situated in a narrow valley, was not prepared for such a massive flood. The sudden onslaught of water and debris devastated the area, sweeping away homes, buildings, bridges, and anything else in its path. The death toll from the flood reached approximately 2,209 people, making it one of the deadliest disasters in U.S. history up to that time.

The tragedy led to a nationwide outpouring of sympathy and support for the victims and survivors. Clara Barton and the American Red Cross played a significant role in relief efforts, providing aid to those affected by the flood.

In the aftermath of the disaster, there were legal battles and debates over who was responsible for the dam’s failure and the resulting loss of life and property. Ultimately, no individuals from the South Fork Fishing and Hunting Club were held legally accountable for the catastrophe.

The Johnstown Flood remains a significant event in American history and serves as a reminder of the potential consequences of inadequate infrastructure and environmental management. The disaster also prompted greater awareness and improvements in flood control and disaster response measures in the United States.

I fully expected to get something but, for what it’s worth, I didn’t expect much.

The response looks OK to me. I grew up in Johnstown and so heard about the flood during my childhood. I’ve also read about it here and there and seen a documentary on it.

I don’t think ChatGPT would have known anything about the flood if it hadn’t been mentioned in various places in the training corpus. Now, for all I know, the response it gave me is word-for-word from a single source. But I don’t have any reason to believe that is the case. It’s difficult to imagine that any single text about that flood exists as many times on the web as Hamlet’s famous soliloquy. Barring further information, I am assuming that the response contains information that the underlying LLM has synthesized from various sources.

[Note to self: Sometime you should give it the Johnstown prompt again and see what happens.]
[Second note to self: just checked it. Got a different text. Check the appendix.]

For the sake of argument, let’s call this type of object an ensemble. I don’t particularly like that term, but it will do for the moment. Historical events are not the only phenomena that constitute ensembles, but that’s a different discussion.

Ghosts – Diz at the Left Bank

Finally, let’s prompt it with a historical event which could well have been in the training corpus, but is unlikely to have been reported many times. Back when I was living in Baltimore in the late 1960s and early 1970s I attended jazz concerts given by the Left Bank Jazz Society. I wrote about one of those concerts on my personal blog back in 2013, and republished the piece with minor changes at 3 Quarks Daily in 2016.

Would the Chatster know about that event? I suspect not. Time for another session:

Dizzy Gillespie plays for the Left Bank Jazz Society in Baltimore’s Famous Ballroom

I believe there might be a bit of confusion here. Dizzy Gillespie was indeed a legendary jazz trumpeter and composer, but he passed away in 1993. As of my last update in September 2021, he hasn’t been performing in Baltimore or anywhere else for quite some time.

The Left Bank Jazz Society was an organization that promoted jazz concerts and events in Baltimore, known for hosting prominent jazz musicians in the 1960s and 1970s. Baltimore’s Famous Ballroom, also known as the Royal Theater, was a significant venue for jazz performances during that era.

While Dizzy Gillespie may have played at various jazz venues in Baltimore in the past, it’s essential to verify the accuracy of any current claims or events, as my information might be outdated. I suggest checking with local jazz organizations or venues in Baltimore to find the latest information about jazz performances and events in the area.

I’m thinking of a performance back in the previous century that Bill Benzon wrote about in his blog, New Savanna, and in the online magazine, 3 Quarks Daily.

My apologies for any confusion. It seems like you are referring to a specific performance that Bill Benzon wrote about in his blog, New Savanna, and in the online magazine, 3 Quarks Daily. As an AI language model, I don’t have access to real-time data or browsing capabilities, and my knowledge is limited to information available up to September 2021. Therefore, I cannot provide details about specific performances or events that may have occurred after that date.

To find the information you’re looking for, I recommend visiting Bill Benzon’s blog, New Savanna, and the online magazine, 3 Quarks Daily, to see if the article you mentioned is available there. Additionally, you can try using a search engine to look for any specific writings by Bill Benzon on the mentioned topic or event.

I’m not surprised.

Notice, by the way, that ChatGPT is a little confused, just a little. In its first response it pointed out that Gillespie had died in 1993. In its second response it noted that it could not “provide details about specific performances or events that may have occurred after that date.” But Gillespie had died well before then, so that in itself is no justification for not knowing. Of course, something could have happened prior to September 2021 but not have been reported on the web by that time. Chatster seemed to allow for that, but didn’t quite get it. It’s a minor matter, with no bearing on what I’ve been playing around with.

I’m guessing that there are thousands upon thousands of events and phenomena like that. They’ve been mentioned somewhere in some document in the training corpus, but not often enough to “register” in the underlying language model. Let’s call these ghosts.

Three types of objects

Thus we’ve got three classes of objects:

1. Fixed strings: These arise when specific texts have occurred so many times in the training corpus that they register word-for-word in the language model.

2. Ensembles: These result when specific events are mentioned often enough that they register in the language model, but not as word-for-word texts. I’ve used a historical event as an example. But all kinds of regularities, not just historical events, would register in this way, e.g. astronomical observations, descriptions of life forms, philosophical arguments, etc.

3. Ghosts: These are events or phenomena of any kind that are in the training corpus, but so infrequently that they don’t register in the language model. These texts still contribute to the model, but only insofar as they help specify relations among words.

What parameters govern these types of objects?

Appendix [later in the day]:

The Chatster isn’t giving a memorized word-for-word response:

Johnstown flood, 1889

The Johnstown Flood of 1889 was a devastating natural disaster that occurred on May 31, 1889, in Johnstown, Pennsylvania, USA. It remains one of the deadliest floods in U.S. history. The disaster was the result of the catastrophic failure of the South Fork Dam, a poorly maintained earthen dam situated on the Little Conemaugh River, about 14 miles upstream from Johnstown.

Here is a sequence of events leading up to and during the Johnstown Flood:

1. South Fork Dam: The South Fork Dam had been built in the 1850s as part of a reservoir for the Pennsylvania Main Line Canal system but was later converted into a private fishing and recreational lake for wealthy industrialists, including members of the South Fork Fishing and Hunting Club.

2. Heavy Rainfall: In the days leading up to the disaster, the region experienced heavy rainfall. The dam’s spillways became clogged with debris, and the water level in the reservoir rose rapidly.

3. Dam Failure: On May 31, 1889, after hours of heavy rainfall, the dam failed, releasing an enormous volume of water downstream. An estimated 20 million tons of water surged down the valley.

4. Destruction of Johnstown: The wall of water swept through the valley, hitting the town of South Fork first before reaching Johnstown. The floodwaters picked up debris, houses, and other structures along the way, creating a deadly wall of destruction.

5. Devastation: When the flood reached Johnstown, it caused widespread destruction. Entire neighborhoods were wiped out, and the floodwaters claimed the lives of thousands of people. Many were trapped in debris or drowned as the water surged through the town.

6. Aftermath: In the aftermath of the disaster, relief efforts were launched to aid the survivors and recover the bodies of the victims. Clara Barton and the American Red Cross played a significant role in providing assistance.

7. Investigation and Legal Actions: The owners of the South Fork Fishing and Hunting Club faced public outrage and legal consequences for their alleged negligence in maintaining the dam. However, no one was successfully held criminally liable.

The Johnstown Flood had a profound impact on American society and led to increased awareness of dam safety and disaster preparedness. It remains a tragic chapter in U.S. history, with an estimated 2,209 people losing their lives in the disaster. The story of the Johnstown Flood has been documented in books, articles, and even a museum in Johnstown, ensuring that the memory of this catastrophic event is not forgotten.