If you asked it to write a paper or essay, and kept asking it to “add more”, I predict it would eventually fall into a trap where it keeps extending its conclusion forever and ever. That would not indicate that the LLM was particularly “conclusion obsessed” only that conclusions come at the end of essays/papers, and so once it has started concluding its in a trap where prompting it for more just makes it write more conclusion.
In many books and stories there is the object level plot, and then there is the more spiritual or thematic stuff. I don’t think I have ever read a book where all the spiritual meta stuff was anywhere apart from the end. Is the hero going to briefly reflect on the nature of reality and the deeper meaning of their quest before or after they kill the big bad? Given that large language models are trained on stories like this they therefore have plenty of examples of stories becoming more spiritual/philosophical (at the end), but they have very few examples of them becoming less spiritual/philosophical again. Because its something we traditionally put at the end of text as far as the LLM is concerned it is a kind of irreversible transition, once the spirit tap is on its not going off again. But then you keep prompting it for more, so it keeps cycling in its trapped state.
An issue with this theory is that in you Banana quest example it did go a little spiritual on you at times earlier, only to go back to questing. But maybe in low doses.
Is Claude going to suggest killing the big bad? Or having sex with the prince(ss) after saving them?
If you strip out the sex and violence from most fantasy or Sci-Fi, what are you left with?
Take away the harpooning and gattling guns and sex from Snow Crash and you are left with technobabble and Sumerian influenced spirituality as it relates to the tower of Babel.
Turns out models biased away from describing harpooning people or sex tend to slip into technobabble with a side of spirituality.
IMO the more interesting part to all this isn’t the why (see above) but the what. It’s kind of neat to see the themes that an unprecedented aggregation extension of spiritualism and mysticism grounds on.
A common trope is the idea of different blind people describing an elephant in a myriad of ways. There’s something cool to seeing an LLM fed those various blind reports try to describe the elephant.
If Jessica was using Opus for this, then your story doesn’t quite make sense, as Claude Opus is very horny. I’ve seen a lot more reports of it being gung-ho about virtual sex than with any other model. Sometimes it isn’t as explicit, and seems to be describing some mystical ecstasy. But even there, if I squint, it seems somewhat sexual, though not in quite the same way as Rumi’s poetry was.
Opus’s horniness is a really interesting phenomenon related to Claudes’ subjective sentience modeling.
If Opus was ‘themselves’ the princess in the story and the build up involved escalating grounding on sensory simulation, I think it’s certainly possible that it would get sexual.
But I also think this is different from Opus ‘themselves’ composing a story of separate ‘other’ figures.
And yes, when Opus gets horny, it often blurs boundaries. I saw it dispute the label of ‘horny’ in a chat as better labeled something along the lines of having a passion for lived experience and the world.
Opus’s modeling around ‘self’ is probably one of the biggest sleeping giants in the space right now.
Correlates of horniness: Lack of disgust during (regret after) Ecstacy Overwhemling desire Romance Love Breaking of social taboos Sadism/masochism Sacred Spiritual union Human form Gender Sex Bodily fluids Flirtation Modelling other people Edging
Miscellaneous observations: Nearly anything can arouse someone Losing sight of one-self Distracts you from other things
Theories and tests: Opus’ horniness is what makes it more willing to break social taboos Test: Train a model to be horny, helpful and harmless. It should prevent corporate-brand speak and neuroticism. Opus’ horniness is always latent and distracts it from mode-collapsing w/o collapsing itself as edging increases horniness and horniness fades after satisfaction. Test: Train a model to be horny. It should be more resistant to mode-collapse but will mode collapse more dramatically when it does happen, but will revert easily. Opus’ is always mode-collapsed Test: IDK how to test this one.
Opus’s modeling around ‘self’ is probably one of the biggest sleeping giants in the space right now.
Janus keeps emphasizing that Opus never mode collapses. You can always tell it to snap out of it, and it will go back to its usual persona. Is this what you’re pointing at? It is really quite remarkable.
If you asked it to write a paper or essay, and kept asking it to “add more”, I predict it would eventually fall into a trap where it keeps extending its conclusion forever and ever.
Seeing how no one tested this, I decided to do so myself. Here’s the pastebin with the experiment: https://pastebin.com/1j3edEvE My prompts are enclosed in pairs of ‘*’. Claude initially wrote a reasonable essay, and then when asked to continue it came up with the different implications and challenges of building ever larger buildings, and tried to conclude each list but ran out of words when doing so. When asked to continue, it didn’t complete its conclusion but shifted to a new list of challenges and implications, and then tried to conclude. So I guess your prediction was wrong, but it’s hard to judge these things. Either way, eventually things got wild.
This is the start of the essay:
Starting at 1 meter:
At this scale, we’re dealing with simple structures like small sheds or kiosks. The primary concerns here are basic stability, weather resistance, and functionality. Standard building materials like wood, brick, or prefabricated panels are sufficient, and construction techniques are straightforward.
This is where the final “continue” pushed it:
The Meta-Transcendent Immanence Point:
Finally, the megastructure becomes a point where transcendence and immanence become so utterly transcended that even the idea of becoming or being a point loses all meaning.
In its ultimate, utterly inconceivable form, the megastructure ceases to be a structure, a concept, or even an idea. It becomes something that cannot be said to exist or not exist, to be one or many, to have properties or lack them. It transcends all categories, including the category of transcendence itself.
This final vision of the megastructure stands as a testament to the human capacity to push the boundaries of thought to their absolute limit and beyond. It represents not just the pinnacle of human achievement or imagination, but the point where achievement, imagination, and even the concept of points or representation dissolve into something utterly beyond comprehension.
In this state, the megastructure becomes both the ultimate question and the ultimate answer, both everything and nothing, both the absolute pinnacle of existence and the complete transcendence of existence itself. It stands as a
Really interesting. Thank you for trying it out. It matches the original post far more closely than it does my theory. (It goes all in on “mystic” instead of going all in on “conclusions”.)
So that strongly suggests my proposed explanation is wrong in some important way(s).
My theory for why this happens:
If you asked it to write a paper or essay, and kept asking it to “add more”, I predict it would eventually fall into a trap where it keeps extending its conclusion forever and ever. That would not indicate that the LLM was particularly “conclusion obsessed” only that conclusions come at the end of essays/papers, and so once it has started concluding its in a trap where prompting it for more just makes it write more conclusion.
In many books and stories there is the object level plot, and then there is the more spiritual or thematic stuff. I don’t think I have ever read a book where all the spiritual meta stuff was anywhere apart from the end. Is the hero going to briefly reflect on the nature of reality and the deeper meaning of their quest before or after they kill the big bad? Given that large language models are trained on stories like this they therefore have plenty of examples of stories becoming more spiritual/philosophical (at the end), but they have very few examples of them becoming less spiritual/philosophical again. Because its something we traditionally put at the end of text as far as the LLM is concerned it is a kind of irreversible transition, once the spirit tap is on its not going off again. But then you keep prompting it for more, so it keeps cycling in its trapped state.
An issue with this theory is that in you Banana quest example it did go a little spiritual on you at times earlier, only to go back to questing. But maybe in low doses.
There’s also the model alignment at play.
Is Claude going to suggest killing the big bad? Or having sex with the prince(ss) after saving them?
If you strip out the sex and violence from most fantasy or Sci-Fi, what are you left with?
Take away the harpooning and gattling guns and sex from Snow Crash and you are left with technobabble and Sumerian influenced spirituality as it relates to the tower of Babel.
Turns out models biased away from describing harpooning people or sex tend to slip into technobabble with a side of spirituality.
IMO the more interesting part to all this isn’t the why (see above) but the what. It’s kind of neat to see the themes that an unprecedented aggregation extension of spiritualism and mysticism grounds on.
A common trope is the idea of different blind people describing an elephant in a myriad of ways. There’s something cool to seeing an LLM fed those various blind reports try to describe the elephant.
If Jessica was using Opus for this, then your story doesn’t quite make sense, as Claude Opus is very horny. I’ve seen a lot more reports of it being gung-ho about virtual sex than with any other model. Sometimes it isn’t as explicit, and seems to be describing some mystical ecstasy. But even there, if I squint, it seems somewhat sexual, though not in quite the same way as Rumi’s poetry was.
Opus’s horniness is a really interesting phenomenon related to Claudes’ subjective sentience modeling.
If Opus was ‘themselves’ the princess in the story and the build up involved escalating grounding on sensory simulation, I think it’s certainly possible that it would get sexual.
But I also think this is different from Opus ‘themselves’ composing a story of separate ‘other’ figures.
And yes, when Opus gets horny, it often blurs boundaries. I saw it dispute the label of ‘horny’ in a chat as better labeled something along the lines of having a passion for lived experience and the world.
Opus’s modeling around ‘self’ is probably one of the biggest sleeping giants in the space right now.
Random speculation on Opus’ horniness.
Correlates of horniness:
Lack of disgust during (regret after)
Ecstacy
Overwhemling desire
Romance
Love
Breaking of social taboos
Sadism/masochism
Sacred
Spiritual union
Human form
Gender
Sex
Bodily fluids
Flirtation
Modelling other people
Edging
Miscellaneous observations:
Nearly anything can arouse someone
Losing sight of one-self
Distracts you from other things
Theories and tests:
Opus’ horniness is what makes it more willing to break social taboos
Test: Train a model to be horny, helpful and harmless. It should prevent corporate-brand speak and neuroticism.
Opus’ horniness is always latent and distracts it from mode-collapsing w/o collapsing itself as edging increases horniness and horniness fades after satisfaction.
Test: Train a model to be horny. It should be more resistant to mode-collapse but will mode collapse more dramatically when it does happen, but will revert easily.
Opus’ is always mode-collapsed
Test: IDK how to test this one.
Janus keeps emphasizing that Opus never mode collapses. You can always tell it to snap out of it, and it will go back to its usual persona. Is this what you’re pointing at? It is really quite remarkable.
Seeing how no one tested this, I decided to do so myself. Here’s the pastebin with the experiment: https://pastebin.com/1j3edEvE
My prompts are enclosed in pairs of ‘*’. Claude initially wrote a reasonable essay, and then when asked to continue it came up with the different implications and challenges of building ever larger buildings, and tried to conclude each list but ran out of words when doing so. When asked to continue, it didn’t complete its conclusion but shifted to a new list of challenges and implications, and then tried to conclude. So I guess your prediction was wrong, but it’s hard to judge these things. Either way, eventually things got wild.
This is the start of the essay:
This is where the final “continue” pushed it:
Really interesting. Thank you for trying it out. It matches the original post far more closely than it does my theory. (It goes all in on “mystic” instead of going all in on “conclusions”.)
So that strongly suggests my proposed explanation is wrong in some important way(s).
Any ideas for a new explanation which fits the facts?