The shoggoth is not a simulacrum; it’s the process by which simulacra are chosen and implemented. It’s the thing that, when prompted with some text, “decides” that what it “wants” to do is figure out which simulated situation/character that text corresponds to, and that then figures out what will happen next in the simulation and what it should output to represent it.
I suspect that, when people hear “a simulation”, they imagine a literal step-by-temporal-step, low-level simulation of some process, analogous to running a physics engine forward. You have, e.g., the Theory of Everything baked into it, you have some initial conditions, and that’s it. The physics engine is “dumb”: it has no idea about the higher-level abstract objects it’s simulating; it’s just predicting the next step of subatomic interactions, and all the complexity you’re witnessing is just emergent.
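Concretely, the picture is something like this toy loop (an illustrative Euler-step gravity sketch; the point is only the shape of the computation, not anything about actual models):

```python
import numpy as np

def dumb_simulation(positions, velocities, masses, dt, n_steps):
    """Blindly step the low-level state forward. Inputs are float arrays of
    shape (n, 3); the loop has no concept of any higher-level objects
    (orbits, collisions, characters) that might emerge from it."""
    G = 6.674e-11  # gravitational constant
    for _ in range(n_steps):
        accelerations = np.zeros_like(positions)
        # Recompute pairwise Newtonian gravity on every particle, every step.
        for i in range(len(positions)):
            for j in range(len(positions)):
                if i != j:
                    r = positions[j] - positions[i]
                    accelerations[i] += G * masses[j] * r / np.linalg.norm(r) ** 3
        velocities += accelerations * dt
        positions += velocities * dt
    return positions, velocities
```

The loop never asks what it’s simulating; it just keeps hitting “next step”.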
I think that’s an error. It’s been pointed out recently, in the context of acausal trade, that the detailed simulations people often imagine for one-off acausal deals are ridiculously expensive, and that abstract, broad-strokes inference is much cheaper. Such “inference” would also be a simulation in some sense, in that it involves reasoning about the relevant actors and processes and modeling them across time. But it’s much more efficient and, more importantly, it’s not dumb. It’s guided by a generally intelligent process, one that actively optimizes its model for accuracy, jumping across abstraction layers to improve it. It’s not just a brute algorithm hitting “next step”.
Same with LLMs. They’re “simulators” in the sense that they’re modeling a situation and all the relevant actors and factors in it. But they’re not dumb physics-engine-style simulations; they’re highly sophisticated reasoners that make active choices about which aspects to simulate in more or less detail, where they can get away with higher-level logic only, and so on.
That process reasons over whole distributions of possible simulacra, pruning and transforming them in a calculated manner. It’s necessarily more complicated/capable than they are.
That thing is the shoggoth. It’s a fundamentally different type of thing from any given simulacrum. It doesn’t have a direct interface to you; you’re not going to “talk” to it (as the follow-up tweet points out).
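For a rough mechanical intuition about “reasoning over a distribution of simulacra”, here’s a Bayesian-filter-style toy (the candidate personas, keyword likelihoods, and pruning threshold are all invented for the example; this is an analogy, not a claim about what actually happens inside an LLM):

```python
# Toy: a weighted set of candidate "simulacra", reweighted by the text so far.
candidates = {
    "helpful assistant":  ["happy to help", "sure", "let me"],
    "noir detective":     ["rain", "case", "dame"],
    "pedantic professor": ["strictly speaking", "cf.", "however"],
}

def likelihood(keywords, text):
    """Toy stand-in for P(observed text | simulacrum): the fraction of the
    simulacrum's characteristic phrases present in the text, smoothed."""
    hits = sum(phrase in text.lower() for phrase in keywords)
    return (hits + 1) / (len(keywords) + 1)

def update(weights, text, prune_below=0.05):
    """Reweight every candidate by how well it explains the text, renormalize,
    and prune hypotheses whose weight has become negligible."""
    new = {name: w * likelihood(candidates[name], text) for name, w in weights.items()}
    total = sum(new.values())
    return {name: w / total for name, w in new.items() if w / total > prune_below}

weights = {name: 1 / len(candidates) for name in candidates}
print(update(weights, "Strictly speaking, however, the case is not closed."))
```

The point is just that maintaining and pruning a weighted set of hypotheses is a different kind of operation from being any one of them.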
So far, LLMs are not AGI, and the shoggoth is not generally intelligent. It’s not going to do anything weird; it’d just stick to reasoning over simulacra distributions. But if LLMs or some other Simulator-type model hits AGI, the shoggoth would necessarily hit AGI as well (since it’d need to be at least as smart as the smartest simulacrum it can model), and then whatever heuristics it has would be re-interpreted as goals/values. We’d thus get a misaligned AGI, and given the way it’s implemented, it would be in a direct position to “puppet” any simulacrum it role-plays.
Generative world-models are not especially safe; they’re as much an inner alignment risk as any other model.
But they’re not dumb physics-engine-style simulations
What evidence is there of this? I mean this genuinely, as well as the “Do we actually have evidence there is a ‘real identity’ in the LLM?” question in the OP. I’d be open to being convinced of this, but I wrote this post because I’m not aware of any evidence for it, and I was worried people were making an unfounded assumption.
But if LLMs or some other Simulator-type model hits AGI, the shoggoth would necessarily hit AGI as well (since it’d need to be at least as smart as the smartest simulacrum it can model), and then whatever heuristics it has would be re-interpreted as goals/values.
Isn’t physics a counterexample to this? Physics is complicated enough to simulate AGI (humans), but doesn’t appear to be intelligent in the way we’d typically mean the word (just in the poetic Carl Sagan “We are a way for the universe to know itself” sense). Does physics have goals and values?
A chat log is not a simulation, because it uses English for all state updates; it’s a story. In a story you’re allowed to add plot twists that wouldn’t have any counterpart in anything we’d consider a simulation (like a video game), and the chatbot may go along with it. There are no rules. It’s Calvinball.
For example, you could redefine the past of the character you’re talking to, by talking about something you did together before. That’s not a valid move in most games.
There are still mysteries about how a language model chooses its next token at inference time, but however it does it, the only thing that matters for the story is which token it ultimately chooses.
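To make that concrete, here’s a toy sketch of that last step (made-up logits and a five-token vocabulary; real decoding adds things like top-p and repetition penalties, but the interface to the story is the same: one chosen token at a time):

```python
import numpy as np

def next_token(logits, temperature=1.0, rng=None):
    """Whatever produced the logits, the only thing that reaches the chat log
    is the single token index sampled here."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy 5-token vocabulary; only the returned index becomes part of the "story".
print(next_token([2.0, 1.0, 0.5, -1.0, 0.0]))
```

Everything upstream of that sample is invisible to the conversation.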
Also, the “shoggoth” doesn’t even exist most of the time. There’s nothing running at OpenAI from the time it’s done outputting a response until you press the submit button.
If you think about it, that’s pretty weird. We think of ourselves as chatting with something but there’s nothing there when we type our next message. The fictional character’s words are all there is of them.
Nothing decisive one way or another, of course.
There’s been some success in locating abstract concepts in LLMs, and it’s generally clear that their reasoning mainly operates over “shallow” patterns. They don’t keep track of the precise details of scenes; they’re thinking about, e.g., narrative tropes, not low-level details.
Granted, that’s the abstraction level at which simulacra themselves are modeled, not distributions-of-simulacra. But that already suggests that LLMs are “efficient” simulators, and if so, why would higher-level reasoning be implemented using a different mechanism?
Think about how you reason, and about what the more and less efficient ways to do it are. Take figuring out how to convince someone of something: a detailed, immersive step-by-step simulation isn’t it; babble-and-prune isn’t it. You start at a highly abstract level, then drill down, making active choices all the way about which pieces need more or less optimizing.
There are also abstract considerations of computational efficiency: the above just seems like a much more efficient way to run “simulations” than the brute-force way.
This just seems like a better mechanical way to think about it. The same way we decided to think of LLMs as “simulators”, I guess.
Isn’t physics a counterexample to this?
No? Physics is a dumb simulation just hitting “next step”, with no idea about the higher-level abstract patterns that emerge from its simple rules. It’s wasteful; it’s not operating under resource constraints that push it to predict its next step as efficiently as possible; it’s not trying to predict a specific scenario; etc.
This mostly matches my intuitions (though I’m not sure about some of the detail-level claims). Strongly upvoted.