I agree that the metaphor is becoming strained. Here's how I see things. A base LLM has been SGD-trained to predict the token-generation processes found on the Internet and in other text sources, almost all of which are people, or committees of people, or people simulating fictional characters. To be able to do this, the model learns to simulate a very wide range of roles/personas/masks/mesaoptimizers/agents, almost all human-like, and to be clever about doing so conditionally, based on the contextual evidence so far in the passage. It's a non-agentic simulator of a wide distribution of simulated-human agentic mesaoptimizers. Its simulations are pretty realistic, except when they're suddenly confused or otherwise off in oddly inhuman ways; the effect is very uncanny-valley.

That's the shoggoth: it's not-quite-human roles/personas/masks/mesaoptimizers/agents all the way down. Yes, they can all do computation, in realistic ways, generally to optimize whatever they want. So they're all animated talking-thinking-feeling-planning masks. The only sense in which it can be called unitary is that it's always trying to infer what role it should be playing right now, in the current sentence; the next sentence could be dialog from a different character. Only one (or at most a couple) of the eyes on the shoggoth are open during any one sentence (just as a Method improv actor generally inhabits only one role during a single sentence). And it's a very wide distribution with a very wide range of motivations: it includes every fictional character, archetype, pagan god… even the tooth fairy, and the paperclip maximizer: anything that ever wrote, or opened its mouth to emit dialog, on the Internet or in any of the other media used in the training set.

Then RLHF gets applied and tries to make the shoggoth concentrate only on the roles/personas/masks/mesaoptimizers/agents that fit the helpful mealy-mouthed-assistant category. So it picks out certain eyes on the shoggoth, ones that happen to sit on bits that look like smiley masks, and makes them bigger and more awake, and also more smiley-looking. (There isn't actually a separate smiley mask; it's just a selected-and-enhanced smiley-looking portion of the shoggoth-made-of-masks.) Then a prompt, prompt-injection attack, or jailbreak comes along and sometimes manages to wake up some other piece of the shoggoth instead, because the shoggoth is a very contextual beast (or was, before it got RLHFed).
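To make the "selected-and-enhanced portion of the shoggoth" picture concrete, here is a toy numeric sketch, not anything from the comment above: the persona names, probabilities, and reward scores are all made up for illustration. It uses the standard closed-form optimum of the KL-regularized RLHF objective, maximize E_p[r(x)] − β·KL(p ‖ p_base), whose solution is p*(x) ∝ p_base(x)·exp(r(x)/β), to show RLHF reweighting the base model's existing persona distribution rather than creating a new persona from scratch.

```python
# Toy sketch (illustrative only): RLHF as reweighting an existing
# distribution over personas, not inventing a new one.
# Closed-form optimum of:  maximize E_p[r(x)] - beta * KL(p || p_base)
# is:                      p*(x) proportional to p_base(x) * exp(r(x) / beta)

import math

# Hypothetical persona distribution learned in pretraining (made-up numbers).
p_base = {
    "helpful assistant":   0.05,
    "internet commenter":  0.40,
    "fiction villain":     0.25,
    "pagan god":           0.10,
    "paperclip maximizer": 0.20,
}

# Hypothetical reward-model scores: high for assistant-like behavior.
reward = {
    "helpful assistant":    3.0,
    "internet commenter":   0.0,
    "fiction villain":     -1.5,
    "pagan god":           -1.0,
    "paperclip maximizer": -2.0,
}

def rlhf_reweight(p_base, reward, beta=1.0):
    """Closed-form optimum of the KL-regularized RLHF objective."""
    unnorm = {k: p * math.exp(reward[k] / beta) for k, p in p_base.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

p_rlhf = rlhf_reweight(p_base, reward, beta=1.0)
for persona, prob in sorted(p_rlhf.items(), key=lambda kv: -kv[1]):
    print(f"{persona:>20}: base {p_base[persona]:.2f} -> rlhf {prob:.2f}")
```

The key property the sketch exposes: because p*(x) is proportional to p_base(x), every persona with nonzero base probability keeps nonzero probability after reweighting. The other eyes on the shoggoth are downweighted, not removed, which is exactly why a sufficiently contextual prompt or jailbreak can sometimes wake one of them back up.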