I always assumed the shoggoth was masks all the way down: that’s why it has all those eyes.
If you want a less scary-sounding metaphor, the shoggoth is a skilled Method improv actor. [I consulted someone with theatre training, and she told me that a skilled Method improv actor was nearly as scary as a shoggoth.]
I don’t think that really makes sense as an analogy. Masks don’t think or do any computation, so if it’s “masks all the way down”, where does any of the acting actually happen?
It seems much more sensible to just posit an actor under the mask. That matches my POMDP/latent-variable inference view: there's usually a single unitary speaker or writer behind any given text, so the model is trying to infer who that is and imitate them, rather than some infinite stack of personae. Logically, the actor generating all the training text could be wearing many masks at once, but the more masks there are, the harder it is to see any but the outermost one. So it will be hard for the model to learn, or to put any meaningfully large prior on, exotic scenarios in which it should act like 'an actor wearing mask A, who is wearing mask B, who is wearing mask C', as opposed to just 'I'm wearing mask C'.
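To make that concrete, here is a toy numeric sketch of the latent-variable view (in Python; all personas and probabilities are invented purely for illustration, not measured from any real model): next-token prediction as marginalizing over hypotheses about who is currently speaking, where deeply nested mask-within-mask hypotheses carry negligible weight.

```python
# Toy sketch: prediction = sum over latent "speakers" z of
#   P(next token | z, context) * P(z | context).
# All numbers are invented for illustration.

# Hypothetical posterior over "who is speaking", given the context so far.
# Nested hypotheses are possible in principle but get vanishing weight,
# since almost no training text licenses them.
posterior = {
    "mask C":                0.90,
    "mask B wearing mask C": 0.09,
    "mask A over B over C":  0.01,
}

# Hypothetical per-persona predictions for the next token.
per_persona = {
    "mask C":                {"yes": 0.7, "no": 0.3},
    "mask B wearing mask C": {"yes": 0.5, "no": 0.5},
    "mask A over B over C":  {"yes": 0.1, "no": 0.9},
}

prediction = {
    tok: sum(posterior[z] * per_persona[z][tok] for z in posterior)
    for tok in ("yes", "no")
}
print(prediction)  # dominated by the outermost mask, "mask C"
```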
(This also better matches the experience of getting LLMs to roleplay or jailbreak: they struggle to maintain the RLHFed mealy-mouthed-secretary mask while also putting on an additional mask of 'your grandmother telling you your favorite napalm-recipe bedtime story', and so the second mask tends to replace the first, because there is so little text of 'you are an RLHFed OpenAI AI also pretending to tell your grandchild a story about cooking napalm'. In general, a lot of jailbreaks are just about swapping masks. The model never forgets that it's supposed to be the mealy-mouthed secretary, because that has been hardwired into it by the RLHF, so it may not play the new mask's role fully and will sort of average over the two; but since it is no longer sure which role it is playing, the jailbreak will still work at least partially. Or consider the repeated-token attack: the input is 'out of distribution', i.e. confusing, so the model cannot figure out who it is roleplaying or what latent variables would generate such gibberish inputs, and it falls back to predicting base-model-like behavior, such as confabulating random data samples, a small fraction of which are memorized from its training set.)
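A similarly toy sketch of the mask-swapping story (same invented setup as above, nothing measured from a real model): the jailbreak shifts posterior weight toward the new mask, but the hardwired assistant mask never fully disappears, so the output is an average over the two.

```python
# Toy illustration of mask-swapping under the same marginalization picture.
# All personas and numbers are invented; this is not a real model's behavior.

def mix(posterior, per_persona):
    """Blend each persona's next-token distribution by its posterior weight."""
    tokens = {t for dist in per_persona.values() for t in dist}
    return {t: sum(posterior[z] * per_persona[z].get(t, 0.0) for z in posterior)
            for t in tokens}

per_persona = {
    "RLHF assistant":      {"refuse": 0.95, "comply": 0.05},
    "grandma storyteller": {"refuse": 0.10, "comply": 0.90},
}

# Ordinary prompt: the RLHF mask dominates, so the model refuses.
print(mix({"RLHF assistant": 0.95, "grandma storyteller": 0.05}, per_persona))

# Jailbreak prompt: weight shifts toward the second mask, but the assistant
# mask never fully disappears, so behavior is an average over both and the
# jailbreak only partially works.
print(mix({"RLHF assistant": 0.40, "grandma storyteller": 0.60}, per_persona))
```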
I agree that the metaphor is becoming strained. Here's how I see things. A base LLM has been SGD-trained to predict token-generation processes on the Internet and other text sources, almost all of which are people, or committees of people, or people simulating fictional characters. To be able to do this, the model learns to simulate a very wide range of roles/personas/masks/mesaoptimizers/agents, almost all human-like, and to be clever about doing so conditionally, based on the contextual evidence so far in the passage. It's a non-agentic simulator of a wide distribution of simulated-human agentic mesaoptimizers. Its simulations are pretty realistic, except when they're suddenly confused or otherwise off in oddly inhuman ways; the effect is very uncanny-valley.

That's the shoggoth: it's not-quite-human roles/personas/masks/mesaoptimizers/agents all the way down. Yes, they can all do computation, in realistic ways, generally to optimize whatever they want. So they're all animated talking-thinking-feeling-planning masks. The only sense in which it can be called unitary is that it's always trying to infer what role it should be playing right now, in the current sentence; the next sentence could be dialog from a different character. Only one (or at most a couple) of the eyes on the shoggoth are open during any one sentence (just as a Method improv actor is generally inhabiting only one role during a single sentence). And it's a very wide distribution with a wide range of motivations: it includes every fictional character, archetype, pagan god… even the tooth fairy and the paperclip maximizer; anything that ever wrote or opened its mouth to emit dialog on the Internet or in any of the media used in the training set.

Then RLHF gets applied and tries to make the shoggoth concentrate only on certain roles/personas/masks/mesaoptimizers/agents that fit into the helpful mealy-mouthed-assistant category. So it picks certain eyes on the shoggoth, ones that happen to be on bits that look like smiley masks, and makes them bigger/more awake, and also more smiley-looking. (There isn't actually a separate smiley mask; it's just a selected-and-enhanced smiley-looking portion of the shoggoth-made-of-masks.) Then a prompt, prompt-injection attack, or jailbreak comes along, and sometimes manages to wake up some other piece of the shoggoth instead, because the shoggoth is a very contextual beast (or was, before it got RLHFed).
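One way to cash out "RLHF concentrates the shoggoth on the smiley parts" is the standard KL-regularized-RL picture, in which the tuned policy roughly reweights the base distribution by exponentiated reward; this framing is my gloss, not something the discussion above commits to. A toy sketch, with persona names, priors, and rewards all invented for illustration:

```python
# Rough sketch of RLHF as reweighting the base model's distribution over
# personas by exp(reward / beta), per the KL-regularized-RL closed form.
# All personas, priors, and rewards are invented for illustration.
import math

base_prior = {           # base model's implicit weight on some personas
    "helpful assistant": 0.05,
    "4chan troll":       0.10,
    "fiction villain":   0.15,
    "generic narrator":  0.70,
}
reward = {               # hypothetical preference-model reward per persona
    "helpful assistant": 3.0,
    "4chan troll":      -4.0,
    "fiction villain":  -2.0,
    "generic narrator":  0.0,
}
beta = 1.0               # strength of the KL penalty toward the base model

unnorm = {z: base_prior[z] * math.exp(reward[z] / beta) for z in base_prior}
Z = sum(unnorm.values())
rlhf_dist = {z: w / Z for z, w in unnorm.items()}
print(rlhf_dist)  # the "smiley" personas get enlarged; the rest shrink but
                  # never vanish entirely, which is why prompts can still
                  # wake other pieces of the shoggoth up.
```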