I find this post fairly uninteresting, and feel irritated when people confidently make statements about “simulacra.” One problem is, on my understanding, that it doesn’t really reduce the problem of how LLMs work. “Why did GPT-4 say that thing?” “Because it was simulating someone who was saying that thing.” It does postulate some kind of internal gating network which chooses between the different “experts” (simulacra), so it isn’t contentless, but… Yeah.
Also, I don’t think that LLMs have “hidden internal intelligence”, given that e.g. LLMs trained on “A is B” fail to learn “B is A”. That’s big evidence against the simulators hypothesis. And I don’t think people have nearly enough evidence to be going around talking about “what the LLM is simulating”, unless this is some really loose metaphor, in which case it should be marked as such.
I also think it isn’t useful to think of LLMs as “simulating stuff” or having a “shoggoth” or whatever. I think that can often give a false sense of understanding.
However, I think this post did properly call out the huge miss of earlier speculation about oracles and agents and such.
I like this comment and agree overall. But, I do think I have one relevant disagreement:

Also, I don’t think that LLMs have “hidden internal intelligence”, given that e.g. LLMs trained on “A is B” fail to learn “B is A”
I’m not quite sure what you mean by “hidden internal intelligence”, but if you mean “quite alien abilities and cognitive processes”, then I disagree and think it’s quite likely that SOTA LLMs have this. If you instead mean something like “an inner homunculus reasoning about what to simulate”, then I totally agree that LLMs very likely don’t have this. (Though I don’t see how the reversal curse provides much evidence either way on either of these claims.)
I think it’s pretty likely that there are many cases where LLMs are notably superhuman in some way. For instance, I think that LLMs are wildly superhuman at next token prediction and generally I think base models have somewhat alien intelligence profiles (which is perhaps dropped to some extent in current RLHF’d chatbots).
These superhuman abilities are probably non-trivial to directly use, but might be possible to elicit with some effort (though it’s unclear if these abilities are very important or very useful for anything we care about).
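To make “wildly superhuman at next token prediction” concrete, here’s a minimal sketch of the quantity being talked about, assuming the HuggingFace transformers library and GPT-2 as a stand-in model (the claim is about much larger models; this only shows what gets measured):

```python
# Minimal sketch: compute a model's average next-token prediction loss on a text.
# GPT-2 and the sample sentence are stand-ins chosen for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The capital of France is Paris."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean per-token
    # cross-entropy of predicting each token from the preceding context.
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"average next-token loss (nats/token): {outputs.loss.item():.3f}")
print(f"perplexity: {torch.exp(outputs.loss).item():.1f}")
```

The claim above is that humans score far worse on exactly this metric than LLMs do; that loss/perplexity comparison is what “superhuman at next token prediction” refers to.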
If you instead mean something like “an inner homunculus reasoning about what to simulate”, then I totally agree that LLMs very likely don’t have this
Yeah, I meant something like this. The reversal curse is evidence because if most output were controlled by “inner beings”, presumably they’d be smart enough to “remember” the reversal.
That’s a very strange conclusion. I certainly find it easier to recall “word A in a foreign language means X” than the reversal. If a homunculus simulated me (or the vast majority of humans), it would create multiple instances of the reversal curse.
Distant philosophical example: my brain is smart enough to control my body, but I definitely can’t use its knowledge to create humanoid robots from scratch.
I’m not a simulator enthusiast, but I find your reasoning kinda sloppy.
I agree that this is a somewhat dated post. Janus has said as much, and I’ve encouraged them to edit the intro to say “y’all shouldn’t have been impressed by this” or something. With that said, some very weak defenses of a couple of specific things:
having a “shoggoth”
The way to ground that reasonably is that the “shoggoth” is the hypersurfaces of the volumes enclosed by decision boundaries. It’s mainly useful as a metaphor if it works as a way to translate into English the very basic idea that neural networks are function approximators. A lot of metaphorical terms are, in my view, attempts (which generally don’t succeed, especially, it seems, for you) to convey that neural networks are, fundamentally, just adjustable high-dimensional kaleidoscopes.
it isn’t contentless
It’s not trying to be highly contentful; it’s trying to clarify a bunch of people’s wrong intuitions about the very basics of what is even happening. If you already grok how descending the gradient of the cross entropy between two sequences requires a language model to approximate a function which compresses towards the data’s entropy floor, then the idea that the model “learns to simulate” is far too vague and nonspecific. But if you didn’t already grok why that math is what we use to define how well the model is performing at its task, then it might not be obvious what that task is, and calling it a “simulator” helps clarify the task.
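To spell out the math being gestured at (a standard decomposition, nothing specific to the post; here p is the data distribution, q_θ the model, H the entropy, and D_KL the KL divergence, all my notation):

```latex
% Next-token training objective: expected cross entropy between the data
% distribution p and the model q_theta.
\[
  \mathcal{L}(\theta)
  \;=\; -\,\mathbb{E}_{x \sim p}\!\Big[\textstyle\sum_t \log q_\theta(x_t \mid x_{<t})\Big]
  \;=\; H(p) \;+\; D_{\mathrm{KL}}\!\big(p \,\|\, q_\theta\big)
\]
% H(p) is the data's entropy floor: the loss can never go below it, and the
% only way to approach it is to drive the KL term toward zero, i.e. to match
% the distribution of the process that generated the text.
```

Driving the KL term toward zero is the precise sense in which training pushes the model to “simulate” whatever process produced the data.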
that can often give a false sense of understanding.
Yeah, agreed.
I don’t think Simulators claims or implies that LLMs have “hidden internal intelligence” or “an inner homunculus reasoning about what to simulate”, though. Where are you getting that from? This conclusion makes me think you’re referring to this post by Eliezer and not Simulators.
In what way is the reversal curse evidence against the simulation hypothesis?