Yeah, I would be surprised if this is a good first-order approximation of what is going on inside an LLM. Or maybe you mean this in a non-mechanistic way?
Yes, I definitely meant this in the non-mechanistic way. Any mechanistic claims that sound simulator-flavored, based just on the evidence in this post, sound clearly overconfident and probably wrong. I didn't reread this post carefully, but I don't remember seeing mechanistic claims in it.
I agree that in a non-mechanistic way, the above will produce reasonable predictions, but that’s because that’s basically a description of the task the LLM is trained on. [...]
I mostly agree and this is an aspect of what I mean by “this post says obvious and uncontroversial things”. I’m not particularly advocating for this post in the review; I didn’t find it especially illuminating.
To give a concrete counterexample to the algorithm you propose for predicting what an LLM does next: current LLMs have a broader knowledge base than any human alive. This means the algorithm of "figure out what real-world process would produce text like this" can't be accurate.
This seems somewhat in conflict with the previous quote?
Re: the concrete counterexample, yes I am in fact only making claims about base models; I agree it doesn't work for RLHF'd models. Idk how you want to weigh the fact that this post basically just talks about base models in your review; I don't have a strong opinion there.
I think it is in fact hard to get a base model to combine pieces of knowledge that tend not to be produced by any given human (e.g. writing an epistemically sound rap on the benefits of blood donation), and that often the strategy to get base models to do things like this is to write a prompt that makes it seem like we’re in the rare setting where text is being produced by an entity with those abilities.
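As a minimal sketch of that prompting strategy (assuming a Hugging Face text-generation pipeline, with GPT-2 standing in for an arbitrary base model; the framing preamble is purely illustrative):

```python
# Minimal sketch: frame the prompt so the text looks like it comes from a
# (rare) source that plausibly has the combined abilities we want.
# GPT-2 is a stand-in for any base (non-RLHF'd) model; the preamble is illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Transcript from a public-health outreach event. The performer, a physician "
    "who moonlights as a battle rapper, delivers a carefully sourced rap on the "
    "benefits of blood donation:\n\n"
)

out = generator(prompt, max_new_tokens=120, do_sample=True, temperature=0.8)
print(out[0]["generated_text"])
```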
Hmm, yeah, this perspective makes more sense to me, and I don’t currently believe you ended up making any of the wrong inferences I’ve seen others make on the basis of the post.
I do sure see many other people make inferences of this type. See for example the tag page for Simulator Theory which says:
Broadly it views these models as simulating a learned distribution with various degrees of fidelity, which in the case of language models trained on a large corpus of text is the mechanics underlying our world.
This also directly claims that the physics the system learned are “the mechanics underlying our world”, which I think isn’t totally false (they have probably learned a good chunk of the mechanics of our world) but is inaccurate as something trying to describe most of what is going on in a base model’s cognition.
Yeah, agreed that's a clear overclaim.

In general I believe that many (most?) people take it too far and make incorrect inferences: partly on priors about popular posts, and partly because many people, including you, believe this, and those people engage more with the Simulators crowd than I do.
Fwiw I was sympathetic to nostalgebraist's positive review saying:

sometimes putting a name to what you “already know” makes a whole world of difference. [...] I see these takes, and I uniformly respond with some version of the sentiment “it seems like you aren’t thinking of GPT as a simulator!”
I think in all three of the linked cases I broadly directionally agreed with nostalgebraist, and thought that the Simulator framing was at least somewhat helpful in conveying the point. The first one didn’t seem that important (it was critiquing imo a relatively minor point), but the second and third seemed pretty direct rebuttals of popular-ish views. (Note I didn’t agree with all of what was said, e.g. nostalgebraist doesn’t seem at all worried about a base GPT-1000 model, whereas I would put some probability on doom for malign-prior reasons. But this feels more like “reasonable disagreement” than “wildly misled by simulator framing”.)
Yeah—I just noticed this ”...is the mechanics underlying our world.” on the tag page.
Agreed that it’s inaccurate and misleading.
I hadn’t realized it was being read this way.