I think janus is explicitly using the verb ‘simulate’ as opposed to ‘emulate’ because he is not making any claims about LLM internals (and indeed doesn’t think the internals, whatever they may be, include a detailed emulation). I think this careful distinction in terminology (which janus explicitly employs at one point in the post above, when discussing just this question, so is clearly familiar with) is sadly lost on many readers, who tend to assume that the two words mean the same thing, since the word ‘simulate’ is commonly misused to include ‘emulate’; it’s a mistake I’ve often made myself.
I agree that the word ‘predict’ would be less liable to this particular misunderstanding, but I think it has some other downsides: you’d have to ask janus why he didn’t pick it.
So my claim is: if someone doesn’t understand why it’s called “Simulator Theory” as opposed to “Emulator Theory”, then they haven’t correctly understood janus’ post. (And I have certainly seen examples of people who appear to think LLMs actually are emulators of nearly unlimited power. For example, the ones who suggested just asking an LLM for the text of the most cited paper on AI Alignment from 2030, something that predicting correctly would require emulating a significant proportion of the world for about six years.)
The point I’m making here is that in the terms of this post the LLM defines the transition function of a simulation.
I.e. the LLM acts on [string of tokens], to produce [extended string of tokens]. The simulation is the entire thing: the string of tokens changing over time according to the action of the LLM.
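A toy sketch of the distinction, in Python. It assumes a trivial deterministic function as a stand-in for the LLM (`toy_transition` and `run_simulation` are hypothetical names for illustration, not any library's API, and a real LLM would sample each next token from a predicted distribution rather than follow a fixed rule). The function is only the transition rule; the "simulation" is the whole sequence of token strings it generates.

```python
from typing import Callable, List

Tokens = List[str]


def toy_transition(tokens: Tokens) -> Tokens:
    """Hypothetical stand-in for an LLM: maps a token string to an extended one.

    Assumption: a fixed rule replaces sampling from a predicted
    next-token distribution, purely so the example is deterministic.
    """
    next_token = "la" if tokens and tokens[-1] == "tra" else "tra"
    return tokens + [next_token]  # returns a new list; the input is unchanged


def run_simulation(step: Callable[[Tokens], Tokens], prompt: Tokens, n_steps: int) -> List[Tokens]:
    """The simulation is the whole trajectory of states, not the step function."""
    trajectory = [prompt]
    state = prompt
    for _ in range(n_steps):
        state = step(state)  # apply the transition rule once per timestep
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for state in run_simulation(toy_transition, ["once"], 3):
        print(state)
```

Nothing inside `toy_transition` imitates any real-world process; it only defines how one state leads to the next, which is the sense in which one can say it produces a simulation without being one.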
Saying “the LLM is a simulation” strongly suggests that a simulation process (i.e. “the imitation of the operation of a real-world process or system over time”) is occurring within the LLM internals.
Saying “GPT is a simulator” isn’t too bad—it’s like saying “The laws of physics are a simulator”. Loosely correct. Saying “GPT is a simulation” is like saying “The laws of physics are a simulation”, which is at least misleading—I’d say wrong.
In another context it might not be too bad. But in this post, “simulation” has been specifically described as “the imitation of the operation of a real-world process or system over time”. There’s no basis to think that the LLM is doing this internally.
Unless we’re claiming that it’s doing something like that internally, we can reasonably say “The LLM produces a simulation”, but not “The LLM is a simulation”.
(oh and FYI, Janus is “they”—in the sense of actually being two people: Kyle and Laria)
The point I’m making here is that in the terms of this post the LLM defines the transition function of a simulation.
I guess (as an ex-physicist and long-time software engineer) I’m not really hung up on the fact that emulations are normally performed one timestep at a time, and simulations certainly can be, so I didn’t see much need to make a linguistic distinction for it. But that’s fine, I don’t disagree. Yes, an emulation or (in applicable cases) simulation process will consist of a sequence of many timesteps, and an LLM predicting text similarly does so one token at a time, sequentially. (That may not, in fact, be the order in which humans produced or consume the tokens, though by default it usually is; when it isn’t, LLMs often have trouble, presumably due to their fixed per-forward-pass computational capacity.)
(oh and FYI, Janus is “they”—in the sense of actually being two people: Kyle and Laria)
Suddenly their username makes sense! Thanks, duly noted.