Perhaps we’re talking past each other to a degree. I don’t disagree with what you’re saying.
I think I’ve been unclear—or perhaps just saying almost vacuous things. I’m attempting to make a very weak claim (I think the post is also making no strong claim—not about internal mechanism, at least).
I only mean that the output can often be efficiently understood in terms of human characters (among other things). I.e. that the output is a simulation, and that human-like minds will be an efficient abstraction for us to use when thinking about such a simulation. Privileging hypotheses involving the dynamics of the outputs of human-like minds will tend to usefully constrain expectations.
Again, I’m saying something obvious here—perhaps it’s too obvious to you. The only real content is something like [thinking of the output as a simulation containing various simulacra is likely to be less misleading than thinking of it as the response of an agent].
I do not mean to imply that the internal cognition of the model necessarily has anything simulation-like about it. I do not mean that individual outputs are produced by simulation. I think you’re correct that this is highly unlikely to be the most efficient internal mechanism to predict text.
Overall, I think the word “simulation” invites confusion, since it’s forever unclear whether we’re pointing at the output of a simulation process, or the internal structure of that process.
Generally I’m saying:
[add a single token] : single simulation step—using the training distribution’s ‘physics’.
[long string of tokens] : a simulation
[process of generating a single token] : [highly unlikely to be a simulation]
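To make that mapping concrete, here’s a toy sketch (nothing here is a claim about any real model’s internals; the hard-coded bigram table just stands in for the learned ‘physics’, and all the names are illustrative): each call to step() is one simulation timestep that adds a single token, and the trajectory produced by iterating it is the simulation.

```python
import random

# Toy stand-in for a learned next-token distribution (the "physics").
# A real LLM's physics is its conditional distribution over tokens;
# this hard-coded bigram table is purely illustrative.
PHYSICS = {
    "the": ["cat", "dog"],
    "cat": ["sat", "ran"],
    "dog": ["ran", "sat"],
    "sat": ["down", "still"],
    "ran": ["off", "home"],
}

def step(tokens):
    """One simulation timestep: sample a single next token and append it."""
    candidates = PHYSICS.get(tokens[-1], ["<end>"])
    return tokens + [random.choice(candidates)]

def simulate(prompt, n_steps):
    """The simulation: the trajectory of states produced by iterating step()."""
    tokens = list(prompt)
    for _ in range(n_steps):
        tokens = step(tokens)
    return tokens

print(" ".join(simulate(["the"], 3)))  # e.g. "the cat sat down"
```

The point of the sketch is that ‘simulation’ names the whole trajectory; each individual step is just a draw from a conditional distribution, i.e. a prediction.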
Did you in fact mean ‘emulation’ for the last of those three items?
I’m using ‘simulation’ as it’s used in the post [the imitation of the operation of a real-world process or system over time]. The real-world process is the production of the string of tokens.
I still think that referring to what the LLM does in one step as “a simulation” is at best misleading. “A prediction” seems accurate and doesn’t mislead in the same way.
Ah, so again, you’re making the distinction that the process of generating a single token is just a single timestep of a simulation, rather than saying it’s highly unlikely to be an emulation (or even a single timestep of an emulation). With which I agree, though I don’t see the distinction as non-obvious enough that I’d expect many people to trip over it. (Perhaps my background is showing.)
OK, then we were talking rather at cross-purposes: thanks for explaining!