I can at least give you the short version of why I think you’re wrong, if you want to chat lmk I guess.
Plain text: “GPT is a simulator.”
Correct interpretation: “Sampling from GPT to generate text is a simulation, where the state of the simulation’s ‘world’ is the text and GPT encodes learned transition dynamics between states of the text.”
Mistaken interpretation: “GPT works by doing a simulation of the process that generated the training data. To make predictions, it internally represents the physical state of the Earth, and predicts the next token by applying learned transition dynamics to the represented state of the Earth to get a future state of the Earth.”
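(In case it helps to see the correct interpretation concretely, here is a minimal toy sketch of it. Everything below is made up purely for illustration; the transition function just stands in for GPT's forward pass plus sampling, and the only state anywhere in the loop is the text itself, with nothing representing the Earth.)

```python
import random

def transition(text_state: list[str]) -> dict[str, float]:
    """Stand-in for GPT: map the current text state to a distribution over
    next tokens. A real model would run a forward pass here; this toy just
    returns a fixed distribution so the sketch runs."""
    return {" the": 0.6, " a": 0.4}

def simulate(text_state: list[str], n_steps: int) -> list[str]:
    """The 'simulation': the world state is the text, and each timestep
    samples a next token from the learned dynamics and appends it."""
    for _ in range(n_steps):
        dist = transition(text_state)
        tokens, weights = zip(*dist.items())
        text_state = text_state + random.choices(tokens, weights=weights)
    return text_state

print(simulate(["Once", " upon", " a", " time"], 5))
```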
-
So that’s the “core thesis.” Maybe it would help to do the same thing for some of the things you might use the simulator framing for?
Plain text: “GPT can simulate a lot of different humans.”
Correct interpretation: “The text dynamics of GPT can support long-lived dynamical processes that write text like a lot of different humans. This is a lot like how a simulation of the solar system could have a lot of different orbits depending on the initial condition, except the laws of text are a lot more complicated and anthropocentric than the laws of celestial mechanics.”
Mistaken interpretation: “When GPT is talking like a person, that’s because there is a sentient simulation of a person in there doing thinking that is then translated into words.”
Plain text: “Asking whether GPT knows some fact is the wrong question. It’s specific simulacra that know things.”
Correct interpretation: “The dynamical processes that get you human-like text out of GPT (‘simulacra’) can vary in how easy it is to get them to recite a desired fact. You might hope there’s some ‘neutral’ way to get a recitation of a fact out of GPT, but there is no such neutral way, it’s all dynamical processes. When it comes to knowing things, GPT is more like a compression algorithm than a person. It knows a fact well when that fact is the result of simple initial conditions.”
Drawback of the correct interpretation: Focuses imagination on text processes that play human roles, potentially obscuring more general ways to get to desired output text.
Mistaken interpretation: “Inside GPT’s simulation of the physical state of the Earth, it tracks what different people know.”
Plain text: “If you try to get GPT to solve hard problems, and it succeeds, it might be simulating a non-human intelligence.”
Correct interpretation: “GPT has learned text dynamics that include a lot of clever rules for getting correct answers, because it’s had to predict a lot of text that requires cleverness. A lot of those clever rules were learned to predict human text, and are interwoven with other heuristics that keep its state in the distribution of human text. But if it’s being clever in ways that humans aren’t, it’s probably going to leave the distribution of human text in other ways.”
Mistaken? interpretation: “If GPT starts getting good at reversing hashes, it’s about to break out of its server and start turning the Earth into well-predicted tokens.”
Sure, I wasn’t under the impression that the claim was that GPT was literally simulating the Earth, but I don’t understand how describing something as a simulation of this type, over a completely abstract “next token space”, constrains expectations.
Like, I feel like you can define practically any even slightly recurrent system as a “simulator” of this type. If we aren’t talking about simulating something close to human minds, what predictions can we make?
Like, let’s say I have a very classical RL algorithm, something like AlphaZero with MCTS. It also “simulates” a game, state by state, into the future (into many different branches). But how does this help me predict what the system does? AlphaZero seems to share few of the relevant dynamics this post is talking about.
This is what all that talk about predictive loss was for. Training on predictive loss gets you systems that are especially well-suited to being described as learning the time-evolution dynamics of the training distribution. Not in the sense that they’re simulating the physical reality underlying the training distribution, merely in the sense that they’re learning dynamics for the behavior of the training data.
Sure, you could talk about AlphaZero in terms of prediction. But it’s not going to have the sort of configurability that makes the simulator framing so fruitful in the case of GPT (or in the case of computer simulations of the physical world). You can’t feed AlphaZero the first 20 moves of a game by Magnus Carlsen and have it continue like him.
Or to use a different example, one time where talking about simulators helps is when someone asks “Does GPT know this fact?”, because GPT’s dynamics are inhomogeneous—it doesn’t uniformly act like it knows the fact, or uniformly act like it doesn’t. But AlphaZero’s training process is actively trying to get rid of that kind of inhomogeneity—AlphaZero isn’t trained to mimic a training distribution, it’s trained to play high-scoring moves.
The simulator framing has no accuracy advantage over thinking directly in terms of next token prediction, except that thinking in terms of simulator and simulacra sometimes usefully compresses the relevant ideas, and so lets people think larger new thoughts at once. Probably useful for coming up with ChatGPT jailbreaks. Definitely useful for coming up with prompts for base GPT.
To add to Charlie’s point (which seems right to me):
As I understand things, I think we are talking about a simulation of something somewhat close to human minds—e.g. text behaviour of humanlike simulacra (made of tokens—but humans are made of atoms). There’s just no claim of an internal simulation.
I’d guess a common upside is to avoid constraining expectations unhelpfully in ways that [GPT as agent] might.
However, I do still worry about saying “GPT is a simulator” rather than something like “GPT currently produces simulations”. I think the former suggests too strongly that we understand something about what it’s doing internally—e.g. at least that it’s not inner misaligned, and won’t stop acting like a simulator at some future time (and can easily be taken to mean that it’s doing simulation internally).
If the aim is to get people thinking more clearly, I’d want it to be clearer that this is a characterization of [what GPTs currently output], not [what GPTs fundamentally are].
As I understand things, I think we are talking about a simulation of something somewhat close to human minds—e.g. text behaviour of humanlike simulacra (made of tokens—but humans are made of atoms). There’s just no claim of an internal simulation.
I mean, that is the exact thing that I was arguing against in my review.
I think the distribution of human text just has too many features that are hard to produce by simulating human-like minds. I agree that the system is trained on imitating human text, and that this necessarily requires being able to roleplay as many different humans, but I don’t think the process of that roleplay is particularly likely to be akin to a simulation (similarly to how, when humans roleplay as other humans, they do a lot of cognition that isn’t simulation: e.g. when an actor plays a character in a movie, they do things like explicitly think about the historical period the story is set in, they recognize that certain scenes will be hard to pull off, they solve a problem using the knowledge they have when not roleplaying and then retrofit their solution into something the character might have come up with, etc. When humans imitate things we are not limited to simulating the target of our imitation).
The cognitive landscape of an LLM is also very different from a human’s, and it seems clear that in many contexts the behavior of an LLM will generalize quite differently than it would for a human. Given that differing cognitive landscape, simulation again seems unlikely to be the only, or honestly even the primary, way I’d expect an LLM to get good at imitating human text.
Oh, hang on—are you thinking that Janus is claiming that GPT works by learning some approximation to physics, rather than ‘physics’?
IIUC, the physics being referred to is either through analogy (when it refers to real-world physics), or as a generalized ‘physics’ of [stepwise addition of tokens]. There’s no presumption of a simulation of physics (at any granularity).
E.g.:
Models trained with the strict simulation objective are directly incentivized to reverse-engineer the (semantic) physics of the training distribution, and consequently, to propagate simulations whose dynamical evolution is indistinguishable from that of training samples.
Apologies if I’m the one who’s confused :). This just seemed like a natural explanation for your seeming to think the post is claiming a lot more mechanistically. (I think it’s claiming almost nothing)
No, I didn’t mean to imply that. I understand that “physics” here is a general term for understanding how any system develops forward according to some abstract definition of time.
What I am saying is that even with a more expansive definition of physics, it seems unlikely to me that GPT internally simulates a human mind (or anything else, really) in a way where there is a strong structural similarity between the way a human brain steps forward in physical time and the way the transformer’s internals generate additional tokens.
Sure, but I don’t think anyone is claiming that there’s a similarity between a brain stepping forward in physical time and transformer internals. (perhaps my wording was clumsy earlier)
IIUC, the single timestep in the ‘physics’ of the post is the generation and addition of one new token. I.e. GPT uses [some internal process] to generate a token. Adding the new token is a single atomic update to the “world state” of the simulation. The [some internal process] defines GPT’s “laws of physics”.
The post isn’t claiming that GPT is doing some generalized physics internally. It’s saying that [GPT(input_states) --> (output_states)] can be seen as defining the physical laws by which a simulation evolves.
As I understand it, it’s making almost no claim about internal mechanism.
Though I think “GPT is a simulator” is only intended to apply if its simulator-like behaviour robustly generalizes—i.e. if it’s always producing output according to the “laws of physics” of the training distribution (this is imprecise, at least in my head—I’m unclear whether Janus has any more precise criterion).
I don’t think the post is making substantive claims that disagree with [your model as I understand it]. It’s only saying: here’s a useful way to think about the behaviour of GPT.
An LLM is a simulation, a system statistically trained to try to predict the same distribution of outputs as a human writing process (which could be a single brain in near-real-time, or an entire Wikipedia community of them interacting over years). It is not a detailed physical emulation of either of these processes.
The simple fact that a human brain has O(10^14) synapses and current LLMs only have up to O(10^12) parameters makes it clear that it’s going to be a fairly rough simulation — I actually find it pretty astonishing that we often get as good a simulation as we do out of a system that clearly has orders of magnitude less computational complexity. Apparently a lot of aspects of human text generation aren’t so complex as to require a large fraction of the brain’s entire computational capacity to produce even a passable approximation to the output. Indeed, the LLM scaling laws give us a strong sense of how much, at an individual token-guessing level, the predictability of human text improves as you throw more computational capacity and a larger training sample set at the problem, and the answer is logarithmic: doubling the product of computational capacity and dataset size produces a roughly fixed improvement in the perplexity measure.
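(A toy illustration of the shape of that relationship, with made-up constants rather than the published Kaplan/Chinchilla coefficients: under a pure power-law fit, each doubling of the compute budget buys a roughly constant fractional reduction in loss, and hence in perplexity.)

```python
import math

# Hypothetical constants, chosen only to show the shape of a power-law fit;
# they are not fitted values from any scaling-law paper.
A, alpha = 10.0, 0.05

def loss(compute_flops: float) -> float:
    """Cross-entropy loss (nats/token) under a pure power-law scaling fit."""
    return A * compute_flops ** (-alpha)

base = 1e18  # arbitrary baseline compute budget in FLOPs
for doublings in range(5):
    nll = loss(base * 2 ** doublings)
    print(f"{doublings} doublings: loss {nll:.3f}, perplexity {math.exp(nll):.2f}")
```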
I don’t disagree, but I don’t think that describing the process an LLM uses to generate a single token as a simulation is clarifying in this context.
I’m fairly sure the post is making no such claim, and I think it becomes a lot more likely that readers will have habryka’s interpretation if the word “simulation” is applied to LLM internals (and correctly conclude that this interpretation entails implausible claims). I think “predictor” or the like is much better here.
Unless I’m badly misunderstanding, the post is taking a time-evolution-of-a-system view of the string of tokens—not of LLM internals. I don’t think it’s claiming anything about what the internal LLM mechanism looks like.
I think janus is explicitly using the verb ‘simulate’ as opposed to ‘emulate’ because he is not making any claims about LLM internals (and indeed doesn’t think the internals, whatever they may be, include a detailed emulation). I think this careful distinction in terminology (which janus explicitly employs at one point in the post above, when discussing just this question, so is clearly familiar with) is sadly lost on many readers, who tend to assume that the two words mean the same thing, since the word ‘simulate’ is commonly misused to include ‘emulate’ — a mistake I’ve often made myself.
I agree that the word ‘predict’ would be less liable to this particular misunderstanding, but I think it has some other downsides: you’d have to ask janus why he didn’t pick it.
So my claim is: if someone doesn’t understand why it’s called “Simulator Theory” as opposed to “Emulator Theory”, then they haven’t correctly understood janus’ post. (And I have certainly seen examples of people who appear to think LLMs actually are emulators, of nearly unlimited power. For example, the ones who suggested just asking an LLM for the text of the most cited paper on AI Alignment from 2030 — something that would require emulating a significant proportion of the world for about six years to predict correctly.)
The point I’m making here is that in the terms of this post the LLM defines the transition function of a simulation.
I.e. the LLM acts on [string of tokens], to produce [extended string of tokens]. The simulation is the entire thing: the string of tokens changing over time according to the action of the LLM.
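(A toy sketch of that distinction, with placeholder names: the LLM supplies only the transition function, while the simulation is the whole evolving trajectory of token states it induces.)

```python
from typing import Callable

State = tuple[str, ...]                # the "world state": a string of tokens
TransitionFn = Callable[[State], str]  # the LLM's role: current state -> next token

def run_simulation(llm: TransitionFn, initial: State, n_steps: int) -> list[State]:
    """The simulation is this whole trajectory of states, not the llm itself."""
    trajectory = [initial]
    state = initial
    for _ in range(n_steps):
        state = state + (llm(state),)  # one timestep: append a single token
        trajectory.append(state)
    return trajectory

# A stand-in for the LLM's forward pass plus sampling, so the sketch runs.
toy_llm: TransitionFn = lambda state: f"token_{len(state)}"
print(run_simulation(toy_llm, ("Hello",), 3))
```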
Saying “the LLM is a simulation” strongly suggests that a simulation process (i.e. “the imitation of the operation of a real-world process or system over time”) is occurring within the LLM internals.
Saying “GPT is a simulator” isn’t too bad—it’s like saying “The laws of physics are a simulator”. Loosely correct. Saying “GPT is a simulation” is like saying “The laws of physics are a simulation”, which is at least misleading—I’d say wrong.
In another context it might not be too bad. In this post simulation has been specifically described as “the imitation of the operation of a real-world process or system over time”. There’s no basis to think that the LLM is doing this internally.
Unless we’re claiming that it’s doing something like that internally, we can reasonably say “The LLM produces a simulation”, but not “The LLM is a simulation”.
(oh and FYI, Janus is “they”—in the sense of actually being two people: Kyle and Laria)
The point I’m making here is that in the terms of this post the LLM defines the transition function of a simulation.
I guess (as an ex-physicist and long-time software engineer) I’m not really hung up on the fact that emulations are normally performed one timestep at a time, and simulations certainly can be, so I didn’t see much need to make a linguistic distinction for it. But that’s fine, I don’t disagree. Yes, an emulation or (in applicable cases) simulation process will consist of a sequence of many timesteps, and an LLM predicting text similarly does so one token at a time, sequentially (which may not, in fact, be the order in which humans produced them, or consume them, though by default it usually is — something that LLMs often have trouble with, presumably due to their fixed forward-pass computational capacity).
(oh and FYI, Janus is “they”—in the sense of actually being two people: Kyle and Laria)
Suddenly their username makes sense! Thanks, duly noted.
Perhaps we’re talking past each other to a degree. I don’t disagree with what you’re saying. I think I’ve been unclear—or perhaps just saying almost vacuous things. I’m attempting to make a very weak claim (I think the post is also making no strong claim—not about internal mechanism, at least).
I only mean that the output can often be efficiently understood in terms of human characters (among other things). I.e. that the output is a simulation, and that human-like minds will be an efficient abstraction for us to use when thinking about such a simulation. Privileging hypotheses involving the dynamics of the outputs of human-like minds will tend to usefully constrain expectations.
Again, I’m saying something obvious here—perhaps it’s too obvious to you. The only real content is something like [thinking of the output as being a simulation including various simulacra, is likely to be less misleading than thinking of it as the response of an agent].
I do not mean to imply that the internal cognition of the model necessarily has anything simulation-like about it. I do not mean that individual outputs are produced by simulation. I think you’re correct that this is highly unlikely to be the most efficient internal mechanism to predict text.
Overall, I think the word “simulation” invites confusion, since it’s forever unclear whether we’re pointing at the output of a simulation process, or the internal structure of that process.
Generally I’m saying:
[add a single token] : single simulation step—using the training distribution’s ‘physics’.
[long string of tokens] : a simulation
[process of generating a single token] : [highly unlikely to be a simulation]
Did you in fact mean ‘emulation’ for the last of those three items?
I’m using ‘simulation’ as it’s used in the post [the imitation of the operation of a real-world process or system over time]. The real-world process is the production of the string of tokens.
I still think that referring to what the LLM does in one step as “a simulation” is at best misleading. “a prediction” seems accurate and not to mislead in the same way.
Ah, so again, you’re making the distinction that the process of generating a single token is just a single timestep of a simulation, rather than saying it’s highly unlikely to be an emulation (or even a single timestep of an emulation). With which I agree, though I don’t see it as a distinction so non-obvious that I’d expect many people to trip over it. (Perhaps my background is showing.)
OK, then we were talking rather at cross-purposes: thanks for explaining!