I think the biggest pitfall of the “simulator” framing is that it’s made people (including Beth Barnes?) think it’s all about simulating our physical reality. Precisely because of the constraints you mention (text not actually pinpointing the state of the universe, etc.), the abstractions developed by a predictor are usually better understood in terms of treating the text itself as the state, and learning time-evolution rules for that state.
Thinking about the state and time-evolution rules for the state seems fine, but there isn’t any interesting structure in the naive formulation, imo. The state is the entire text, so we don’t get any interesting Markov chain structure. (You can turn any random process into a Markov chain by including the entire history in the state! The interesting property was that the past didn’t matter!)
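To make the parenthetical concrete, here is a toy sketch (the symbols and the 0.9 rule are made up purely for illustration): a process whose next symbol depends on its *first* symbol is not Markov in the current symbol, but becomes trivially Markov once you declare the whole history to be the state.

```python
import random

def next_symbol(history):
    """Toy non-Markov process: the next symbol depends on the FIRST
    symbol ever emitted, so knowing only the current symbol is not
    enough to define the transition probabilities."""
    return history[0] if random.random() < 0.9 else random.choice("ab")

def step(state):
    """The standard trick: take the state to be the entire history.
    The process is now trivially Markov, but the state space grows
    without bound, and "the past doesn't matter" carries no content."""
    return state + (next_symbol(state),)

state = ("a",)
for _ in range(5):
    state = step(state)
print(len(state))  # 6: the "state" just accumulates the whole history
```

The point of the trick is exactly that it is vacuous: the Markov property is only interesting when the state is small relative to the history it summarizes.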
Hm, I mostly agree. There isn’t any interesting structure by default, you have to get it by trying to mimic a training distribution that has interesting structure.
And I think this relates to another way that I was too reductive, which is that if I want to talk about “simulacra” as a thing, then they don’t exist purely in the text, so I must be sneaking in another ontology somewhere—an ontology that consists of features inferred from text (but still not actually the state of our real universe).
Nitpick: I mean, technically, the state is only the last 4k tokens, or however long your context window is. Though I agree this is still very uninteresting.
The time-evolution rules of the state are simply the conditional probabilities of the autoregressive model; there’s some amount of high-level structure, but not a lot. (As Ryan says, you don’t get the property you normally want from a state, namely the Markov property, except in a very weak sense.)
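The nitpick above does buy something formally: with a finite context window, “window of last k tokens” is a genuine (if astronomically large) Markov state, with the model’s next-token distribution as the transition kernel. A minimal sketch, with a made-up four-token vocabulary and a toy distribution standing in for the real model:

```python
import random

CONTEXT_LEN = 4  # stand-in for the model's context window (e.g. 4k tokens)
VOCAB = ["a", "b", "c", "d"]

def next_token_probs(window):
    """Toy stand-in for an autoregressive model's output distribution;
    a real model would condition on the entire window."""
    probs = {t: 1.0 for t in VOCAB}
    probs[window[-1]] += 2.0  # made-up rule: favor repeating the last token
    total = sum(probs.values())
    return {t: p / total for t, p in probs.items()}

def step(window):
    """One time-evolution step: the state is the last CONTEXT_LEN tokens,
    and the transition kernel is the model's next-token distribution.
    This is a genuine finite-state Markov chain, just over a huge space."""
    probs = next_token_probs(window)
    tok = random.choices(list(probs), weights=list(probs.values()))[0]
    return (window + (tok,))[-CONTEXT_LEN:]

state = ("a", "b", "c", "d")
for _ in range(10):
    state = step(state)
print(len(state))  # prints 4: the state remains a fixed-size window
```

The state space here has |VOCAB|^CONTEXT_LEN elements, which for a real tokenizer and a 4k-token window is so large that the Markov structure tells you essentially nothing, which is the “very weak sense” above.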
I also disagree that thinking purely of the text as state plus GPT-3 as evolution rules is the intention of the original simulators post; there’s a lot of discussion about the content of the simulations themselves as simulated realities or alternative universes (though the post does clarify that it’s not literally physical reality), e.g.:
I can’t convey all that experiential data here, so here are some rationalizations of why I’m partial to the term, inspired by the context of this post:
The word “simulator” evokes a model of real processes which can be used to run virtual processes in virtual reality.
It suggests an ontological distinction between the simulator and things that are simulated, and avoids the fallacy of attributing contingent properties of the latter to the former.
It’s not confusing that multiple simulacra can be instantiated at once, or an agent embedded in a tragedy, etc.
[...]
The next post will be all about the physics analogy, so here I’ll only tie what I said earlier to the simulation objective.
the upper bound of what can be learned from a dataset is not the most capable trajectory, but the conditional structure of the universe implicated by their sum.
To know the conditional structure of the universe[27] is to know its laws of physics, which describe what is expected to happen under what conditions.
I think insofar as people end up thinking the simulation is an exact match for physical reality, the problem was not in the simulators frame itself, but in the fact that the word “physics” was used 47 times in the post, while only the first few instances make it clear that literal physics is intended only as a metaphor.