Thinking about the state and time evolution rules for the state seems fine, but there isn’t any interesting structure with the naive formulation imo. The state is the entire text, so we don’t get any interesting Markov chain structure. (You can turn any random process into a Markov chain by including the entire history in the state! The interesting property was that the past didn’t matter!)
Hm, I mostly agree. There isn’t any interesting structure by default, you have to get it by trying to mimic a training distribution that has interesting structure.
And I think this relates to another way that I was too reductive. If I want to talk about “simulacra” as a thing, then they don’t exist purely in the text, so I must be sneaking in another ontology somewhere: an ontology that consists of features inferred from text (but still not actually the state of our real universe).
Nitpick: I mean, technically, the state is only the last 4k tokens or however long your context length is. Though I agree this is still very uninteresting.
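To make the "technically Markov, but uninteresting" point concrete, here is a minimal sketch. Everything in it is made up for illustration: `toy_dist` is a stand-in for a model's next-token distribution, not a real one, and `window_state` and `make_step` are hypothetical helper names. The state is just the last `context_len` tokens, and the transition appends one sampled token and truncates.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Token = str
State = Tuple[Token, ...]


def window_state(history: Sequence[Token], context_len: int) -> State:
    """The 'state' is just the last `context_len` tokens of the text."""
    return tuple(history[-context_len:])


def toy_dist(state: State) -> Dict[Token, float]:
    """Hypothetical next-token weights; depends on the whole window."""
    n_a = state.count("a")
    return {"a": 1.0 + n_a, "b": 1.0 + len(state) - n_a}


def make_step(
    next_token_dist: Callable[[State], Dict[Token, float]],
    context_len: int,
) -> Callable[[State, random.Random], State]:
    """Build a transition: sample one token, append it, truncate the window."""
    assert context_len >= 1  # assumption: the window holds at least one token

    def step(state: State, rng: random.Random) -> State:
        tokens, weights = zip(*next_token_dist(state).items())
        tok = rng.choices(tokens, weights=weights, k=1)[0]
        return (state + (tok,))[-context_len:]

    return step


if __name__ == "__main__":
    step = make_step(toy_dist, context_len=3)
    # Two different histories that agree on the last 3 tokens.
    s1 = window_state(["x", "a", "b", "a"], 3)
    s2 = window_state(["y", "y", "y", "a", "b", "a"], 3)
    # Same state and same randomness give the same next state.
    print(step(s1, random.Random(0)) == step(s2, random.Random(0)))
```

Two histories that share the same window are indistinguishable to `step`, so the chain is Markov, but only because the state was defined to contain everything the transition is allowed to look at. Nothing about the process itself got simpler.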