The problem with this hypothesis is that it’s not clear how this would work in practice. Human behavior isn’t well-defined in the absence of an environment, and the text humans choose to write is strongly dependent on that environment. Thus, at least at a high level of capabilities, it seems essential for the model to understand the rest of the world rather than just the individual author of some text.
That said, we should not expect the model to simulate the entire world perfectly, since additional world simulation yields diminishing returns in token-prediction accuracy. Instead, it seems likely that the model will simulate the immediate environment of the text-producing agents at higher fidelity, and more distant, less causally connected aspects of the environment at lower fidelity.
It seems to me like this line of reasoning follows from:
Conceptually, we’ll think of a predictive model as a sort of Bayes net where there are a bunch of internal hidden states corresponding to aspects of the world from which the model deduces the most likely observations to predict.
...
Conceptually, we’ll think of such conditioning as implementing a sort of back inference where the model infers a distribution over the most likely hidden states to have produced the given observations.
Why do you find the above framing compelling? It seems extremely strong, and I don’t presently expect it to be true. Maybe you provide more support later in the sequence.
Mechanistically, I don’t expect the model to in fact implement anything like a Bayes net or literal back inference—both of those are just conceptual handles for thinking about how a predictor might work. We discuss in more detail how likely different internal model structures might be in Section 4.
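To make the conceptual handle a bit more concrete, here is a toy sketch of what "back inference" means in this framing: a tiny discrete model with one hidden world-state variable, where conditioning on an observation induces a posterior over hidden states via Bayes' rule. All of the state names and probabilities below are invented purely for illustration; this is not a claim about how a trained predictor actually works internally.

```python
# Toy illustration of conditioning-as-back-inference (all values made up).
# A hidden "world state" generates observed tokens; conditioning on a token
# yields a posterior over hidden states by Bayes' rule.

# Hypothetical prior over hidden world states.
prior = {"news_article": 0.6, "fiction": 0.4}

# Hypothetical likelihood of an opening token under each hidden state.
likelihood = {
    "news_article": {"BREAKING": 0.3, "Once": 0.01},
    "fiction":      {"BREAKING": 0.02, "Once": 0.2},
}

def back_infer(observation: str) -> dict:
    """Posterior over hidden states given an observed token (Bayes' rule)."""
    unnormalized = {
        state: prior[state] * likelihood[state][observation]
        for state in prior
    }
    z = sum(unnormalized.values())
    return {state: p / z for state, p in unnormalized.items()}

# Conditioning on "Once" shifts the posterior heavily toward "fiction".
print(back_infer("Once"))  # {'news_article': ~0.07, 'fiction': ~0.93}
```

The point of the sketch is only that conditioning on observations implicitly selects the hidden states most likely to have produced them; the actual mechanism inside a large model could look very different.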
Ah, I knew the Bayes net part wasn't literal, but I wasn't sure how load-bearing the back inference was supposed to be. Thanks for clarifying.