Let’s take LLM Simulator Theory.
We have a particular autoregressive language model $\mu : T^k \to \Delta(T)$ (where $T$ is the token alphabet and $\Delta(T)$ is the set of probability distributions over it), and Simulator Theory says that $\mu$ is simulating a whole family of simulacra which are consistent with the prompt.
Formally speaking,
$$\mu(t_{k+1} \mid t_1, \dots, t_k) = \frac{1}{P(t_1, \dots, t_k)} \sum_{s \in S} P(s) \times \mu_s(t_1, \dots, t_k) \times \mu_s(t_{k+1} \mid t_1, \dots, t_k)$$
where $\mu_s$ is the stochastic process corresponding to a simulacrum $s \in S$, and $P(s)$ is the prior weight assigned to it.
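To see what the identity says in practice, here is a minimal numerical sketch. Everything in it is invented for illustration: two simulacra over a three-token vocabulary, each simplified to a fixed next-token distribution that ignores the prefix (a real simulacrum would be prefix-dependent).

```python
import numpy as np

# Toy check of the mixture identity above. The simulacra, their
# next-token distributions, and the uniform prior are all invented.
simulacra = {
    "s1": np.array([0.7, 0.2, 0.1]),
    "s2": np.array([0.1, 0.3, 0.6]),
}
prior = {"s1": 0.5, "s2": 0.5}  # P(s)

def prefix_prob(dist, prefix):
    """mu_s(t_1, ..., t_k): probability simulacrum s assigns to the prefix."""
    return float(np.prod([dist[t] for t in prefix]))

def mu_next(prefix):
    """mu(t_{k+1} | t_1, ..., t_k): the posterior-weighted mixture."""
    # P(t_1, ..., t_k) = sum_s P(s) * mu_s(t_1, ..., t_k)
    evidence = sum(prior[s] * prefix_prob(d, prefix)
                   for s, d in simulacra.items())
    # sum_s P(s) * mu_s(prefix) * mu_s(. | prefix), normalised by the evidence
    return sum(prior[s] * prefix_prob(d, prefix) * d
               for s, d in simulacra.items()) / evidence

# A prefix of repeated token 0 is far likelier under s1, so the posterior
# (and hence mu's next-token distribution) shifts toward s1's distribution.
print(mu_next([0, 0]))  # approximately [0.688, 0.202, 0.110]
```

The prompt-dependence is doing all the work here: as the prefix accumulates evidence about which simulacrum produced it, the posterior weights shift, and $\mu$'s next-token distribution moves toward that simulacrum's.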
Now, there are two objections to this:
Firstly, is it actually true that $\mu$ has this particular structure?
Secondly, even if it were true, why are we warranted in saying that GPT is simulating all these simulacra?
The first objection is a purely technical question, whereas the second is conceptual. In this article, I present a criterion which partially answers the second objection.
Note that the first objection (is it actually true that $\mu$ has this particular structure?) is a question about a particular autoregressive language model. You might give one answer for GPT-2 and a different answer for GPT-4.
I’m confused what you mean to claim. Understood that a language model factorizes the joint distribution over tokens autoregressively, into the product of next-token distributions conditioned on their prefixes. Also understood that it is possible to instead factorize the joint distribution over tokens into a conditional distribution over tokens conditioned on a latent variable (call it $s$) weighted by the prior over $s$. These are claims about possible factorizations of a distribution, and about which factorization the language model uses.
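Concretely, the two factorizations I have in mind (notation chosen to match the equation above, with $n$ the sequence length):

$$P(t_1, \dots, t_n) = \prod_{i=1}^{n} P(t_i \mid t_1, \dots, t_{i-1}) \qquad \text{(autoregressive)}$$

$$P(t_1, \dots, t_n) = \sum_{s \in S} P(s) \, P(t_1, \dots, t_n \mid s) \qquad \text{(latent-variable)}$$

Both are factorizations of the same joint distribution, so writing one rather than the other does not by itself commit you to anything further.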
What are you claiming beyond that?
Are you claiming something about the internal structure of the language model?
Are you claiming something about the structure of the true distribution over tokens?
Are you claiming something about the structure of the generative process that produces the true distribution over tokens?
Are you claiming something about the structure of the world more broadly?
Are you claiming something about correspondences between the above?