Let’s take LLM Simulator Theory.
We have a particular autoregressive language model $\mu : T^k \to \Delta(T)$ (where $T$ is the token alphabet and $\Delta(T)$ is the set of probability distributions over it), and Simulator Theory says that $\mu$ is simulating a whole family of simulacra which are consistent with the prompt.
Formally speaking,
$$\mu(t_{k+1} \mid t_1, \dots, t_k) = \frac{1}{P(t_1, \dots, t_k)} \sum_{s \in S} P(s) \times \mu_s(t_1, \dots, t_k) \times \mu_s(t_{k+1} \mid t_1, \dots, t_k)$$
where $\mu_s$ is the stochastic process corresponding to a simulacrum $s \in S$, and $P(s)$ is the prior weight assigned to it.
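To see what the identity says in practice, here is a minimal numerical sketch. Everything in it is invented for illustration: two simulacra over a three-token vocabulary, each simplified to a fixed next-token distribution that ignores the prefix (a real simulacrum would be prefix-dependent).

```python
import numpy as np

# Toy check of the mixture identity above. The simulacra, their
# next-token distributions, and the uniform prior are all invented.
simulacra = {
    "s1": np.array([0.7, 0.2, 0.1]),
    "s2": np.array([0.1, 0.3, 0.6]),
}
prior = {"s1": 0.5, "s2": 0.5}  # P(s)

def prefix_prob(dist, prefix):
    """mu_s(t_1, ..., t_k): probability simulacrum s assigns to the prefix."""
    return float(np.prod([dist[t] for t in prefix]))

def mu_next(prefix):
    """mu(t_{k+1} | t_1, ..., t_k): the posterior-weighted mixture."""
    # P(t_1, ..., t_k) = sum_s P(s) * mu_s(t_1, ..., t_k)
    evidence = sum(prior[s] * prefix_prob(d, prefix)
                   for s, d in simulacra.items())
    # sum_s P(s) * mu_s(prefix) * mu_s(. | prefix), normalised by the evidence
    return sum(prior[s] * prefix_prob(d, prefix) * d
               for s, d in simulacra.items()) / evidence

# A prefix of repeated token 0 is far likelier under s1, so the posterior
# (and hence mu's next-token distribution) shifts toward s1's distribution.
print(mu_next([0, 0]))  # approximately [0.688, 0.202, 0.110]
```

The prompt-dependence is doing all the work here: as the prefix accumulates evidence about which simulacrum produced it, the posterior weights shift, and $\mu$'s next-token distribution moves toward that simulacrum's.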
Now, there are two objections to this:
Firstly, is it actually true that $\mu$ has this particular structure?
Secondly, even if it were true, why are we warranted in saying that GPT is simulating all these simulacra?
The first objection is a purely technical question, whereas the second is conceptual. In this article, I present a criterion which partially answers the second objection.
Note that the first objection (is it actually true that $\mu$ has this particular structure?) is a question about a particular autoregressive language model. You might give one answer for GPT-2 and a different answer for GPT-4.
I’m confused what you mean to claim. Understood that a language model factorizes the joint distribution over tokens autoregressively, into the product of next-token distributions conditioned on their prefixes. Also understood that it is possible to instead factorize the joint distribution over tokens into a conditional distribution over tokens conditioned on a latent variable (call it $s$) weighted by the prior over $s$. These are claims about possible factorizations of a distribution, and about which factorization the language model uses.
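Concretely, the two factorizations I have in mind (notation chosen to match the equation above, with $n$ the sequence length):

$$P(t_1, \dots, t_n) = \prod_{i=1}^{n} P(t_i \mid t_1, \dots, t_{i-1}) \qquad \text{(autoregressive)}$$

$$P(t_1, \dots, t_n) = \sum_{s \in S} P(s) \, P(t_1, \dots, t_n \mid s) \qquad \text{(latent-variable)}$$

Both are factorizations of the same joint distribution, so writing one rather than the other does not by itself commit you to anything further.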
What are you claiming beyond that?
Are you claiming something about the internal structure of the language model?
Are you claiming something about the structure of the true distribution over tokens?
Are you claiming something about the structure of the generative process that produces the true distribution over tokens?
Are you claiming something about the structure of the world more broadly?
Are you claiming something about correspondences between the above?