Rohin Shah comments on [Simulators seminar sequence] #2 Semiotic physics—revamped

Rohin Shah 5 Jan 2023 10:41 UTC
LW: 11 AF: 7
Proof sketch: Left to the reader as an exercise.
You might want to formally state the thing you want proved in Proposition 2; right now I can’t even tell what you are trying to claim. Some issues with the current formalization:
1. $B$ doesn’t appear as an unbound variable in the left hand side of your equation (because you take the limit as it goes to infinity), but it does appear on the right hand side of the equation, which seems pretty wild.
2. I don’t know what the symbol $\sim$ is supposed to mean; the text suggests it means “proportional” but I don’t think you mean that I can replace the symbol $\sim$ with $= k \times$ where $k$ is some constant of proportionality.
3. It seems very sketchy that in the LHS $s_{a}$ is treated as evidence (to the right of the conditioning bar) while in the RHS it is not—what if $s_{a}$ is very low probability?
My best guess is that you want to relate the quantities $P(s_{b} \mid s_{a}, B)$ and $\max_{s_{1}, \dots, s_{B}} P(s_{1}, \dots, s_{B}, s_{b} \mid s_{a})$, but I don’t see why there would be any straightforward relation between these quantities (apart from the obvious one where the max sequence is one way to get the token $s_{b}$ and so is a lower bound on its probability, i.e. $P(s_{b} \mid s_{a}, B) \geq \max_{s_{1}, \dots, s_{B}} P(s_{1}, \dots, s_{B}, s_{b} \mid s_{a})$).
EDIT: Maybe you want to say that $P(s_{b} \mid s_{a}, B)$ is “not much higher than” $\max_{s_{1}, \dots, s_{B}} P(s_{1}, \dots, s_{B}, s_{b} \mid s_{a})$? If so, that seems false for LLMs; imagine the case where $s_{a} = s_{b} = \text{“the”}$ and $B = 1{,}000$.
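To make the gap between the two quantities concrete, here is a minimal sketch under a loud assumption: the language model is replaced by a first-order Markov chain with a row-stochastic transition matrix `T`, which is only a toy stand-in for an LLM and is not taken from the post. The helper names `marginal_prob` and `max_path_prob` are hypothetical. The marginal $P(s_{b} \mid s_{a}, B)$ sums over all intermediate sequences, while the max picks out a single one, so the bound always holds but the ratio can grow exponentially in $B$.

```python
import numpy as np

def marginal_prob(T, a, b, B):
    """P(s_b | s_a, B): sum over all sequences of B intermediate tokens."""
    # Entry (a, b) of T^(B+1) marginalizes over every intermediate path.
    return np.linalg.matrix_power(T, B + 1)[a, b]

def max_path_prob(T, a, b, B):
    """max over s_1..s_B of P(s_1, ..., s_B, s_b | s_a), via max-product DP."""
    best = np.zeros(T.shape[0])
    best[a] = 1.0  # start at token a with probability 1
    for _ in range(B):
        # best[j] = probability of the single most likely path ending in j
        best = (best[:, None] * T).max(axis=0)
    # final step from the last intermediate token to b
    return (best * T[:, b]).max()

# Toy example (an assumption): uniform transitions over V tokens.
V = 50
T = np.full((V, V), 1.0 / V)
for B in (0, 1, 5, 10):
    m = marginal_prob(T, 0, 0, B)
    p = max_path_prob(T, 0, 0, B)
    print(f"B={B:2d}  marginal={m:.3e}  max path={p:.3e}  ratio={m / p:.3e}")
```

In this uniform toy chain the marginal stays at $1/V$ for every $B$, while the best single path has probability $(1/V)^{B+1}$, so the ratio is $V^{B}$. The sketch keeps $B$ small because plain floating-point products would underflow at $B = 1{,}000$; a log-space version would be needed there.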
- Jan 27 Feb 2023 1:38 UTC
  LW: 3 AF: 3
  Hi, thanks for the response! I apologize, the “Left as an exercise” line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition in the initial draft did not spell out clearly enough what I wanted to demonstrate, and it was also (as you correctly point out) wrong as stated. That wasted people’s time and I feel pretty bad about it. Mea culpa.
  I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn’t be shocked if there are mistakes in there). Regarding your points:
  1. The limit now shows up on both sides of the equation (as it should)! The dependence on $B$ on the RHS does actually kind of drop away at some point, but I’m not showing that here. I’d previously just sloppily substituted “choose $B$ as a large number” and then rewritten the proposition in the way indicated at the end of the Note for Proposition 2. That’s the way these large deviation principles are typically used.
  2. Yeah, that should have been an $\approx$ rather than a $\sim$ . Sorry, sloppy.
  3. True. Thinking more about it now, perhaps framing the proposition in terms of “bridges” was a confusing choice; if I revisit this post again (in a month or so 🤦‍♂️) I will work on cleaning that up.
- Дмитрий Зеленский 9 Jan 2023 15:40 UTC
  1 point
  Agreed. As a linguist, I looked at Proposition 2 and immediately thought “sketchy, shouldn’t hold in a good model of a language model”.