You might want to formally state the thing you want proved in Proposition 2; right now I can’t even tell what you are trying to claim. Some issues with the current formalization:
B doesn’t appear as an unbound variable in the left hand side of your equation (because you take the limit as it goes to infinity), but it does appear on the right hand side of the equation, which seems pretty wild.
I don’t know what the symbol ∼ is supposed to mean; the text suggests it means “proportional”, but I don’t think you mean that I can replace the symbol ∼ with $= k \times$, where $k$ is some constant of proportionality.
It seems very sketchy that in the LHS $s_a$ is treated as evidence (to the right of the conditioning bar) while in the RHS it is not. What if $s_a$ is very low probability?
My best guess is that you want to relate the quantities $P(s_b \mid s_a, B)$ and $\max_{s_1, \dots, s_B} P(s_1, \dots, s_B, s_b \mid s_a)$, but I don’t see why there would be any straightforward relation between these quantities (apart from the obvious one: the max sequence is one way to get the token $s_b$, so its probability is a lower bound, i.e. $P(s_b \mid s_a, B) \geq \max_{s_1, \dots, s_B} P(s_1, \dots, s_B, s_b \mid s_a)$).
EDIT: Maybe you want to say that $P(s_b \mid s_a, B)$ is “not much higher than” $\max_{s_1, \dots, s_B} P(s_1, \dots, s_B, s_b \mid s_a)$? If so, that seems false for LLMs; imagine the case where $s_a = s_b =$ "the", $B = 1{,}000$.
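To make the gap concrete, here is a minimal sketch on a toy two-token Markov chain. This is not an LLM and not the post's setup: the transition matrix `T` and the helper `bridge_probs` are made up for illustration, and I am assuming $P(s_b \mid s_a, B)$ means "the token $B+1$ steps after $s_a$ is $s_b$". The marginal sums over all intermediate sequences, while the max keeps only the single best one.

```python
# Hypothetical toy chain: token 0 plays the role of "the", token 1 is any other token.
# T[i][j] = probability that token j follows token i (rows sum to 1).
T = [[0.1, 0.9],
     [0.6, 0.4]]

def step_sum(v, T):
    """One step of the ordinary (sum-product) forward recursion."""
    n = len(T)
    return [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]

def step_max(v, T):
    """One step of the max-product recursion (probability of the best path)."""
    n = len(T)
    return [max(v[i] * T[i][j] for i in range(n)) for j in range(n)]

def bridge_probs(T, a, b, B):
    """Return (marginal, best_path) for going from token a to token b
    with B intermediate tokens in between (assumed reading of P(s_b | s_a, B))."""
    n = len(T)
    marg = [1.0 if i == a else 0.0 for i in range(n)]
    best = list(marg)
    for _ in range(B + 1):  # B intermediate tokens, then s_b itself
        marg = step_sum(marg, T)
        best = step_max(best, T)
    return marg[b], best[b]

if __name__ == "__main__":
    for B in (0, 1, 10, 100):
        marginal, best_path = bridge_probs(T, 0, 0, B)
        print(f"B={B:>3}  marginal={marginal:.6g}  best_path={best_path:.6g}")
```

In this toy chain the two quantities coincide at $B = 0$, but as $B$ grows the marginal settles near the chain's stationary probability of token 0 (0.4 for this matrix) while the best-path probability shrinks geometrically, so the lower bound holds but becomes arbitrarily loose.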
Hi, thanks for the response! I apologize, the “Left as an exercise” line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out clearly enough what I wanted to demonstrate here and was also (as you correctly point out) wrong in the way it was stated. That wasted people’s time and I feel pretty bad about it. Mea culpa.
I think/hope the current version of the statement is more complete and less wrong (although I also wouldn’t be shocked if there are mistakes in there). Regarding your points:
The limit now shows up on both sides of the equation (as it should)! The dependence on B on the RHS does actually kind of drop away at some point, but I’m not showing that here. I’d previously just sloppily substituted “choose B as a large number” and then rewritten the proposition in the way indicated at the end of the Note for Proposition 2. That’s the way these large deviation principles are typically used.
Yeah, that should have been an ≈ rather than a ∼. Sorry, sloppy.
True. Thinking more about it now, perhaps framing the proposition in terms of “bridges” was a confusing choice; if I revisit this post again (in a month or so 🤦‍♂️) I will work on cleaning that up.
Agreed. As a linguist, I looked at Proposition 2 and immediately thought “sketchy, shouldn’t hold in a good model of a language model”.