Chris van Merwijk comments on The Waluigi Effect (mega-post)

Chris van Merwijk 29 Mar 2023 19:45 UTC
LW: 9 AF: 5
> Therefore, the waluigi eigen-simulacra are attractor states of the LLM
It seems to me like this informal argument is a bit suspect. Actually, I think this argument would not apply to Solomonoff induction.
Suppose we have two programs that each define a distribution over bitstrings. Suppose p1 assigns uniform probability to each bitstring, while p2 assigns 100% probability to the string of all zeroes (equivalently, p1 samples each bit i.i.d. from Bernoulli(1/2) on {0,1}, while p2 outputs 0 at every step with probability 100%).

Suppose we use a perfect Bayesian reasoner to sample bitstrings, but we do it in precisely the same way LLMs do it according to the simulator model. That is, given a bitstring, we first formulate a posterior over programs, i.e. a “superposition” on programs, which we use to sample the next bit, then we recompute the posterior, etc.

Then, with a prior of 1/2 on each program, I think the probability of sampling 00000000… is just 50%. I.e. I think the distribution over bitstrings that you end up with is the same as if you had first sampled the program and then stuck with it.

Here’s a messy calculation which could be simplified (which I won’t do):
$P(\text{first } n+1 \text{ bits are all zero}) = \prod_{i=0}^{n} P(x_i = 0 \mid x_{<i} = 0) = \prod_{i=0}^{n} \sum_{p \in \{p_1, p_2\}} P(x_i = 0 \mid p) \, P(p \mid x_{<i} = 0) = \prod_{i=0}^{n} \frac{2^{-i-1} + 1}{2^{-i} + 1} = \frac{2^{-n-1} + 1}{2}$
The limit of this as $n \to \infty$ is 0.5.
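
As a sanity check, the resample-the-posterior procedure described above can be run exactly with rational arithmetic. This is only a sketch: the function name `prob_all_zeros` is made up for illustration, and it assumes an equal prior of 1/2 on p1 and p2.

```python
from fractions import Fraction


def prob_all_zeros(num_bits: int) -> Fraction:
    """Exact probability that the first `num_bits` sampled bits are all 0.

    Assumes a prior of 1/2 on each program (an assumption of this sketch):
    p1 emits fair coin flips, p2 always emits 0.
    """
    half = Fraction(1, 2)
    w1, w2 = half, half  # posterior weights on p1 and p2
    total = Fraction(1)
    for _ in range(num_bits):
        # Predictive probability of a 0 under the current superposition.
        p_zero = w1 * half + w2 * 1
        total *= p_zero
        # Bayes update of the weights after observing a 0.
        w1, w2 = w1 * half / p_zero, w2 * 1 / p_zero
    return total


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, float(prob_all_zeros(k)))
```

For k sampled bits the exact value comes out as (2^-k + 1) / 2, which approaches 1/2 from above as k grows, the same number you would get by picking a program once and sticking with it.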

I don’t wanna try to generalize this, but based on this example it seems like if an LLM were an actual Bayesian, Waluigis would not be attractors. The informal argument is wrong because it doesn’t take into account the fact that over time you sample increasingly many non-Waluigi tokens, pushing down the probability of the Waluigi.

Then again, the presence of a context window completely breaks the above calculation in a way that preserves the point. Maybe the context window is what makes Waluigis into attractors? (Seems unlikely actually, given that context windows are fairly big.)
- Cleo Nardo 29 Mar 2023 21:24 UTC
  LW: 2 AF: 1
  Yep, you’re correct. The original argument in the Waluigi mega-post was sloppy.
  - If $μ$ updated the amplitudes in a perfectly Bayesian way and the context window were infinite, then the amplitude of each premise would have to be a martingale. But the finite context breaks this.
  - Here is a toy model which shows how the finite context window leads to the Waluigi Effect. Basically, the finite context window biases the Dynamic LLM towards premises which can be evidenced by short strings (e.g. waluigis), and biases it away from premises which can’t be evidenced by short strings (e.g. luigis).
  - Regarding your other comment, a long context window doesn’t mean that the waluigis won’t appear quickly. Even with an infinite context window, the waluigi might appear immediately. The assumption that the context window is short/finite is only necessary to establish that the waluigi is an absorbing state but luigi isn’t.
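
  A sketch of why the martingale claim in the first bullet holds, under the idealized assumptions stated there (exact Bayesian updating, unbounded context). The notation is introduced here for illustration and is not from the post: $w_i(p) = P(p \mid x_{<i})$ is the amplitude of premise $p$ after $i$ tokens, and the next token $x_i$ is sampled from the mixture $P(x_i \mid x_{<i}) = \sum_q w_i(q) \, P(x_i \mid q, x_{<i})$. Then

  $$\mathbb{E}\left[ w_{i+1}(p) \mid x_{<i} \right] = \sum_{x_i} P(x_i \mid x_{<i}) \cdot \frac{P(x_i \mid p, x_{<i}) \, w_i(p)}{P(x_i \mid x_{<i})} = w_i(p) \sum_{x_i} P(x_i \mid p, x_{<i}) = w_i(p).$$

  So in expectation each amplitude stays where it is. With a finite window, the posterior is conditioned on a truncated history rather than on all of $x_{<i}$, and this cancellation no longer goes through in general.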