Cleo Nardo comments on The Waluigi Effect (mega-post)

Cleo Nardo 29 Mar 2023 21:24 UTC
Yep, you’re correct. The original argument in the Waluigi mega-post was sloppy.
- If $μ$ updated the amplitudes in a perfectly Bayesian way and the context window were infinite, then the amplitude of each premise would be a martingale. But the finite context window breaks this.
- Here is a toy model which shows how the finite context window leads to the Waluigi Effect. Basically, the finite context window biases the Dynamic LLM towards premises which can be evidenced by short strings (e.g. waluigis), and away from premises which can’t be evidenced by short strings (e.g. luigis).
- Regarding your other comment, a long context window doesn’t mean that the waluigis won’t appear quickly. Even with an infinite context window, the waluigi might appear immediately. The assumption that the context window is short/finite is only necessary to establish that the waluigi is an absorbing state but luigi isn’t.
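
A minimal sketch of the first two points, under assumptions that are not in the comment above and are not the toy model it refers to: two premises (luigi, waluigi), two tokens (`N` for "nice", `E` for "evil"), a luigi that never emits `E`, and a waluigi that emits `E` with a hypothetical probability `EPS`. With the full history visible, the expected luigi amplitude after one more sampled token equals the current amplitude (the martingale property). If only the last `window` tokens are visible, the expected amplitude drifts downward, and a single `E` inside the window is enough to send the luigi amplitude to zero.

```python
# Toy assumptions (hypothetical, chosen for illustration only).
EPS = 0.1  # chance that a waluigi emits the revealing token "E"
LIKELIHOOD = {
    "luigi": {"N": 1.0, "E": 0.0},
    "waluigi": {"N": 1.0 - EPS, "E": EPS},
}


def posterior_luigi(prior, tokens):
    """Exact Bayesian posterior of the luigi premise given the visible tokens."""
    w_luigi, w_waluigi = prior, 1.0 - prior
    for tok in tokens:
        w_luigi *= LIKELIHOOD["luigi"][tok]
        w_waluigi *= LIKELIHOOD["waluigi"][tok]
    return w_luigi / (w_luigi + w_waluigi)


def expected_next_amplitude(prior, history, window=None):
    """Return (current, expected next) luigi amplitude.

    The next token is sampled from the mixture of both premises, weighted by
    the current amplitudes. window=None means the full history is visible;
    window=k (k >= 1) means only the last k tokens are visible.
    """
    def visible(tokens):
        return tokens if window is None else tokens[-window:]

    current = posterior_luigi(prior, visible(history))
    expected = 0.0
    for tok in ("N", "E"):
        p_tok = (current * LIKELIHOOD["luigi"][tok]
                 + (1.0 - current) * LIKELIHOOD["waluigi"][tok])
        expected += p_tok * posterior_luigi(prior, visible(history + [tok]))
    return current, expected


if __name__ == "__main__":
    history = ["N"] * 5
    print("full context:", expected_next_amplitude(0.5, history))
    print("window of 3: ", expected_next_amplitude(0.5, history, window=3))
    print("after one E: ", posterior_luigi(0.5, ["N", "E"]))
```

In this sketch the evidence for luigi is capped by the window length (at most `window` consecutive `N` tokens can count), while the evidence for waluigi fits in a single token. It illustrates only the asymmetry and the broken martingale property; it does not by itself show that the waluigi is an absorbing state.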