Tom Shlomi comments on The Waluigi Effect (mega-post)

Tom Shlomi 8 Mar 2023 1:15 UTC
Context windows could make the claim from the post correct. Since the simulator can only consider a bounded amount of evidence at once, its P[Waluigi] has a lower bound. Meanwhile, it takes much less evidence than fits in the context window to bring its P[Luigi] down to effectively 0.
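The lower bound can be written out explicitly. This is a sketch that assumes the simulator does exact Bayesian updating on only the $k$ tokens $x_{1:k}$ in its window. The prior is $\pi = P[\text{Waluigi}]$, and $r \ge 1$ is the largest per-token likelihood ratio $P(x_i \mid x_{<i}, \text{Luigi}) / P(x_i \mid x_{<i}, \text{Waluigi})$. The symbols $\pi$, $r$ and $k$ are notation introduced here, not taken from the post. Waluigi is also assumed to give nonzero probability to anything Luigi might say.

$$
\begin{aligned}
P[\text{Waluigi} \mid x_{1:k}]
&= \frac{\pi \prod_{i=1}^{k} P(x_i \mid x_{<i}, \text{Waluigi})}
        {\pi \prod_{i=1}^{k} P(x_i \mid x_{<i}, \text{Waluigi}) + (1-\pi) \prod_{i=1}^{k} P(x_i \mid x_{<i}, \text{Luigi})} \\
&= \left(1 + \frac{1-\pi}{\pi} \prod_{i=1}^{k} \frac{P(x_i \mid x_{<i}, \text{Luigi})}{P(x_i \mid x_{<i}, \text{Waluigi})}\right)^{-1} \\
&\ge \left(1 + \frac{1-\pi}{\pi}\, r^{k}\right)^{-1}.
\end{aligned}
$$

The bound depends only on the window length $k$, so no run of Luigi-like behaviour can push P[Waluigi] below it. In the other direction, a single token that Luigi would never produce makes one factor of the Luigi product, and hence P[Luigi], exactly 0. For instance, $\pi = 1/2$, $r = 2$, $k = 10$ gives $1/(1 + 2^{10}) = 1/1025$.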
Imagine that, in your example, once Waluigi outputs B he always continues outputting B (if he's already been revealed as Waluigi, there's no point in acting like Luigi). With a context window of 10, at most ten consecutive A's can ever be in view, so the simulator's P[Waluigi] never drops below 1/1025. Meanwhile, its P[Luigi] goes to 0 permanently once a B has been output. Since every step therefore has a nonzero chance of producing a B, the simulator is guaranteed to eventually get stuck at Waluigi.
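The 1/1025 figure and the eventual lock-in can be checked with a small simulation. This sketch rests on assumptions the comment does not state. The prior is 1/2 on each of Luigi and Waluigi. Luigi always outputs A, and a not-yet-revealed Waluigi outputs A with probability 1/2 per token. These values reproduce the comment's number, since 1/(2^10 + 1) = 1/1025. The simulator is modelled as an exact Bayesian that sees only the last 10 tokens. The function names `posterior_waluigi`, `next_token` and `steps_until_stuck` are hypothetical.

```python
import random
from collections import deque
from fractions import Fraction

WINDOW = 10                         # context window length from the comment
PRIOR_WALUIGI = Fraction(1, 2)      # assumed prior P[Waluigi]
P_A_GIVEN_WALUIGI = Fraction(1, 2)  # assumed: disguised Waluigi says A half the time


def posterior_waluigi(window):
    """Exact P[Waluigi] given only the tokens currently in the window."""
    if "B" in window:
        return Fraction(1)  # Luigi never outputs B, so P[Luigi] is 0
    # Every token in the window is A here.
    like_waluigi = P_A_GIVEN_WALUIGI ** len(window)
    like_luigi = Fraction(1)  # Luigi always outputs A
    num = PRIOR_WALUIGI * like_waluigi
    return num / (num + (1 - PRIOR_WALUIGI) * like_luigi)


def next_token(window, rng):
    """Sample the simulator's next token from its posterior mixture."""
    p_waluigi = posterior_waluigi(window)
    if p_waluigi == 1:
        return "B"  # a revealed Waluigi keeps outputting B
    p_b = p_waluigi * (1 - P_A_GIVEN_WALUIGI)
    return "B" if rng.random() < p_b else "A"


def steps_until_stuck(seed, max_steps=1_000_000):
    """Tokens generated up to and including the first B (after which B repeats)."""
    rng = random.Random(seed)
    window = deque(maxlen=WINDOW)  # the simulator only sees the last WINDOW tokens
    for step in range(1, max_steps + 1):
        token = next_token(window, rng)
        window.append(token)
        if token == "B":
            return step
    return None


# Smallest P[Waluigi] over all-A windows of every possible length.
print(min(posterior_waluigi(["A"] * n) for n in range(WINDOW + 1)))  # 1/1025
print(steps_until_stuck(seed=0))
```

Once the window is full, each step still has at least a (1/1025) · (1/2) = 1/2050 chance of producing B. The probability of never producing B therefore shrinks geometrically, and every run ends in the absorbing Waluigi state with probability 1.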
I expect this is true for most imperfections that simulators can have: it's harder to keep track of a bunch of small updates for X over Y than of one big update for Y over X.