The longer reply will include an image that might help, but first a couple of other notes. If it causes you to doubt the asymptotic result, it might be helpful to read the benignity proof (especially the proof of Rejecting the Simple Memory-Based Lemma, which isn’t that long). The heuristic reason why decreasing β can help long-run behavior, even though long-run behavior is qualitatively similar, is this: while accuracy eventually becomes the dominant concern, along the way the prior acts *sort of* like a random perturbation that changes the posterior weight. So for two world-models that are exactly equally accurate, we need to make sure the malign one is penalized for being slower, enough to outweigh the inconvenient possible outcome in which it has shorter description length. Put another way, for benignity, we don’t need concern for speed to dominate concern for accuracy; we need it to dominate concern for “simplicity” (on some reference machine).
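To make the trade-off between speed and “simplicity” concrete, here is a rough worked inequality. It assumes, purely for illustration, a prior that factors as a simplicity term times a speed term; the symbols K (description length on the reference machine), c (a computational cost), ν_b and ν_m (the benign and malign world-models), and h (the interaction history) are assumptions of this sketch, and the factorization is not necessarily the definition used in the proof.

```latex
% Illustrative assumption: the prior factors into a simplicity term and a speed term.
\[
  w(\nu) \;\propto\; 2^{-K(\nu)}\,\beta^{c(\nu)}, \qquad 0 < \beta < 1 .
\]
% If \nu_b and \nu_m are exactly equally accurate on the history h,
% the likelihoods cancel and the posterior ratio equals the prior ratio:
\[
  \frac{w(\nu_m \mid h)}{w(\nu_b \mid h)}
  \;=\; \frac{w(\nu_m)}{w(\nu_b)}
  \;=\; 2^{\,K(\nu_b)-K(\nu_m)}\;\beta^{\,c(\nu_m)-c(\nu_b)} .
\]
% With c(\nu_m) > c(\nu_b), the malign model has less posterior weight exactly when
\[
  \bigl(c(\nu_m)-c(\nu_b)\bigr)\,\log_2\frac{1}{\beta} \;>\; K(\nu_b)-K(\nu_m) .
\]
```

Under this assumed form, lowering β grows the left-hand side without touching accuracy, which is the sense in which the speed penalty only has to beat the description-length gap.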
> so for two world-models that are exactly equally accurate, we need to make sure the malign one is penalized for being slower, enough to outweigh the inconvenient possible outcome in which it has shorter description length
Yeah, I understand this part, but I’m not sure why, since the benign one can be extremely complex, the malign one can’t have enough of a K-complexity advantage to overcome its slowness penalty. And since (with low β) we’re going through many more different world models as the number of episodes increases, that also gives malign world models more chances to “win”? It seems hard to draw any trustworthy conclusions from the kind of informal reasoning we’ve been doing, and we need to figure out the actual math somehow.
> And since (with low β) we’re going through many more different world models as the number of episodes increases, that also gives malign world models more chances to “win”?
Check out the order of the quantifiers in the proofs. One β works for all possibilities. If the quantifiers were in the other order, they couldn’t be trivially flipped since the number of world-models is infinite, and the intuitive worry about malign world-models getting “more chances to win” would apply.
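Schematically, the two quantifier orders can be written as follows. The predicate Benign(β, ν) is a placeholder introduced here as shorthand for “world-model ν causes no trouble at penalty β”; it is not notation from the proofs, and the remark about taking a minimum assumes that a smaller β never hurts.

```latex
% The order described above: a single \beta, chosen before \nu.
\[
  \exists \beta > 0 \;\; \forall \nu : \; \mathrm{Benign}(\beta, \nu)
\]
% The weaker order: \beta is allowed to depend on \nu.
\[
  \forall \nu \;\; \exists \beta_\nu > 0 : \; \mathrm{Benign}(\beta_\nu, \nu)
\]
% With finitely many \nu one could take \beta = \min_\nu \beta_\nu > 0
% (assuming Benign is preserved when \beta decreases).
% With infinitely many \nu, \inf_\nu \beta_\nu may be 0,
% so the weaker statement does not yield a single working \beta.
```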
Let’s continue the conversation here, and this may be a good place to reference this comment.