Hey, thanks for the comment! Agree that this argument is good and important. The thing I’d want to investigate to deal with this is basically “in a small hidden channel, does cognition degrade or just slow down”.
If cognition degrades, then a small hidden channel would lead to poor attempts at steganography, which we could catch and suppress, preventing a snowball.
If cognition instead just slows down, then if the AI is smart enough to come up with steganography protocols too good for us to detect, it might successfully secure more bandwidth on its first try, starting a snowball.
Even in the slowdown case, there are fallback hopes like “it’s slow enough for us to extract work and predictably stop the episode before the snowball starts” or “we can obscure how good our steganography detection is”
Hey, thanks for the comment! Agree that this argument is good and important. The thing I’d want to investigate to deal with this is basically “in a small hidden channel, does cognition degrade or just slow down”.
If cognition degrades, then a small hidden channel would lead to poor attempts at steganography, which we could catch and suppress, preventing a snowball.
If cognition instead just slows down, then if the AI is smart enough to come up with steganography protocols too good for us to detect, it might successfully secure more bandwidth on its first try, starting a snowball.
Even in the slowdown case, there are fallback hopes like “it’s slow enough for us to extract work and predictably stop the episode before the snowball starts” or “we can obscure how good our steganography detection is”