Probably? I don’t think that addresses the question of what such an AI would do in whatever window of opportunity it has. I don’t see a reason why you couldn’t get an AI that has learned to delay its takeover attempt until it’s out of training, but still has relatively low odds of success at takeover.
I think it means that whatever you get is conservative in cases where it’s unsure of whether it’s in training, which may translate to being conservative where it’s unsure of success in general.
I agree it doesn’t rule out an AI that takes a long shot at takeover! But whatever cognition we posit that the AI executes, it has to yield very high training performance. So AIs that think they have a very short window for influence or are less-than-perfect at detecting training environments are ruled out.