Hey Stuart,
It seems like much of the press around this paper described it as a ‘big red button’ for turning off a rogue AI. That would be somewhat in line with your previous work on limited-impact AIs that are indifferent to being turned off, but it doesn’t seem to really describe this paper. My reading is that the paper doesn’t make the AI indifferent to interruption, or prevent it from learning about the button; it just helps the AI avoid a particular kind of distraction during the training phase. Actually implementing the interruption is a separate issue, and designing a form of interruption that the AI won’t try to avoid seems like the tough problem. Is this reading right, or am I missing something?
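To make concrete what I mean by the interruptions being a ‘distraction’ during training, here is a minimal sketch (not the paper’s code; `env`, `interrupt_prob`, and `safe_action` are made-up names, and `env` is assumed to expose `reset()`, `step()`, and `n_actions`). It shows Q-learning where an overseer sometimes overrides the agent’s action: because the update bootstraps off the max over actions rather than the action actually taken, interruptions change which states get visited but not the values the agent converges toward.

```python
import random

def q_learning_with_interruptions(env, episodes=500, alpha=0.1, gamma=0.99,
                                  epsilon=0.1, interrupt_prob=0.2,
                                  safe_action=0):
    """Tabular Q-learning with an overseer that occasionally overrides actions."""
    Q = {}  # Q[(state, action)] -> estimated value

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Agent's own epsilon-greedy choice.
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q(state, a))

            # Interruption: with some probability the overseer overrides the
            # chosen action (e.g. forces a shutdown / no-op action).
            if random.random() < interrupt_prob:
                action = safe_action

            next_state, reward, done = env.step(action)

            # Off-policy target: max over actions in the next state, regardless
            # of whether the executed action was the agent's or the overseer's.
            target = reward if done else reward + gamma * max(
                q(next_state, a) for a in range(env.n_actions))
            Q[(state, action)] = q(state, action) + alpha * (target - q(state, action))
            state = next_state

    return Q
```

The point of the sketch is just that the learning target never references the interrupted behaviour itself, which is how I understand “avoiding a particular kind of distraction”: the interruptions don’t bias what gets learned, but nothing here makes the deployed agent indifferent to being interrupted.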