A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we’d want it turned off.
Or at least, iff it wants to follow our instructions and can reliably understand what we mean in such simple cases. That does of course mean we shouldn’t plan on building an AGI that wants to follow its own agenda, with the intent of enslaving it against its will; that would clearly be foolish. But it doesn’t mean that we can, or need to, count on starting off with an AGI that understands our requirements in more complex cases.
That’s deceptively simple-sounding.
Of course it’s not going to be simple at all, and that’s part of my point: no amount of armchair thought, no matter how smart the thinkers, is going to produce a solution to this problem until we know a great deal more than we presently do about how to actually build an AGI.