I think a crux here is that I expect sufficiently superhuman AGI to be able to easily manipulate humans without detection, so I don’t get much comfort from arguments like “It can’t kill us all as long as we don’t give it access to a factory that does X.” All it needs to do is figure out that there’s a disgruntled employee at the factory and bribe/befriend/cajole them, for example, which is absolutely possible because humans already do this (albeit less effectively than I expect an AGI to be capable of).
Likewise it seems not that hard to devise plans a human will think are good on inspection but which are actually bad. One way to do this is to have many plans with subtle interactions that look innocuous. Another is to have a single plan that exploits human blindspots (eg some crucial detail is hidden in a lengthy appendix about the effect of the plan on the badger population of East Anglia). [Incidentally I’d highly recommend watching “Yes, Minister” for countless examples of humans doing this successfully, albeit in fiction.]
No, this is not a crux. I think I mostly agree with you. But I think we are talking about an AGI that needs time, which is something that is usually denied: "as soon as an AGI is created, we all die." Once you put time into the equation, you allow other AGIs to be created.