“If the survival of the AGI is part of the utility function”
If. By default, it isn’t: https://www.lesswrong.com/posts/Z9K3enK5qPoteNBFz/confused-thoughts-on-ai-afterlife-seriously
“What if we start designing very powerful boxes?”
A very powerful box would be very useless. Either you leave enough of an opening for a human to be taught valuable information that only the AI knows, or you don’t, in which case the box is useless. But if the AI can teach the human something useful, it can also persuade them to do something harmful.