My thoughts about this problem are kind of a mess right now, but I feel there’s more than meets the eye.
Ignore the torture, “possible torture” and all that. It’s all a red herring. The real issue is lying: tricking humans into utility-increasing behaviors. It’s almost certain that some combination of “relative weights of good things” will make the FAI lie to humans. Maybe not the Bob+kitten scenario exactly, but something is bound to turn up. (Unless of course our CEV places a huge disutility on lies, which I’m pretty sure won’t be the case.) On the other hand, we humans quickly jump to distrusting anyone who has lied in the past, even if we know it was for our own good. So now the FAI has a huge incentive to conceal its lies and prevent the news from spreading among humans. I don’t have enough brainpower to model this scenario further, but it troubles me.
Lying is a form of manipulation, and humans don’t want/like to be manipulated. If the CEV works, then it will understand human concepts like “trust” and “lying” and hopefully avoid lying. The only situations where it will intentionally manipulate people are those where it is trying to do what is best for humanity. In these cases, you don’t have to worry because the CEV is smarter than you, but is still trying to do the “right thing” that you would do if you knew everything it knew.
Lying is a form of manipulation, and humans don’t want/like to be manipulated.
Well… that depends...
In these cases, you don’t have to worry because the CEV is smarter than you, but is still trying to do the “right thing” that you would do if you knew everything it knew.
Exactly.