The chance of an AI torturing humans as a means to some other goal does seem low, but what about the AI torturing humans as an end in itself? I think CEV could result in this with non-negligible probability (>0.000001). I wouldn’t be surprised if the typical LessWrong poster has a very different morality from the majority of the population, so our intuitions about the results of CEV could be very wrong.
Note that it does not suffice for us to have a different conscious morality or different verbal statements of values. That only matters if the difference survives extrapolation, e.g., in what others would still want if they knew there weren’t any deities.