If a normal mugger holds up a gun and says “Give me money or I’ll shoot you”, we consider the alternate hypotheses that the mugger will shoot you only if you do give er the money, or that the mugger will give you millions of dollars to reward your bravery if you refuse. But the mugger’s word itself, and our theory of mind about the things that tend to motivate muggers, make both of these much less likely than the garden-variety hypothesis that the mugger will shoot you if you don’t give the money. Further, this holds true whether the mugger claims er weapon is a gun, a ray gun, or a black hole generator: the credibility that the mugger can pull off er threat decreases if e says e has a black hole generator, but the general skew in favor of worse results for not giving the money does not.
Why does that skew go away if the mugger claims to be holding an unfriendly AI, the threat of divine judgment, or some other Pascal-level weapon?
Your argument only seems to hold if there is no mugger and we’re considering abstract principles—i.e., maybe I should clap my hands on the tiny chance that it might set off a chain reaction that will save 3^^^3 lives. In those cases, I agree with you; but as soon as a mugger gets into the picture, e provides more information and skews the utilities in favor of one action.
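The skew argument above can be made concrete with a toy expected-utility calculation. All the probabilities and utilities below are made-up illustrative numbers, not anything claimed in the comment; the only point is the asymmetry: the mugger's threat boosts the garden-variety hypothesis far above its mirror images.

```python
# Toy sketch of the mugging argument, with invented numbers.
# The mugger's stated threat skews probability mass toward
# "shoots you if you refuse" and away from the alternate hypotheses.
p_shoots_if_refuse = 0.10   # garden-variety hypothesis, boosted by the threat
p_shoots_if_pay    = 0.01   # "shoots you only if you DO pay" -- much less likely
p_reward_if_refuse = 1e-6   # "rewards your bravery" -- negligible

u_shot   = -1_000_000       # utility of being shot
u_money  = -100             # cost of handing over your wallet
u_reward = 1_000_000        # the hypothetical bravery reward

eu_pay    = u_money + p_shoots_if_pay * u_shot
eu_refuse = p_shoots_if_refuse * u_shot + p_reward_if_refuse * u_reward

# The mugger's word skews the expected utilities in favor of paying.
print(eu_pay, eu_refuse)
assert eu_pay > eu_refuse
```

The point of the sketch is not the particular numbers but that the mugger's statement moves the probabilities asymmetrically, so the same skew survives whatever weapon e claims, even as the absolute credibility of the threat drops.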