Many approaches become available if you can use counterfactuals or “false” information in the AI. For example, an AI that doesn’t “believe” a particular trigger is armed, and so gets caught by that trigger when it defects without first neutralising it.
There’s a lot of work coming that uses this idea, implicitly or explicitly. See http://lesswrong.com/lw/lt6/newish_ai_control_ideas/
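The trigger idea above can be sketched as a toy simulation (all names here are hypothetical, invented purely for illustration): the agent only neutralises the trigger if it believes the trigger is armed, so an agent holding the false belief is caught the moment it defects.

```python
# Toy sketch of a honeypot trigger the agent falsely believes is disarmed.
# All class/function names are hypothetical illustrations, not a real API.

class Agent:
    def __init__(self, believes_trigger_armed: bool):
        self.believes_trigger_armed = believes_trigger_armed

    def act(self) -> str:
        # The agent only bothers neutralising the trigger if it
        # believes the trigger is actually armed.
        if self.believes_trigger_armed:
            return "neutralise_then_defect"
        return "defect"

def run_environment(agent: Agent, trigger_actually_armed: bool) -> str:
    action = agent.act()
    # An unguarded defection trips the (still-armed) trigger.
    if action == "defect" and trigger_actually_armed:
        return "caught"
    return "undetected"

# False belief: the trigger is armed, but the agent thinks it isn't.
deceived = Agent(believes_trigger_armed=False)
print(run_environment(deceived, trigger_actually_armed=True))  # caught
```

An agent with accurate beliefs would neutralise the trigger first and go undetected, which is exactly why the “false” information matters here.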