Could you maybe add some more explanation of how the stated problem is relevant for AI control? It’s not obvious to me from the outset why I care about duping an AI.
Many approaches become available if you can feed the AI counterfactuals or “false” information. For example, an AI that doesn’t “believe” a particular trigger is armed can then be caught by that trigger when it defects without first neutralising it.
There’s a lot of upcoming work that uses this, implicitly or explicitly. See http://lesswrong.com/lw/lt6/newish_ai_control_ideas/