Congratulations! You have just outsmarted an AI, sneakily allowing it to have great impact where it desires to have no impact at all.
Edited to add: the above was sarcastic. Surely an AI would realise that you might be trying tricks like this, and still output the wrong coordinate if it judges that possibility to be high enough.
http://lesswrong.com/r/discussion/lw/lyh/utility_vs_probability_idea_synthesis/ and http://lesswrong.com/lw/ltf/false_thermodynamic_miracles/
I read those two, but I don’t see what this idea contributes to AI control on top of them. If you can get the AI to act like it believes what you want it to, in spite of evidence, then there’s no need for the trick with two coordinates. Conversely, if you cannot, then telling it that there’s a second coordinate involved won’t fool it either. Why is it useful to control an AI through this splitting of information if we already have the false miracles? And if the miracles fail, how do you prevent the AI from seeing right through this scheme? In the latter case you are trying to do nothing more than outsmart an AI...
to act like it believes what you want it to, in spite of evidence

The approach I’m aiming for is to be able to make the AI do things without having to define hard concepts. “Deflect the meteor, but without having undue impact on the world” is a hard concept.
“reduced impact” seems easier, and “false belief” is much easier. It seems we can combine the two in this way to get something we want without needing to define it.
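To make the combination concrete, here is a minimal toy sketch (my own illustration, not code from this thread or the linked posts; the action names, utility numbers, and probabilities are all hypothetical): a reduced-impact agent that refuses to answer under its true beliefs, but answers under an imposed false belief that a thermodynamic-miracle-style event will erase its output signal.

```python
# Toy sketch of "reduced impact + false belief" (hypothetical numbers and names).
# The agent gets a small reward for producing the correct answer and pays a
# large penalty proportional to its expected impact on the outside world.

ACTIONS = ["output_correct_coordinates", "stay_silent"]

TASK_REWARD = 1.0        # reward for producing the correct answer
IMPACT_PENALTY = 1000.0  # penalty per unit of expected impact on the world

P_TRUE = 0.999999        # true physics: the output signal gets through
P_BELIEVED = 1e-12       # imposed false belief: a "miracle" erases the signal

def expected_utility(action: str, p_signal_gets_through: float) -> float:
    """Expected utility under a given belief about whether the signal arrives."""
    if action == "stay_silent":
        return 0.0
    expected_impact = p_signal_gets_through  # impact only if the signal arrives
    return TASK_REWARD - IMPACT_PENALTY * expected_impact

def best_action(p_signal_gets_through: float) -> str:
    return max(ACTIONS, key=lambda a: expected_utility(a, p_signal_gets_through))

print(best_action(P_TRUE))      # -> stay_silent (the impact penalty dominates)
print(best_action(P_BELIEVED))  # -> output_correct_coordinates
```

Under its true beliefs the impact penalty dominates and the agent stays silent; under the false belief the expected impact is negligible, so it outputs the coordinates, which in the real world then have the effect we wanted, without “undue impact” ever having been formally defined. (Whether a smart enough AI would see through the setup is exactly the objection raised above.)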