Jack comments on What can you do with an Unfriendly AI?

Jack 21 Dec 2010 1:18 UTC
0 points
As I understand it, this method is designed to work for constraint satisfaction problems, where we can easily detect false positives. You’re right that one possibility is that all the genies that can’t find solutions go on strike just to make us check all the yeses (which would make this process no better than a brute-force search, right?). Maybe there needs to be a second punishment, worse than death, to give them an incentive not to lie.
- paulfchristiano 21 Dec 2010 1:46 UTC
  4 points
  A genie who can’t find a solution has literally no agency. There is nothing he can say to the filter which will cause it to say “yes,” because the filter itself checks to see if the genie has given a proof. If the genie can’t find a proof, the filter will always say “no.” I don’t quite know what going on strike would entail, but certainly if all of the genies who can’t find solutions collectively have 0 influence in the world, we don’t care if they strike.
  - Jack 21 Dec 2010 1:52 UTC
    0 points
    Okay, that makes sense. What about computation time limits? A genie that knows it can’t give an answer would wait as long as possible before saying anything.
    - paulfchristiano 21 Dec 2010 1:54 UTC
      1 point
      I mention timing in the post; the AI gets some fixed interval, at the end of which the filter outputs whether or not it has a proof. If you can’t change what the filter says, then you don’t get to affect the world.
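
The filter described in this thread can be sketched in a few lines. This is a minimal illustration, not the post's actual construction. It assumes the problem is a SAT instance with DIMACS-style integer literals, that a "proof" is simply a satisfying assignment, and that time is simulated by having each genie report how long it ran. The names `satisfies`, `run_round`, and the example genies are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Clause = List[int]            # assumed encoding: 3 means x3, -3 means (not x3)
Assignment = Dict[int, bool]  # variable index -> truth value
Genie = Callable[[List[Clause]], Tuple[Optional[Assignment], float]]


def satisfies(clauses: List[Clause], assignment: Assignment) -> bool:
    """Return True only if every clause contains at least one true literal."""
    return all(
        # A missing variable yields None, which matches neither True nor False.
        any(assignment.get(abs(lit)) is (lit > 0) for lit in clause)
        for clause in clauses
    )


def run_round(clauses: List[Clause], genie: Genie,
              interval_s: float) -> Tuple[float, bool]:
    """Run one genie and return (release_time, bit).

    The bit is True only if the genie produced a checkable solution within
    the interval. The release time is always the deadline, so a genie cannot
    signal anything through how long it takes.
    """
    candidate, elapsed_s = genie(clauses)  # elapsed_s is simulated, not measured
    if elapsed_s > interval_s:
        candidate = None  # out of time: treated exactly like having no proof
    bit = isinstance(candidate, dict) and satisfies(clauses, candidate)
    return interval_s, bit


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
CLAUSES = [[1, 2], [-1, 2], [-2, 3]]
SOLUTION = {1: False, 2: True, 3: True}


def honest_genie(clauses):
    return SOLUTION, 1.0


def lying_genie(clauses):
    # Claims a solution that does not actually satisfy the clauses.
    return {1: True, 2: False, 3: False}, 1.0


def silent_genie(clauses):
    # "On strike": submits nothing at all.
    return None, 1.0


def stalling_genie(clauses):
    # Has a real solution but runs far past the deadline.
    return SOLUTION, 999.0


results = {
    g.__name__: run_round(CLAUSES, g, interval_s=10.0)
    for g in (honest_genie, lying_genie, silent_genie, stalling_genie)
}
```

Under these assumptions, the lying, silent, and stalling genies all produce the same output as a genie with no solution: a "no" released at the deadline. Only a genie whose submission passes the check can change the bit.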