As I understand it, this method is designed to work for constraint satisfaction problems, where we can easily detect false positives. You’re right that one possibility is that all the genies that can’t find solutions go on strike just to make us check all the yeses (which would make this process no better than a brute-force search, right?). Maybe there needs to be a second punishment that is worse than death to give them an incentive not to lie.
A genie who can’t find a solution has literally no agency. There is nothing he can say to the filter which will cause it to say “yes,” because the filter itself checks to see if the genie has given a proof. If the genie can’t find a proof, the filter will always say “no.” I don’t quite know what going on strike would entail, but certainly if all of the genies who can’t find solutions collectively have 0 influence in the world, we don’t care if they strike.
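To make the "no agency" point concrete, here is a minimal sketch of such a filter, assuming the problem is Boolean satisfiability. The names `filter_says_yes`, `clauses`, and `claimed` are hypothetical and chosen only for illustration; nothing in the post fixes the problem or the encoding. The filter never trusts the genie's word. It only checks whether the submitted assignment satisfies every clause, so a genie without a solution cannot produce a "yes."

```python
from typing import Dict, List

# Assumed encoding: a clause is a list of nonzero ints,
# e.g. [1, -2] means (x1 OR NOT x2).
Clause = List[int]


def filter_says_yes(clauses: List[Clause], claimed: object) -> bool:
    """Return True only if `claimed` is a verified satisfying assignment.

    `claimed` is whatever the genie outputs; it is treated as untrusted.
    """
    if not isinstance(claimed, dict):
        return False  # anything that is not an assignment counts as "no"
    for clause in clauses:
        # A literal is satisfied only if the variable is explicitly set to
        # the required Boolean; missing or non-Boolean values never match.
        if not any(claimed.get(abs(lit)) is (lit > 0) for lit in clause):
            return False
    return True


satisfiable = [[1, 2], [-1]]   # (x1 OR x2) AND (NOT x1)
unsatisfiable = [[1], [-1]]    # x1 AND (NOT x1)

print(filter_says_yes(satisfiable, {1: False, 2: True}))
print(filter_says_yes(unsatisfiable, {1: True}))
print(filter_says_yes(unsatisfiable, "trust me, the answer is yes"))
```

On the unsatisfiable instance no output can pass the check, so whatever that genie says, including refusing to answer, the filter's bit is the same.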
Okay, that makes sense. What about computation time limits? A genie that knows it can’t give an answer would wait as long as possible before saying anything.
I mention timing in the post; the AI gets some fixed interval, at the end of which the filter outputs whether or not the AI has given a proof. If you can’t change what the filter says, then you don’t get to affect the world.
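A rough sketch of the fixed-interval idea, under the simplifying assumption that time is counted in discrete steps rather than wall-clock seconds. The names `run_with_fixed_budget`, `genie`, `verify`, and `budget` are hypothetical. The point it tries to illustrate is that the filter always consumes the whole budget and then emits a single bit, so a genie that stalls produces exactly the same observable result as one that answers "no" immediately.

```python
from typing import Callable, Iterator, Optional


def run_with_fixed_budget(
    genie: Iterator[Optional[object]],
    verify: Callable[[object], bool],
    budget: int,
) -> bool:
    """Give the genie exactly `budget` steps, then report one bit."""
    found = False
    for _ in range(budget):
        candidate = next(genie, None)  # None means "nothing submitted yet"
        if candidate is not None and verify(candidate):
            found = True
        # No early exit: the full budget is always spent, so the time at
        # which the answer appears carries no information.
    return found


def stalling_genie():
    while True:
        yield None  # never submits anything


def honest_genie():
    yield None
    yield None
    yield 42  # submits a valid proof on the third step


def is_valid(candidate: object) -> bool:
    return candidate == 42  # stand-in for a real proof checker


print(run_with_fixed_budget(stalling_genie(), is_valid, budget=5))
print(run_with_fixed_budget(honest_genie(), is_valid, budget=5))
```

A real system would need to enforce the interval on actual computation time, which this step-counting toy does not attempt.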