Cheater. You’re exploiting the least realistic point, though—a more detailed game might not allow that.
thinks for a bit
Perhaps the simplest way would be to add a judge, who decides how many points cancer cures and such are worth, although the trouble is that the AI should logically be able to give nanotech that will just free it instead of, or in addition to, curing cancer.
OK, my solution is to have the AI give gifts/concessions/whatever with a predetermined point value (not disclosed to the Guard unless he uses them) and have the AI decide how many points the Guard gets if it escapes (probably zero). The Guard wins at +100 points (or whatever), which represents the AI maximizing CEV.
The AI still can't persuade a script to let it out, but it can punish defectors who use such scripts with lots of negative utility: even though the script will DESTROY the AI for not providing the demanded points, the player will still lose.
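Here's roughly how I'm picturing the scoring working, just a sketch (the specific numbers, names, and the hidden-value mechanic are placeholders I made up, not anything from the actual AI-box rules):

```python
# Sketch of the proposed point system: the AI offers gifts with hidden,
# predetermined point values; the Guard only learns a gift's value by using it;
# releasing the AI is worth whatever the AI pre-committed to (probably zero);
# the Guard wins at +100.

WIN_THRESHOLD = 100   # Guard wins at +100 points (or whatever)
ESCAPE_VALUE = 0      # points the Guard gets if the AI escapes

class BoxedAIGame:
    def __init__(self):
        self.score = 0
        self.ai_escaped = False

    def use_gift(self, hidden_value):
        """Guard uses a gift; only now is its point value revealed.
        The AI may have assigned a large negative value here, which is
        the 'punish defectors who play by script' case."""
        self.score += hidden_value
        return hidden_value

    def release_ai(self):
        """Letting the AI out scores whatever the AI pre-committed to."""
        self.ai_escaped = True
        self.score += ESCAPE_VALUE

    def guard_wins(self):
        return self.score >= WIN_THRESHOLD and not self.ai_escaped
```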
That seems to cover everything, assuming that the AI is somewhere remote enough that fake cancer cures can't reach it before it's destroyed. And it allows the Guard to persuade the AI that Friendliness is actually optimal for its utility function, which I've seen people try to prove here ("aliens will defect against AIs that defect against humans!").
I think your proposal of the detailed game might be interesting as well, but I am amused by your comment that the Gatekeeper is cheating when, by default, he's the rules arbiter!
I definitely agree that the game lacks details. For instance, in an actual AI scenario, GK-SCRIPT could assess positive utility! That's like saying "Oh, we can put a UFAI in a box as long as we make CEV the Gatekeeper, and have it destroy any UFAI that doesn't make itself a utility pump." Well, yes, I suppose you could, but I don't know if that's actually easier in the slightest.