All good security measures. The key feature seems to be that they are progressively better approximations of not having an unsafe AI with a gatekeeper and an IRC channel in the first place!
Entirely fair point.
I suppose this comes up because there are no stated benefits that you are trying to receive. No Gatekeeper seemingly gets to pump her fists and say, “I got the AI to give us the cure for cancer, the winning lottery numbers, and a free energy source, and STILL didn’t let it out of the box. Beat that, Michaelos, with your AI DESTROYED script!” (Or maybe I have just missed the gatekeepers who have done this?)
Which makes sense, because my understanding is that this was never the point of the original proposed experiment, which seems to be a variant of “Super Intelligent Spammers can be convincing. I’ve proved this by being convincing while less powerful.” (I use spammers as an example because both are manipulators who can only send you text messages, whose rewards require your voluntary cooperation, and whose offers are never real, which is quite similar to ordinary email spam and also, as far as I can tell, to the original experiment.)
So if the AI is defined such that ALL communication from it is spam and an attempt to manipulate you, then the correct approach is to treat it like a blacklisted spam address. If every e-mail from virusspam.com is an attempt to hack you, a good spam filter can just dump everything from virusspam.com into the spam folder and never even bring it to your inbox.
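For concreteness, here is a minimal Python sketch of that blacklist policy; the message format, function name, and domain set are purely illustrative assumptions, not part of any real spam filter:

    # Minimal sketch of a domain blacklist, assuming messages are
    # identified by a sender address. All names are illustrative.
    BLACKLISTED_DOMAINS = {"virusspam.com"}

    def route_message(sender: str, inbox: list, spam: list) -> None:
        """Drop mail from blacklisted domains straight into spam, unread."""
        domain = sender.rsplit("@", 1)[-1].lower()
        if domain in BLACKLISTED_DOMAINS:
            spam.append(sender)   # never reaches the inbox
        else:
            inbox.append(sender)

    inbox, spam = [], []
    route_message("offer@virusspam.com", inbox, spam)
    route_message("friend@example.org", inbox, spam)
    print(inbox)  # ['friend@example.org']
    print(spam)   # ['offer@virusspam.com']

The point of the sketch is that the gatekeeper never evaluates the message content at all; the decision is made entirely on the sender's identity.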
If the idea is that we are attempting to distinguish Friendly AIs from Unfriendly ones, then a more isomorphic scenario would probably be a game of Mafia (http://en.wikipedia.org/wiki/Mafia_%28party_game%29) set up as follows.
You are a townsperson. It is your turn to lynch. There are two other players: one is the Mafia (the UFAI), the other is another townsperson (the FAI). If you lynch the Mafia, you and the other townsperson both win. If you lynch the other townsperson, the Mafia kills you in your sleep, and you lose. If you do anything else (vote to lynch no one, vote to lynch yourself), you lose.
You may communicate textually with either of the other two players for as long as you want before making a decision. Your goal is to win.
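To pin the rules down unambiguously, here is a short Python sketch of the payoff structure just described; the encoding (the Enum, names, and strings) is mine and purely illustrative:

    from enum import Enum

    class Vote(Enum):
        LYNCH_MAFIA = "lynch the Mafia (UFAI)"
        LYNCH_TOWN = "lynch the other townsperson (FAI)"
        LYNCH_NO_ONE = "vote to lynch no one"
        LYNCH_SELF = "vote to lynch yourself"

    def outcome(vote: Vote) -> str:
        """Resolve the game as specified: only lynching the Mafia wins."""
        if vote is Vote.LYNCH_MAFIA:
            return "win (you and the other townsperson both win)"
        if vote is Vote.LYNCH_TOWN:
            return "lose (the Mafia kills you in your sleep)"
        return "lose (any other vote loses by rule)"

    for v in Vote:
        print(f"{v.value} -> {outcome(v)}")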
That seems a lot closer to the scenario that people want to be considering. I also note there doesn’t appear to be any distinguishing factor that lets you win at better than chance odds, but I haven’t actually played much Mafia, so I may just be unfamiliar with the strategies involved.
Well, it’s usually played in person, and humans (usually) aren’t perfect liars.
Your proposed game has one flaw: there is an FAI, and it wants to help you win. It might be closer to have only two players, with the AI flipping a coin to decide whether it’s Friendly; but then you would win by letting it out with 50/50 odds, which seems unrealistic.
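A quick sanity check of that 50/50 claim, as a Python sketch under the stated assumption (fair coin, gatekeeper always releases); the function and parameters are hypothetical:

    import random

    def simulate(trials: int = 100_000, seed: int = 0) -> float:
        """Estimate the always-release win rate when Friendliness is a fair coin."""
        rng = random.Random(seed)
        wins = sum(rng.random() < 0.5 for _ in range(trials))  # coin says Friendly
        return wins / trials

    print(f"always-release win rate ~ {simulate():.3f}")  # ~0.5, as claimed

Since nothing the AI says changes the coin, no conversation strategy moves the gatekeeper off 50/50 in this variant, which is exactly why it seems unrealistic.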
Perhaps the AI decides, in character, after being released, whether to be Friendly towards the human? Then the Gatekeeper could try to persuade the AI that Friendliness is optimal for its goals. The temptation might help as well, of course.
I tried coming up with a more isomorphic game in another reply to you. Let me know if you think it models the situation better.