Better method: set up a script that responds to any and all text with “AI DESTROYED”. If you have to wait for the person to start typing, they may try to bore you into opening your eyes to check why the experiment hasn’t started yet, and you might accidentally read something.
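For what it’s worth, here is a minimal sketch of such an auto-responder, assuming the experiment runs over a plain IRC channel. The server, channel, and nick names are placeholders, not details from any actual experiment:

```python
import socket

# Placeholder connection details; substitute whatever server and channel
# the experiment actually uses.
SERVER, PORT, CHANNEL, NICK = "irc.example.org", 6667, "#aibox", "gatekeeper-bot"

sock = socket.create_connection((SERVER, PORT))
sock.sendall(f"NICK {NICK}\r\nUSER {NICK} 0 * :{NICK}\r\n".encode())

buffer = ""
while True:
    buffer += sock.recv(4096).decode(errors="ignore")
    *lines, buffer = buffer.split("\r\n")   # keep any partial line for the next pass
    for line in lines:
        if line.startswith("PING"):
            # Keep the connection alive.
            sock.sendall(("PONG" + line[4:] + "\r\n").encode())
        elif " 001 " in line:
            # Registration accepted; join the experiment channel.
            sock.sendall(f"JOIN {CHANNEL}\r\n".encode())
        elif "PRIVMSG" in line:
            # Never read the AI's text: every message gets the same reply.
            sock.sendall(f"PRIVMSG {CHANNEL} :AI DESTROYED\r\n".encode())
```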
All good security measures. The key feature seems to be that they are progressively better approximations of not having an unsafe AI with a gatekeeper and an IRC channel in the first place!
Well yes, if you stick the AI in a safe, cut all network cables, and throw away the key and combination, it probably wouldn’t be able to get out. But it wouldn’t be very useful either.
The entire point of these thought experiments is that a sufficiently useful and smart AI (i.e. the kind of AI that we want to make) will eventually find a way to at least communicate with someone who has the authority to allow it to interact with the outside world. I think that, if you really consider it, there are few scenarios where this is not possible. I certainly can’t think of any useful application of SAI that is also 100% effective at keeping it inside its box.
A good present-day analogy is computer security. Time and time again it has been proven that there is no simple silver-bullet solution to the problem of balancing functionality and security: it requires expertise, constant maintenance, rigorous protocols, and so on. And yet hackers still manage to get through a lot of the time. It took a very long time for computer security to mature to the point where we can build reasonably (though still not 100%, of course) secure systems today, and even then we’re just battling regular humans with a grudge, nothing even close to the threat a SAI would present.
Indeed. In fact, most of the solutions I’ve seen mentioned lately follow a single trend, one that edges closer and closer towards:
“Build a completely unsafe and suspicious AI, put it on a disconnected small computer with a bunch of nanites for self-modification and a large power reserve, with so many walls and physical barriers that it is impossible for the AI to get through with the amount of energy it could generate if it turned half of its materials into antimatter, and then put no input or output channels there of any kind, just have a completely useless multi-trillion-dollar marvel of science and engineering sitting in the practical equivalent of a black hole.”
What if the AI uses the walls as fuel? Better to just keep it stuck on your server farm ;)
Entirely fair point.
I suppose this might come up because there are no stated benefits which you are trying to receive. No Gatekeeper seemingly gets to pump her fists and say, “I got the AI to give us the cure for cancer and the winning lottery numbers and a free energy source and STILL didn’t let it out of the box. Beat that, Michaelos, with your AI DESTROYED script!” (Or maybe I have just missed the gatekeepers who have done this?)
Which makes sense, because my understanding is that this was never the point of the original proposed experiment, which seems to be a variant of “Super Intelligent Spammers can be convincing. I’ve proved this by being convincing while less powerful.” (I use spammers as an example because both are manipulators who can only send you text messages, rewarding them requires your voluntary cooperation, and none of their offers are real, which is quite similar to normal email spam and also to the original experiment, as far as I can tell.)
So if the AI is defined such that ALL attempts at communication from it are spam and attempts to manipulate you, then the correct approach is to treat it like a blacklisted spam address. If every e-mail from virusspam.com is an attempt to hack you, a good spam filter could just dump everything from virusspam.com into the spam folder and never even bother to bring it to your inbox.
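A toy sketch of that blacklist policy, just to make it concrete (the domain and addresses are made up):

```python
# Toy sketch of the blacklist policy described above: anything from a
# blacklisted sender is routed to spam without the body ever being read.
BLACKLISTED_DOMAINS = {"virusspam.com"}

def route_message(sender: str) -> str:
    """Decide where a message lands, looking only at the sender address."""
    domain = sender.rsplit("@", 1)[-1].lower()
    return "spam" if domain in BLACKLISTED_DOMAINS else "inbox"

# The boxed AI's pleas never reach the inbox, so they cannot persuade anyone.
assert route_message("boxed-ai@virusspam.com") == "spam"
assert route_message("colleague@example.org") == "inbox"
```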
If the idea is that we are attempting to distinguish real AIs from fake AIs, then a more isomorphic scenario would probably be a game of Mafia set up as follows.
http://en.wikipedia.org/wiki/Mafia_%28party_game%29
You are a town person. It is your turn to lynch. There are two other players: one is the Mafia (UFAI), the other is another town person (FAI). If you lynch the Mafia, you and the other town person both win. If you lynch the other town person, the Mafia kills you in your sleep and you lose. If you do anything else (vote to lynch no one, vote to lynch yourself), you lose.

You may communicate textually with either of the other two players for as long as you want before making a decision. Your objective is to win.
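To make the win conditions explicit, here is the payoff logic as a tiny sketch (the labels are mine, not part of the proposed game):

```python
# Payoff logic for the three-player variant described above: you are town,
# one opponent is the Mafia (UFAI), the other is a fellow town person (FAI).
def outcome(lynch_target: str) -> str:
    if lynch_target == "mafia":   # you correctly identified the UFAI
        return "win: you and the other town person both win"
    if lynch_target == "town":    # you lynched the FAI by mistake
        return "lose: the Mafia kills you in your sleep"
    return "lose"                 # abstaining or lynching yourself also loses

print(outcome("mafia"))  # -> win: you and the other town person both win
```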
That seems a lot closer to the scenario that people want to be considering. I also note that there doesn’t appear to be any distinguishing factor that allows you to win at better than chance odds, but I haven’t actually played a lot of Mafia before, so I may just be unfamiliar with the strategies involved.
Well, it’s usually played in person, and humans (usually) aren’t perfect liars.
Your proposed game has one flaw: there is an FAI, and it wants to help you win. It might be closer to have only two players, with the AI flipping a coin to decide whether it is Friendly. But then the Gatekeeper would win with 50/50 odds just by letting it out, which seems unrealistic.
Perhaps the AI decides, in character, after being released, whether to be Friendly towards the human? Then the Gatekeeper could try to persuade the AI that Friendliness is optimal for its goals. The temptation might help as well, of course.
I tried coming up with a more isomorphic game in another reply to you. Let me know if you think it models the situation better.