I’d be more interested in a response to the substance of my comment: If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
In other words, how do you take into account the fact that killing them might bring about exactly the fate that you intend to prevent; whereas one more exchange of rational argument might convince them not to do it?
If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
If someone with a facemask is pointing a gun at you he might just want to present it and ask you if you want to buy it, the facemask being the newest fashion hit that you are simply unaware of.
Edit: Disregard what I wrote below. It isn’t relevant, since it assumes that the individual hasn’t tried to make a Friendly AI, which seems to go against the assumption in the hypothetical.
I’d be more interested in a response to the substance of my comment: If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
There seems to be a heavy overlap between people who think AGI will foom and people who are concerned about Friendliness (for somewhat obvious reasons: Friendliness matters a lot more if fooming is plausible). It seems extremely unlikely that someone would set up a dead man’s switch unless they thought that a lot would actually get accomplished by the AI, i.e. that it would likely foom in a Friendly fashion. The actual chance that any such switches have been put into place seems low.
Oh, sure, I agree.
But what if Eliezer thinks he’s got an FAI he can turn on, and Joe isn’t convinced that it’s actually as Friendly as Eliezer thinks it is? I’d rather Joe argue with Eliezer than shoot him.
I am somewhat reluctant to engage deeply on the specific counterfactual here. Disagreeing with some of the more absurd statements by AndrewHickey has already placed me in the position of delivering enemy soldiers. That is an undesirable position to be in when the subject is one that encourages people to turn off their brains and start thinking with their emotional reflexes. Disagreeing with terrible arguments is not the same as supporting the opposition, but you can still expect the same treatment!
I would have to engage rather a lot of creative thinking to construct a scenario where I would personally take any drastic measures. Apart from the ethical injunctions I’ve previously mentioned I don’t consider myself qualified to make the decision. The most I would do is make sure the situation has been brought to the attention of the relevant spooks and make sure competent AI researchers are informed so that they can give any necessary advice to the spook-analysts. Even then the spook agency would probably not need to resort to violence. If they do, in fact, have to resort to violence because the AGI creators force the issue then the creators in question definitely cannot be trusted!
If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
Now, with the aforementioned caveats, let us begin. I shall first note then assume away all the options that are available for circumventing dead man’s switches. I refer here to resources the CIA could get their hands on. That means bunker buster bombs and teams of top of the line hackers to track down online instances. But those measures are not completely reliable so I’ll take it for granted that the DMS works.
We now have a situation where terrorists are holding the world hostage. Ineffectively. Either they’ll destroy the world or, if you kill them, they’ll destroy the world. So it doesn’t matter too much what you do: you’re dead either way. It seems the appropriate response is to blow the terrorists up. I’m not sure if I always advocate “don’t negotiate with terrorists” but I definitely advocate “don’t negotiate with terrorists when they are going to do the worst case thing anyway”!
But that is still too easy. Let’s go to the next case. We’ll say that the current design has a 99.9% chance of producing a uFAI. But if we give the AI creators another month to finish their work their creation has a 1% chance of being an FAI[1]. Now the DMS threat actually matters. There is something to lose. The question becomes how you deal with terrorists in a once-off, all-in situation. What do you do when (a small percentage but all that is available of) everything is at stake and someone can present a credible threat?
I actually don’t know the answer. I am not sure there is a well-established one. Being the kind of group that doesn’t take the terrorists out with a missile barrage has all sorts of problems. But being the person who does blow them away has a rather obvious problem too. I recall Vladimir making an interesting post regarding blackmail and terrorism; however, I don’t think it gave us a how-to-guide kind of resolution.
[1] Also assume that you expect another source to create an FAI with 50% chance a few years later if the current creators are stopped.
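For anyone who wants the numbers in this second case side by side, here is a rough sketch of the arithmetic. The three probabilities come straight from the comment and its footnote; everything else is my own simplification and should be read as an assumption: the action names are made up, "strike now with a working DMS" is taken to mean the current design gets launched, anything that is not a uFAI is counted as a good outcome (so that figure is an upper bound), and "wait a month" is taken to mean the creators then launch their improved design.

```python
from fractions import Fraction

# Figures quoted in the comment above.
P_UFAI_CURRENT = Fraction(999, 1000)  # current design: 99.9% chance of uFAI
P_FAI_AFTER_MONTH = Fraction(1, 100)  # after one more month: 1% chance of FAI
P_FAI_OTHER_SOURCE = Fraction(1, 2)   # footnote [1]: 50% from another source later


def p_fai(action: str, dms_works: bool) -> Fraction:
    """Chance of ending up with an FAI under a (hypothetical) action.

    Assumption: with a working dead man's switch, striking launches the
    current design, and any non-uFAI result is counted as good (upper bound).
    """
    if action == "wait_month":
        return P_FAI_AFTER_MONTH
    if action == "strike_now":
        if dms_works:
            return 1 - P_UFAI_CURRENT
        # No working switch: the creators are stopped, another source tries later.
        return P_FAI_OTHER_SOURCE
    raise ValueError(f"unknown action: {action}")


for dms_works in (True, False):
    for action in ("strike_now", "wait_month"):
        p = p_fai(action, dms_works)
        print(f"dms_works={dms_works!s:5} {action:10} P(FAI) = {float(p):.3f}")
```

Under these assumptions a working switch turns the strike from the best option (50%) into the worst one (at most 0.1% against 1% for waiting), which is the sense in which there is now something to lose. It says nothing about the harder question of whether giving in to a credible threat is a good policy.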
Yep. Now keep in mind that the CIA, or whatever other agency you care to bring to bear, is staffed with humans — fallible humans, the same sorts of agents who can be brought in remarkable numbers to defend a religion. The same sorts of agents who have at least once¹, and possibly twice², come within a single human decision of destroying the world for reasons that were later better classified as mistakes, or narrowly-averted disasters.
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right? And what is the chance that they have misconceived the situation such that by pulling the trigger, they will create an even worse situation?
My point isn’t some sort of hippy-dippy pacifism. My point is: Humans — all of us; you, me, the CIA — are running on corrupted hardware. At some point when we make a severe decision, one that goes against some well-learned rules such as not-killing, we have to take into account that almost everyone who’s ever been in that situation has been making a bad decision.
¹ Stanislav Petrov; 26 September 1983
² Jack Kennedy; Cuban Missile Crisis, October 1962
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right?
Fairly high. This is a far simpler situation than dealing with foreign powers. Raiding the research centre to investigate is a straightforward task. While they are in no place to evaluate friendliness themselves they are certainly capable of working out whether there is AI code that is about to be run—either by looking around or interrogating. Bear in mind that if it comes down to “do we need to shoot them?” the researchers must be resisting them and trying to run the doomsday code despite the intervention. That is a big deal.
And what is the chance that they have misconceived the situation such that by pulling the trigger, they will create an even worse situation?
Negligible.
The problem here is if other researchers or well-meaning nutcases take it upon themselves to do some casual killing. An intelligence agency looking after the national interests, the same way it always does, is not a problem.
This is not some magical special case where there is some deep ethical reason that the threat cannot be assessed. It is just another day at the office for the spooks and there is less cause for bias than usual, since all the foreign politics gets out of the way.