Into the lives of countless humans before you has come the thought, “I must kill this nonviolent person in order to save the world.” We have no evidence that those thoughts have ever been correct; and plenty of evidence that they have been incorrect.

You may wish to strengthen that claim somewhat. I doubt the CIA would classify ‘about to press the on switch of an unfriendly AGI’ as ‘nonviolent’. You do make a good point about (actually rational constructions of) ethics.

Sure; but the CIA also classifies “leading a peaceful, democratic political uprising” as worthy of violence; so they’re not a very good guide.
More seriously: Today there are probably dozens or hundreds of processes going on that, if left unchecked, could lead to the destruction of the world and all that you and I value. Some of these are entirely mindless. I’m rather confident that somewhere in the solar system is an orbiting asteroid that will, if not deflected, eventually crash into the Earth and destroy all life as we know it. Everyone who is proceeding with their lives in ignorance of that fact is thereby participating in a process which, if unchecked, leads to the destruction of the world and all that is good. I hope that we agree that this belief does not justify killing people who oppose the funding of anti-asteroid defense.
But if you are seriously ready to kill someone who has her finger poised above the “on” switch of an unfriendly AGI (which is to say, an AGI that you believe is not sufficiently proven to be Friendly), then you are very likely susceptible to a rather trivial dead man’s switch. The uFAI creator merely needs to be sufficiently confident in their AI’s positive utility that they are willing to set it up to activate if they (the creator) are killed. Then, your readiness to kill is subverted. And ultimately, a person who is clever enough to create uFAI is clever enough to rig any number of nth-order dead man’s switches if they really think they are justified in doing so.
Which means, in the limit case, that you’re reduced to either (1) going on a massacre of everyone involved in AI, machine learning, or related fields; or (2) resorting to convincing people of your views and concerns rather than threatening them.
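The mechanics here are worth spelling out, since the argument turns on how cheap such a switch is to build. A toy sketch (everything below is my own illustration, not a claim about any real system): the payload fires only when the creator stops checking in.

```python
# Toy dead man's switch, as described above. Purely illustrative;
# the intervals and names are hypothetical.
CHECK_IN_INTERVAL = 24 * 60 * 60        # creator checks in once a day
GRACE_PERIOD = 3 * CHECK_IN_INTERVAL    # three missed days => trigger

def should_fire(last_check_in: float, now: float) -> bool:
    """True once the creator has been silent longer than the grace period."""
    return (now - last_check_in) > GRACE_PERIOD
```

An nth-order version is just more of the same: each watcher process monitors the previous one, so disabling any single link still trips the chain.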
I’m rather confident that somewhere in the solar system is an orbiting asteroid that will, if not deflected, eventually crash into the Earth and destroy all life as we know it.
Huh? Downvoted for sloppy reasoning. This most likely won’t happen on the timescale where “life as we know it” continues to exist.
This most likely won’t happen on the timescale where “life as we know it” continues to exist.
The Chicxulub asteroid impact did wipe out almost all non-ocean life. That asteroid was 8–12 km across. It is estimated that an impact of that size happens every few hundred million years. So this claim seems inaccurate. On the other hand, the WISE survey results strongly suggest that no severe asteroid impacts are likely in the next few hundred years.
It is estimated that an impact of that size happens every few hundred million years. So this claim seems inaccurate.
Only if you expect life as we know it to last in the order of a few hundred million years. The probability of that happening is too low for me to even put a number to it.
Would you mind posting your reasoning, instead of just posting your conclusions and an insult?
I should clarify that I was intending to set some sort of boundary condition on the possible futures of life on earth, rather than predicting a specific end to it: If life comes to no other end, at the very least, eventually we’ll get asteroided if we stay here. This by itself does not justify killing people in a fight for asteroid-prevention; so what would justify killing people?
Timescale of life as we know it continuing to exist: Short
Timescale of killer asteroids hitting earth: Long

Are we running into definitional issues of what we mean by “life as we know it?” That term has some degree of ambiguity that may be creating the problem.
Quite possibly. Although one of the features of ‘life as we know it’ that will not survive for hundreds of millions of years is living exclusively on earth. So the disagreement would remain independently of definition.
Sure; but the CIA also classifies “leading a peaceful, democratic political uprising” as worthy of violence; so they’re not a very good guide.
They are not a guide so much as the very organisation for whom this sort of consideration is most relevant. They (or another organisation like them) are the groups most likely to carry out preventative measures. It is more or less part of their job description. (And puts a whole new twist on ‘counter intelligence’!)
Which means, in the limit case, that you’re reduced to either (1) going on a massacre of everyone involved in AI, machine learning, or related fields; or (2) resorting to convincing people of your views and concerns rather than threatening them.
Those extremes do not strike me as a particularly natural place to set up a dichotomy. In the space between them are all sorts of proactive options.
I’d be more interested in a response to the substance of my comment: If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
In other words, how do you take into account the fact that killing them might bring about exactly the fate that you intend to prevent; whereas one more exchange of rational argument might convince them not to do it?
If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
If someone with a facemask is pointing a gun at you, he might just want to present it and ask you if you want to buy it, the facemask being the newest fashion hit that you are simply unaware of.
Edit: Disregard what I’ve written below. It isn’t relevant, since it assumes that the individual hasn’t tried to make a Friendly AI, which seems to be against the assumption in the hypothetical.
I’d be more interested in a response to the substance of my comment: If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
There seems to be a heavy overlap between people who think AGI will foom and people who are concerned about Friendliness (for somewhat obvious reasons: Friendliness matters a lot more if fooming is plausible). It seems extremely unlikely that someone would set up a dead man’s switch unless they thought that a lot would actually get accomplished by the AI, i.e. that it would likely foom in a Friendly fashion. The actual chance that any such switches have been put into place seems low.
Oh, sure, I agree.

But what if Eliezer thinks he’s got an FAI he can turn on, and Joe isn’t convinced that it’s actually as Friendly as Eliezer thinks it is? I’d rather Joe argue with Eliezer than shoot him.
I am somewhat reluctant to engage deeply on the specific counterfactual here. Disagreeing with some of the more absurd statements by AndrewHickey has already placed me in the position of defending enemy soldiers. That is an undesirable position to be in when the subject is one that encourages people to turn off their brains and start thinking with their emotional reflexes. Disagreeing with terrible arguments is not the same as supporting the opposition—but you can still expect the same treatment!
I would have to engage in rather a lot of creative thinking to construct a scenario where I would personally take any drastic measures. Apart from the ethical injunctions I’ve previously mentioned, I don’t consider myself qualified to make the decision. The most I would do is make sure the situation has been brought to the attention of the relevant spooks, and make sure competent AI researchers are informed so that they can give any necessary advice to the spook-analysts. Even then the spook agency would probably not need to resort to violence. If they do, in fact, have to resort to violence because the AGI creators force the issue, then the creators in question definitely cannot be trusted!
If you think that a person is about to turn on a (to your way of thinking) insufficiently Friendly AI, such that killing them might stop the inevitable paperclipping of all you hold dear, how do you take into account the fact that they might have outwitted you by setting up a dead man’s switch?
Now, with the aforementioned caveats, let us begin. I shall first note, then assume away, all the options that are available for circumventing dead man’s switches. I refer here to resources the CIA could get their hands on. That means bunker-buster bombs and teams of top-of-the-line hackers to track down online instances. But those measures are not completely reliable, so I’ll take it for granted that the DMS works.
We now have a situation where terrorists are holding the world hostage. Ineffectively. Either they’ll destroy the world or, if you kill them, they’ll destroy the world. So it doesn’t matter too much what you do—you’re dead either way. It seems the appropriate response is to blow the terrorists up. I’m not sure if I always advocate “don’t negotiate with terrorists”, but I definitely advocate “don’t negotiate with terrorists when they are going to do the worst-case thing anyway”!
But that is still too easy. Let’s go to the next case. We’ll say that the current design has a 99.9% chance of producing an uFAI. But if we give the AI creators another month to finish their work, their creation has a 1% chance of being an FAI[1]. Now the DMS threat actually matters. There is something to lose. The question becomes how you deal with terrorists in a once-off, all-in situation. What do you do when (a small percentage, but all that is available, of) everything is at stake and someone can present a credible threat?
I actually don’t know the answer. I am not sure there is a well-established one. Being the kind of group that doesn’t take the terrorists out with a missile barrage has all sorts of problems. But being the person who does blow them away has a rather obvious problem too. I recall Vladimir making an interesting post regarding blackmail and terrorism; however, I don’t think it gave us a how-to-guide kind of resolution.
[1] Also assume that you expect another source to create an FAI with 50% chance a few years later if the current creators are stopped.
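To make the stakes in that hypothetical concrete, here is a back-of-envelope expected-utility comparison (the utilities, and the assumption of a perfectly reliable switch, are my own simplifications):

```python
# Assign utility 1.0 to a Friendly AI, 0.0 to uFAI / a destroyed world.
P_FAI_IF_WAIT = 0.01      # creators finish next month: 1% Friendly
P_FAI_IF_STOPPED = 0.50   # footnote [1]: another source succeeds later
P_DMS_FIRES = 1.0         # assume the dead man's switch always works

ev_wait = P_FAI_IF_WAIT * 1.0                        # wait for the creators
ev_attack = (1.0 - P_DMS_FIRES) * P_FAI_IF_STOPPED   # attack; switch may fire

print(ev_wait, ev_attack)
```

With a perfectly reliable switch, attacking forfeits even footnote [1]’s 50%; the threat stops binding only if you can drive the switch’s reliability low enough, which is exactly what I assumed away.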
Yep. Now keep in mind that the CIA, or whatever other agency you care to bring to bear, is staffed with humans — fallible humans, the same sorts of agents who can be brought in remarkable numbers to defend a religion. The same sorts of agents who have at least once¹, and possibly twice², come within a single human decision of destroying the world for reasons that were later better classified as mistakes, or narrowly-averted disasters.
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right? And what is the chance that they have misconceived the situation such that by pulling the trigger, they will create an even worse situation?
My point isn’t some sort of hippy-dippy pacifism. My point is: Humans — all of us; you, me, the CIA — are running on corrupted hardware. At some point when we make a severe decision, one that goes against some well-learned rules such as not-killing, we have to take into account that almost everyone who’s ever been in that situation has been making a bad decision.
¹ Stanislav Petrov; 26 September 1983
² Jack Kennedy; Cuban Missile Crisis, October 1962
Given the fact that an agency full of humans is convinced that a given bunch of AGI-tators are within epsilon of dooming the world, what is the chance that they are right?
Fairly high. This is a far simpler situation than dealing with foreign powers. Raiding the research centre to investigate is a straightforward task. While they are in no position to evaluate friendliness themselves, they are certainly capable of working out whether there is AI code that is about to be run—either by looking around or by interrogating. Bear in mind that if it comes down to “do we need to shoot them?”, the researchers must be resisting them and trying to run the doomsday code despite the intervention. That is a big deal.
And what is the chance that they have misconceived the situation such that by pulling the trigger, they will create an even worse situation?
Negligible.
The problem here is if other researchers or well-meaning nutcases take it upon themselves to do some casual killing. An intelligence agency looking after the national interest—the same way it always does—is not a problem.

This is not some magical special case where there is some deep ethical reason that the threat cannot be assessed. It is just another day at the office for the spooks, and there is less cause for bias than usual—all the foreign politics gets out of the way.
Sure, but under what conditions can a human being reliably know that? You’re running on corrupted hardware, just as I am.