Well, let’s start with the conditional probability, assuming humans don’t find some other way to kill ourselves or end civilization before it comes to this. Eliezer seems to argue the following:
A. Given we survive long enough, we’ll find a way to write a self-modifying program that has, or can develop, human-level intelligence. (The capacity for self-modification follows from ‘artificial human intelligence,’ but since we’ve just seen links to writers ignoring that fact, I thought I’d state it explicitly.) Because we have known cognitive flaws, this necessarily gives the AI the potential for greater-than-human intelligence. (I don’t know how we’d give it all of our disadvantages even if we wanted to. And if we did, someone else could, and eventually would, build an AI without such limits.)
B. Given A, the intelligence would improve itself to the point where we could no longer predict its actions in any detail.
C. Given B, the AI could escape from any box we put it in. (IIRC this excludes certain forms of encryption, but I see no remotely credible scenario in which we sufficiently encrypt every self-modifying AI forever.)
D. Given B and C, the AI could wipe out humanity if it ‘wanted’ to do so.
E. If you tell a computer to kill you, it will try to kill you (the AI will act on the goals it is actually given).
My estimates for the probabilities of some of these fluctuate from day to day, but I tend to give them all a high number. Claim A in particular seems almost undeniable given the evidence of our own existence. (I only listed it separately so that people who want to argue can do so more precisely.) And when it comes to Claim E, which says that if you tell a computer to kill you it will try to kill you, I don’t think the alternative has enough evidence to be worth considering. So I find it hard to imagine anyone rationally getting a total lower than about 12%, just under 1⁄8.
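To make that arithmetic concrete (a purely illustrative sketch; the 0.655 figure below is hypothetical, not a number I’m defending): even someone who gave each of the five claims only about two-thirds credence would end up near that floor.

```python
# Purely illustrative: a hypothetical, uniform credence for each claim,
# chained as conditionals (P(A) * P(B|A) * P(C|B) * P(D|B,C) * P(E|D)).
p_per_claim = 0.655        # hypothetical credence for each of claims A-E
total = p_per_claim ** 5   # probability that the whole chain holds
print(f"{total:.3f}")      # ~0.121, just under 1/8 (0.125)
```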
Now, all of that applies to the conditional probability (assuming human technological civilization lives that long). I don’t know how to evaluate the timescale involved, or the chance of us killing ourselves before the issue would come up. The latter certainly feels like it should be less than 11⁄12.
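Spelling out what those two rough figures imply together (back-of-the-envelope only): if the chance of wiping ourselves out first really is under 11⁄12, then the chance of surviving to face the issue is over 1⁄12, and multiplying by the 12% conditional figure still leaves an unconditional risk of roughly one percent.

```python
# Back-of-the-envelope lower bound using the rough figures above.
p_conditional = 0.12          # conjunction of claims A-E, given we get that far
p_die_first = 11 / 12         # upper bound on killing ourselves beforehand
p_survive = 1 - p_die_first   # so at least a 1/12 chance the issue comes up
print(p_conditional * p_survive)  # ~0.01, i.e. about a one percent floor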
The question would grow in importance if it turned out we needed to convince a nationally significant number of people to pay attention to the issue before anyone produces a theory of Friendly AI, including AI goal stability. I really hope that doesn’t apply, because I suspect that if it does, we’re screwed.