The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario. (If you have one, just reply to this with a clear and self-contained argument.)
From what I’m reading in the comments and in other papers/articles, it’s a mixture of beliefs, extrapolations from known facts, reliance on what “experts” said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.
A sober analysis establishes that super-AGI can be dangerous (indeed, there are no theorems forbidding this either); what’s unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it’s not clear why humanity’s and super-AGI’s goals should be in conflict, and not just different. Even admitting that they are highly likely to be in conflict, it’s not clear why strategies to counter this cannot be effective (e.g. partnering up with a “good” super-AGI).
Another factor often forgotten is that what we mean by “humanity” today may not have the same meaning once we have technologies like AGIs, mind uploading or intelligence enhancement. We may literally become those AIs.
Because unchecked convergent instrumental goals for an AGI are already in conflict with humanity’s goals. As soon as you realize humanity may have reasons to want to shut down or restrain an AGI (through whatever means), this gives the AGI grounds to wipe out humanity.
That seems like extremely limited, human thinking. If we’re assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals despite our theoretical attempts to stop it, without needing to kill humans. The issue, then, is not fully aligning AGI goals with human goals, but ensuring it has “don’t wipe out humanity, don’t cause extreme negative impacts to humanity” somewhere in its utility function. It probably doesn’t even need to be weighted too strongly, if we’re talking about a truly powerful AGI. Chimpanzees presumably don’t want humans to rule the world, yet they have made no coherent effort to stop us from doing so, probably haven’t even realized we are doing so, and even if they did we could pretty easily ignore it.
“If something could get in the way (or even wants to get in the way, whether or not it is capable of trying) I need to wipe it out” is a sad, small mindset and I am entirely unconvinced that a significant portion of hypothetically likely AGIs would think this way. I think AGI will radically change the world, and maybe not for the better, but extinction seems like a hugely unlikely outcome.
Why would it “want” to keep humans around? How much do you care about whether or not you move dirt while you drive to work? If you don’t care about something at all, it won’t factor into your choice of actions.[1]
[1] I know I phrased this tautologically, but I think the idiom will be clear. If not, just press me on it more. I think this is the best way to get the message across or I wouldn’t have done it.
Some sort of general value for life, or a preference for decreased suffering of thinking beings, or the off chance we can do something to help (which I would argue is almost exactly the same low chance that we could do something to hurt it). I didn’t say there wasn’t an alignment problem, just that AGI whose goals don’t perfectly align with those of humanity in general isn’t necessarily catastrophic. Utility functions tend to have a lot of things they want to maximize, with different weights. Ensuring one or more of the above ideas is present in an AGI is important.
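To make the weighting point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the two plans, the numbers, and the names `Plan`, `utility`, `best_plan` and `welfare_weight` are all hypothetical. It also assumes the utility function is a plain weighted sum, which a real system need not be. The idea is just that if a capable agent has a nearly-as-good plan that leaves humans alone, even a small weight on human welfare can flip its choice, while a weight of exactly zero cannot.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Plan:
    name: str
    goal_progress: float  # how well the plan serves the agent's main goal (made-up units)
    human_welfare: float  # effect on humans: 0 = unaffected, negative = harmed (made-up units)


def utility(plan: Plan, welfare_weight: float) -> float:
    """Toy weighted-sum utility; welfare_weight is the hypothetical 'care about humans' knob."""
    return plan.goal_progress + welfare_weight * plan.human_welfare


def best_plan(plans: Sequence[Plan], welfare_weight: float) -> Plan:
    """Pick the plan with the highest toy utility."""
    return max(plans, key=lambda p: utility(p, welfare_weight))


# Illustrative assumption: working around humans costs the agent very little.
PLANS = [
    Plan("remove humans first", goal_progress=100.0, human_welfare=-1000.0),
    Plan("work around humans", goal_progress=99.0, human_welfare=0.0),
]

for w in (0.0, 0.01):
    print(f"welfare_weight={w}: {best_plan(PLANS, w).name}")
```

With a weight of 0.0 the agent picks "remove humans first" (100 beats 99); with a weight of 0.01 it picks "work around humans" (99 beats 100 - 10 = 90). The sketch says nothing about whether such a term can be reliably built in or kept stable, which is a separate question.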
I think that if we can reliably incorporate that into a machine’s utility function, we’d be most of the way to alignment, right?
I gather the problem is that we cannot reliably incorporate that, or anything else, into a machine’s utility function: if it can change its source code (which would be the easiest way for it to bootstrap itself to superintelligence), it can also change its utility function in unpredictable ways. (Not necessarily on purpose, but the utility function can take collateral damage from other optimizations.)
I’m glad you started this thread: to someone like me who doesn’t follow AI safety closely, the argument starts to feel like, “Assume the machine is out to get us, and has an unstoppable ‘I Win’ button...” It’s worth knowing why some people think those are reasonable assumptions, and why (or if) others disagree with them. It would be great if there was an “AI Doom FAQ” to cover the basics and get newbies and dilettantes up to speed.
I’d recommend https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq as a good starting point for newcomers.
An excellent primer—thank you! I hope Scott revisits it someday, since it sounds like recent developments have narrowed the range of probable outcomes.
If humans are capable of building one AGI, they would certainly be capable of building a second one, which could have goals unaligned with the first one.
I assume that any unrestrained AGI would pretty much immediately exert enough control over the mechanisms through which an AGI might take power (say, the internet, nanotech, whatever else it thinks of) to ensure that no other AI could do so without its permission. I suppose it is plausible that humanity is capable of threatening an AGI through the creation of another, but that seems rather unlikely in practice. First-mover advantage is incalculable to an AGI.