Rather, I think he means that alignment is such a narrow target, and the space of all possible minds is so vast, that the default outcome is that unaligned AGI becomes unaligned ASI and ends up killing all humans (or even all life) in pursuit of its unaligned objectives. Hitting anywhere close to the alignment target (such that there’s at least 50% chance of “only” one billion people dying) would be a big win by comparison.
Of course, the actual goal is for “things [to] go great in the long run”, not just for us to avoid extinction. Alignment itself is the target, but safety is at least a consolation prize.
So no, I don’t think Nate, Eliezer, or anyone else is okay with releasing an AI that would kill hundreds of millions of people. But AGI is coming, whether we want it or not, and it will not be aligned with human survival (much less human flourishing) by default.
Eliezer tends to think that solving alignment is so much more difficult and so much less researched than raw AGI that doom is almost certain. I’m a bit more optimistic, but I agree that minimizing the probable magnitude of the doom is better than everyone dying.
Or are you saying that once one can get to that point, it’s much easier from there to reach an AI that would cause very few fatalities and is actually fit for practical use?
Feels like Y2K: Electric Boogaloo to me. In any case, if a major catastrophe did come of the first attempt to release an AGI, I think the global response would be to shut it all down, taboo the entire subject, and never let it be raised as a possibility again.
The tricky thing with human politics is that governments will still fund research into very dangerous technology if it has the potential to grant them a decisive advantage on the world stage.
No one wants nuclear war, but everyone wants nukes, even (or especially) after their destructive potential has been demonstrated. No one wants AGI to destroy the world, but everyone will want an AGI that can outthink their enemies, even (or especially) after its power has been demonstrated.
The goal, of course, is to figure out alignment before the first metaphorical (or literal) bomb goes off.
On that note, the main way I could envision AI being really destructive is by gaining access to a government’s nuclear arsenal. Otherwise, it’s extremely resourceful but still trapped in an electronic medium; the most it could do if it really wanted to cause damage is destroy the power grid (which would destroy it too).
You’re underestimating biology.