I think you should probably note where people (who are still sold on AI risk) often disagree.
If I had a list of 5-10 resources that folks like Paul, Holden, Ajeya, Carl, etc. see as the main causes for optimism, I’d be happy to link those resources (either in a footnote or in the main body).
I’d definitely include something like ‘survey data on the same population as my 2021 AI risk survey, saying how much people agree/disagree with the ten factors’, though I’d guess this isn’t the optimal use of those people’s time even if we want to use that time to survey something?
One of the options in Eliezer’s Manifold market on AGI hope is:

The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation.
When I split up probability mass a month ago between the market’s 16 options, this one only got 1.5% of my probability mass (12th place out of the 16). This obviously isn’t the same question we’re discussing here, but it maybe gives some perspective on why I didn’t single out this disagreement above the many other disagreements I could devote space to that strike me as way more relevant to hope? (For some combination of ‘likelier to happen’ and ‘likelier to make a big difference for p(doom) if they do happen’.)
The rate of progress seems very fast and it seems plausible that AI systems will race through the full range of human reasoning ability over the course of a few years. But this is hardly ‘likely to blow human intelligence out of the water immediately, or very soon after its invention’.
… Wait, why not? If AI exceeds the human capability range on STEM four years from now, I would call that ‘very soon’, especially given how terrible GPT-4 is at STEM right now.
The thesis here is not ‘we definitely won’t have twelve months to work with STEM-level AGI systems before they’re powerful enough to be dangerous’; it’s more like ‘we won’t have decades’. Somewhere between ‘no time’ and ‘a few years’ seems extremely likely to me, and I think that’s almost definitely not enough time to figure out alignment for those systems.
(Admittedly, in the minority of worlds where STEM-level AGI systems are totally safe for the first two years they’re operational, part of why it’s hard to make fast progress on alignment is that we won’t know they’re perfectly safe. An important chunk of the danger comes from the fact that humans have no clue where the line is between the most powerful systems that are safe, and the least powerful systems that are dangerous.)
Like, it’s not clear to me that even Paul thinks we’ll have much time with STEM-level AGI systems (in the OP’s sense) before we have vastly superhuman AI. Unless I’m misunderstanding, Paul’s optimism seems to have more to do with ‘vastly superhuman AI is currently ~30 years away’ and ‘capabilities will improve continuously over those 30 years, so we’ll have lots of time to learn more, see pretty scary failure modes, adjust our civilizational response, etc. before AI is competitive with the best human scientists’.
But capabilities gains still accelerate on Paul’s model, such that as time passes we get less and less time to work with impressive new capabilities before they’re blown out of the water by further advances (though Paul thinks other processes will offset this to produce good outcomes anyway); and these capabilities gains still end up stratospherically high before they plateau, such that we aren’t naturally going to get a lull to safely work with smarter-than-human systems for a while before they’re smart enough that a sufficiently incautious developer can destroy the world with them.
Maybe I’m misunderstanding something about Paul’s view, or maybe you’re pointing at other non-Paul-ish views...?
I think my views on takeoff/timelines are broadly similar to Paul’s except that I have somewhat shorter takeoffs and timelines (I think this is due to thinking AI is a bit easier and also due to misc deference).
… Wait, why not? If AI exceeds the human capability range on STEM four years from now, I would call that ‘very soon’, especially given how terrible GPT-4 is at STEM right now.
The thesis here is not ‘we definitely won’t have twelve months to work with STEM-level AGI systems before they’re powerful enough to be dangerous’; it’s more like ‘we won’t have decades’. Somewhere between ‘no time’ and ‘a few years’ seems extremely likely to me, and I think that’s almost definitely not enough time to figure out alignment for those systems.
Fair enough on ‘this is very soon’, but I think the exact quantitative details make a big difference between “AGI ruin seems nearly certain in the absence of positive miracles” and “doom seems quite plausible, but we’ll most likely make it through” (my probability of takeover is something like 35%).
I agree with ‘we won’t have decades’ (in the absence of large efforts to slow down, which seem unlikely). But from the perspective of targeting our work and alignment research, there is a huge difference between a steady and quite noticeable takeoff over the course of a few years (which is still insanely fast by human standards, to be clear) and a sudden takeoff within a month. For instance, this disagreement seems to drive a high fraction of the overall disagreement between OpenPhil/Paul/etc. views and MIRI-ish views.
I don’t think this difference should be nearly enough to think the situation is close to OK! Under my views, the government should probably take immediate and drastic action if it could do so competently! That said, the picture for alignment researchers is quite different under these views, and it seems important to try to get the exact details right when explaining the story for AI risk (I think we actually disagree here on details).
Additionally, I’d note that I do have some probability on ‘Yudkowsky-style takeoff’ (but maybe only like 5%). Even if we were fine in all other worlds, this alone should be easily sufficient to justify a huge response from society!
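The implicit argument in that last sentence is an expected-cost comparison. A minimal sketch of it, where the 5% figure comes from the comment above but every other number is a hypothetical placeholder introduced purely for illustration (including the assumption that a huge response would actually avert the loss in that branch):

```python
# Minimal expected-cost sketch. Only the 5% figure comes from the comment above;
# the other numbers are hypothetical placeholders for illustration.
p_fast_takeoff = 0.05          # rough probability of a 'Yudkowsky-style takeoff'
p_doom_given_fast = 0.9        # hypothetical: conditional on that branch, odds look bad
loss_if_doom = 1.0             # normalize 'losing the future' to 1 unit of value
cost_of_huge_response = 0.001  # hypothetical: a huge societal response costs 0.1% of that value

# Expected loss averted if the response works in the fast-takeoff branch
# and we were 'fine in all other worlds' anyway:
expected_loss_averted = p_fast_takeoff * p_doom_given_fast * loss_if_doom  # 0.045

print(expected_loss_averted > cost_of_huge_response)  # True: 0.045 >> 0.001
```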
Like, it’s not clear to me that even Paul thinks we’ll have much time with STEM-level AGI systems (in the OP’s sense) before we have vastly superhuman AI. Unless I’m misunderstanding, Paul’s optimism seems to have more to do with ‘vastly superhuman AI is currently ~30 years away’ and ‘capabilities will improve continuously over those 30 years, so we’ll have lots of time to learn more, see pretty scary failure modes, adjust our civilizational response, etc. before AI is competitive with the best human scientists’.
But capabilities gains still accelerate on Paul’s model, such that as time passes we get less and less time to work with impressive new capabilities before they’re blown out of the water by further advances (though Paul thinks other processes will offset this to produce good outcomes anyway); and these capabilities gains still end up stratospherically high before they plateau, such that we aren’t naturally going to get a lull to safely work with smarter-than-human systems for a while before they’re smart enough that a sufficiently incautious developer can destroy the world with them.
[not necessarily endorsed by Paul]
My understanding is that Paul has a 20-year median on ‘dyson sphere or similarly large technical accomplishment’. He also thinks the probability of ‘dyson sphere or similarly large technical accomplishment’ by the end of the decade (within 7 years) is around 15%. Both of these scenarios involve a singularity, of course (whose final plateau is far beyond safe regions, as you noted), and humans don’t have a huge amount of time to respond.

For more, I guess I would just see Paul’s post “Where I agree and disagree with Eliezer”.
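As an aside, those two numbers (a ~20-year median and ~15% within 7 years) are enough to pin down an implied timelines distribution once you assume a functional form. A minimal sketch assuming a lognormal, which is a simplifying assumption for illustration rather than anything Paul has endorsed:

```python
# Hedged sketch: fit a lognormal to the two stated quantiles
# (median ~20 years, ~15% chance within 7 years). The lognormal form is an
# assumption for illustration, not Paul's actual model.
import math
from scipy.stats import norm

median_years = 20.0   # stated median for 'dyson sphere or similarly large accomplishment'
p_within_7 = 0.15     # stated probability of that milestone within 7 years

mu = math.log(median_years)                           # lognormal median = exp(mu)
sigma = (mu - math.log(7.0)) / -norm.ppf(p_within_7)  # solve P(T <= 7) = 0.15 for sigma

def p_within(years: float) -> float:
    """P(milestone within `years`) under the fitted lognormal."""
    return norm.cdf((math.log(years) - mu) / sigma)

print(f"sigma ~ {sigma:.2f}")                      # ~1.01
print(f"P(within 10 years) ~ {p_within(10):.2f}")  # ~0.25
print(f"P(within 30 years) ~ {p_within(30):.2f}")  # ~0.66
```

Under that assumed fit, roughly a quarter of the probability lands within ten years, which is just meant to illustrate how much weight the two stated numbers already put on short timelines.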
Thanks for the replies, Ryan!

I think the exact quantitative details make a big difference between “AGI ruin seems nearly certain in the absence of positive miracles” and “doom seems quite plausible, but we’ll most likely make it through” (my probability of takeover is something like 35%).
I don’t think that ‘the very first STEM-level AGI is smart enough to destroy the world if you relax some precautions’ and ‘we have 2.5 years to work with STEM-level AGI before any system is smart enough to destroy the world’ changes my p(doom) much at all. (Though this is partly because I don’t expect, in either of those worlds, that we’ll be able to be confident about which world we’re in.)
If we have 6 years to safely work with STEM-level AGI, that does intuitively start to feel like a significant net increase in p(hope) to me? Though this is complicated by the fact that such AGI probably couldn’t do pivotal acts either, and having STEM-level AGI for a longer period of time before a pivotal act occurs means that the tech will be more widespread when it does reach dangerous capability levels. So in the endgame, you’re likely to have a lot more competition, and correspondingly less time to spend on safety if you want to deploy before someone destroys the world.