From my comments on the MLSS project submission (which aren’t intended to be comprehensive):
Quite enjoyed reading this, thanks for writing!
My guess is that the factors combine to create a roughly linear model. Even if progress is unpredictable and not strictly linear, the average rate of progress would still be roughly constant.
I’m very skeptical that this is a linear interpolation. It’s the core of your argument, but I didn’t think it was really argued for. I would be very surprised if moving from 50% to 49% risk took a similar amount of time as moving from 2% to 1% risk, even if there are more researchers, unless the research pool grows exponentially. I don’t really think you’ve justified this linear trend.
The report also seems to just assume aligned AI would reduce other x-risk to zero. I’m not sure why this should be assumed. I can see a case for a large reduction in it, but it’s not necessarily obvious.
Lastly, it felt strange to me not to explore the risks of cognitively enhanced humans: for instance, risks that cognitively enhanced humans would have different values, or risks that cognitively enhanced humans would subjugate unenhanced humans.
I started with the assumption that alignment progress would have diminishing returns. Then the two other factors I took into account were the increasing relevance of alignment research over time[1] and an increasing number of alignment researchers. My model was that the diminishing returns would be canceled out by the increasing number of researchers and increasing relevance.
It seems like you’re emphasizing the importance of diminishing returns. If diminishing returns are more important than the other two factors, progress would slow down over time. I’m not sure which factors are most influential, though I may have been underestimating the importance of diminishing returns.
Quote on how an aligned ASI could reduce other existential risks:
“An aligned ASI could reduce or eliminate natural state risks such as the risk from asteroid strikes, supervolcanoes, or stellar explosions by devising protective technologies or by colonizing space so that civilization would continue if Earth were destroyed.”
I think you’re referring to this quote:
“Total existential risk would probably then stop increasing because the ASI could prevent all further existential risks or because an existential catastrophe would have occurred.”
I think I could have explained this point more. I think existential risk levels would, by definition, fall to very low levels after an aligned ASI is created:
If the AI were aligned, then the AI itself would be a minimal source of existential risk.
If it’s also superintelligent, it should be powerful enough to greatly reduce all other existential risks.
Those are some good points on cognitively enhanced humans. I don’t think I emphasized the downsides enough. Maybe I need to expand that section.
[1] Toby Ord calls this decreasing nearsightedness.
If the aligned superintelligent AGI is known to all powerful parties (mostly governments), and some of those governments have, or believe they have, interests not aligned with the AGI’s, then there is a large incentive for those governments to go to war against it. If the AGI is only moderately superhuman and we don’t see intelligence-explosion-type effects (possibly because we have a prosaic AGI), this would be a very risky situation to be in.
I agree. The world could be at a higher risk of conflict just before or after the first ASI is created. Though even if there is a fast takeoff, the risk is still there before the takeoff if it is obvious that an ASI is about to be created.
This scenario is described in quite a lot of detail in chapter 5 of Superintelligence:
“Given the extreme security implications of superintelligence, governments would likely seek to nationalize any project on their territory that they thought close to achieving a takeoff. A powerful state might also attempt to acquire projects located in other countries through espionage, theft, kidnapping, bribery, threats, military conquest, or any other available means.”
Changes:
Added explanation for why an aligned ASI would significantly reduce all existential risks:
“…the ASI could prevent all further existential risks. The reason why follows from its definition: an aligned ASI would itself not be a source of existential risk and since it’s superintelligent, it would be powerful enough to eliminate all further risks.”
Updated graph to show exponentially decreasing model in addition to the linear model.