I think AI risk is disjunctive enough that it’s not clear most of the probability mass can be captured by a single scenario/story, even as broad as this one tries to be. Here are some additional scenarios that don’t fit into this story or aren’t made very salient by it.
1. AI-powered memetic warfare makes all humans effectively insane.
2. Humans break off into various groups to colonize the universe with the help of their AIs. Due to insufficient “metaphilosophical paternalism”, they each construct their own version of utopia which is either directly bad (i.e., some of the “utopias” are objectively terrible or subjectively terrible according to my values), or bad because of opportunity costs.
3. AI-powered economies have much higher economies of scale because AIs don’t suffer from the kind of coordination costs that humans have (e.g., they can merge their utility functions and become clones of each other). Some countries may try to prevent AI-managed companies from merging for ideological or safety reasons, but others (in order to gain a competitive advantage on the world stage) will basically allow their whole economy to be controlled by one AI, which eventually achieves a decisive advantage over the rest of humanity and does a treacherous turn.
4. The same incentive for AIs to merge might also create an incentive for value lock-in, in order to facilitate the merging. (AIs that don’t have utility functions might have a harder time coordinating with each other.) Other incentives for premature value lock-in might include defense against value manipulation/corruption/drift. So AIs end up embodying locked-in versions of human values which are terrible in light of our true/actual values.
5. I think the original “stereotyped image of AI catastrophe” is still quite plausible, if for example there is a large amount of hardware overhang before the last piece of the puzzle for building AGI falls into place.
I think of #3 and #5 as risk factors that compound the risks I’m describing: they are two (of many!) ways that the detailed picture could look different, but they don’t change the broad outline. I think it’s particularly important to understand what failure looks like under a more “business as usual” scenario, so that people can separate objections to the existence of any risk from objections to other exacerbating factors that we are concerned about (like fast takeoff, war, people being asleep at the wheel, etc.).
I’d classify #1, #2, and #4 as different problems not related to intent alignment per se (though intent alignment may let us build AI systems that can help address these problems). I think the more general point is: if you think AI progress is likely to drive many of the biggest upcoming changes in the world, then there will be lots of risks associated with AI. Here I’m just trying to clarify what happens if we fail to solve intent alignment.
I’m not sure I understand the distinction you’re drawing between risk factors that compound the risks that you’re describing vs. different problems not related to intent alignment per se. It seems to me like “AI-powered economies have much higher economies of scale because AIs don’t suffer from the kind of coordination costs that humans have (e.g., they can merge their utility functions and become clones of each other)” is a separate problem from solving intent alignment, whereas “AI-powered memetic warfare makes all humans effectively insane” is kind of an extreme case of “machine learning will increase our ability to ‘get what we can measure’” which seems to be the opposite of how you classify them.
What do you think are the implications of something belonging to one category versus another (i.e., is there something we should do differently depending on which of these categories a risk factor / problem belongs to)?
“I think the more general point is: if you think AI progress is likely to drive many of the biggest upcoming changes in the world, then there will be lots of risks associated with AI. Here I’m just trying to clarify what happens if we fail to solve intent alignment.”
Ah, when I read “I think this is probably not what failure will look like” I interpreted that to mean “failure to prevent AI risk”, and then I missed the clarification “these are the most important problems if we fail to solve intent alignment” that came later in the post, in part because of a bug in GW that caused the post to be incorrectly formatted.
Aside from that, I’m worried about telling a vivid story about one particular AI risk unless you really hammer the point that it’s just one risk out of many. Otherwise it seems too easy for the reader to get that story stuck in their mind and come to think that this is the main or only thing they have to worry about as far as AI is concerned.