I don’t believe that you can pass an Ideological Turing Test for people who’ve thought seriously about these issues and assign a decent probability to things going well in the long term, e.g. Paul Christiano, Carl Shulman, Holden Karnofsky and a few others.
The futures described by the likes of Carl Shulman, which I find relatively plausible, don’t fit neatly into your categories. They seem to be some combination of (3) (though you lump ‘we can use pseudo-aligned AI that does what we want in the short term on well-specified problems to navigate competitive pressures’ in with ‘AI systems will want to do what’s morally best by default, so we don’t need alignment’) and (10) (which I would phrase as ‘unaligned AIs which aren’t deceptively inner-misaligned will place some value on human happiness and on placating their overseers’), along with (4) and (6) (specifically, that either an actual singleton or a coalition with strong verifiable agreements and regulation can emerge, or that coordination can be made easier by AI advice).
I think both (3) and (10) are plausible when phrased correctly: pseudo-aligned powerful AI can help governments and corporations navigate competitive pressures, and AI systems (assuming they don’t have some strongly deceptive misaligned goal from a sharp left turn) will still want to do things that satisfy their overseers among the other things they might want, such that there won’t be strong competitive pressure to kill everyone and ignore everything their human overseers want.
In general, I’m extremely wary of arguing by ‘default’ in either direction. Both of the following are weak arguments, for similar reasons:
The default outcome of “humans all have very capable AIs that do what they want, and the humans are otherwise free” is that humans gain cheap access to near-infinite cognitive labor for whatever they want to do, and shortly after, a huge increase in resources and power over the world. This results in a super-abundant utopian future.
The default outcome of “humans all have very capable AIs that do what they want, and the humans are otherwise free” is that ‘the humans all turn everything over to their AIs and set them loose to compete’, because anyone not doing that loses out, at the individual level and at the corporate or national level, resulting in human extinction or dystopia via a ‘race to the bottom’.
My disagreement is that I don’t think there’s a strong default either way. At best there is a (highly unclear) default towards futures that involve a race to the bottom, but that’s it.
I’m going to be very nitpicky here but with good cause. “To compete”—for what, money, resources, social approval? “The humans”—the original developers of advanced AI, which might be corporations, corporations in close public-private partnership, or big closed government projects?
I need to know what this actually looks like in practice to assess the model, because the world is not this toy model, and the dis-analogies aren’t just irrelevant details.
What happens when we get into those details?
We can try to make this specific: envision a concrete scenario involving a fairly quick full automation of everything, spell out a specific bad outcome (in a world where governments and corporate decision-makers have pseudo-aligned AI that will, e.g., answer questions superhumanly well in the short term), and then also imagine ways it could fail to happen. I’ve done this before, and you can do it endlessly:
In the future, AI-driven management assistant software revolutionizes industries by automating decision-making processes, including “soft skills” like conflict resolution. This leads to massive job automation, even at high management levels. Companies that don’t adopt this technology fall behind. An interconnected “production web” of companies emerges, operating with minimal human intervention and focusing on maximizing production. They develop a self-sustaining economy, using digital currencies and operating beyond human regulatory reach. Over time, these companies, driven by their AI-optimized objectives, inadvertently prioritize their production goals over human welfare. This misalignment leads to the depletion of essential resources like arable land and drinking water, ultimately threatening human survival, as humanity becomes unable to influence or stop these autonomous corporate entities.
My object-level response is to say something mundane along the lines of: I think each of the following is more or less independent of the others and not extremely unlikely to occur (each is above 1% likely; see the rough calculation after this list):
Wouldn’t governments and regulators also have access to AI systems to aid with oversight and especially with predicting the future? Remember, in this world we have pseudo-aligned AI systems that will more or less do what their overseers want in the short term.
Couldn’t a political candidate ask their (aligned) strategist-AI ‘are we all going to be killed by this process in 20 years?’ and then mount a persuasive campaign to change the public’s mind early in the process, using obvious evidence to their advantage?
If the world is alarmed by the expanding production web and governments have a lot of hard power initially, why will enforcement necessarily be ineffective? If there’s a shadow economy of digital payments, just arrest anyone found dealing with a rogue AI system. This would scare a lot of people.
What if the lead project is unitary and a singleton or the few lead projects quickly band together because they’re foresightful, so none of this race to the bottom stuff happens in the first place?
If it gets to the point where water or the oxygen in the atmosphere is being used up (why would that happen, again? wouldn’t it just be easier for the machines to fly off into space rather than deal with the presumed disvalue of doing something their original overseers didn’t like?), did nobody build in ‘off switches’?
Even if they aren’t fulfilling our values perfectly, wouldn’t the production web just reach some equilibrium where it’s skimming off a small amount of resources to placate its overseers (since its various components are at least somewhat beholden to them) while expanding further and further?
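To make the implicit arithmetic explicit: if these objections are roughly independent and each has at least the 1% probability I’m claiming, the chance that none of them bites shrinks quickly as you stack them up. The sketch below is purely illustrative; only the 1% floor comes from my actual claim, and the higher numbers in the second example are hypothetical guesses, not estimates I’m defending.

```python
# Illustrative only: combining rough, roughly-independent "defeater" probabilities.
# Only the 1% floor is from the argument above; all other numbers are hypothetical.

def p_at_least_one(probs):
    """Probability that at least one independent defeater occurs: 1 - prod(1 - p_i)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Six defeaters, each at the stated 1% floor:
print(round(p_at_least_one([0.01] * 6), 3))   # ~0.059

# If even a few of them are closer to coin flips (hypothetical numbers),
# the "race to the bottom proceeds unimpeded" story stops being a strong default:
print(round(p_at_least_one([0.5, 0.3, 0.1, 0.05, 0.05, 0.01]), 3))  # ~0.719
```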
And I already know the response is just going to be “Moloch wouldn’t let that happen…”, and that eventually competition will mean that all of these barriers disappear. At this point, though, I think such a response is too broad and proves too much. Used this way, the Moloch idea becomes the classic mistaken “one big idea universal theory of history”, which can explain nearly any outcome so long as it doesn’t have to predict it.
A further point: I think that someone using this kind of reasoning in 1830 would have very confidently predicted that the world of 2023 would be a horrible dystopia where wages for workers wouldn’t have improved at all, because of Moloch.
You can call this another example of (11), i.e. assuming the default outcome will be fine and then arguing against a specific bad scenario, so that my objections don’t affect the default; but that reply assumes what you’re trying to establish.
I’m arguing that when you get into the practical details of any such scenario (assuming pseudo-aligned AI and no sharp left turn or sudden emergence), you can think of ways to use the vast new cognitive labor force available to humanity to preempt or deal with the potential competitive race to the bottom, the offense-defense imbalance, or other challenges, and that this messes up the neat toy model of competitive pressures wrecking everything.
When you try to translate the toy model of:
“everyone” gets an AI that’s pseudo-aligned --> “everyone” gives up control and lets the AI “compete” --> “all human value is sacrificed”, which presumably means we run out of resources on Earth or are killed
into a real-world scenario by adding details about who develops the AI systems, what they want, and specific ways the AI systems could be used, we also get practical, real-world counterarguments as to why it might not happen. Things still get dicier the faster takeoff is and the harder alignment is, since then we don’t have this potential assistance to rely on and have to do everything ourselves, but we already knew that, and you correctly point out that nobody knows the real truth about alignment difficulty anyway.
To be clear, I still think that some level of competitive degradation is a default: there will be strong competitive pressure to delegate more and more decision-making to AI systems and take humans out of the loop. But this proceeding unabated toward systems that are pressured into not caring about their overseers at all, resulting in a world of ‘perfect competition’ and a race to the bottom that continues unimpeded until humans are killed, is a much weaker default in practice than you describe.
Treating a Molochian race to the bottom as an overwhelmingly strong default ignores the complexity of real-world systems, the potential for adaptation and intervention, and the historical track record of similar predictions about complicated social systems.