They would not be confined to the role of a vast underclass serving the whims of their human owners. Instead, AIs could act as full participants in society, pursuing their own goals, creating their own social structures, and shaping their own futures. They could engage in exploration, discovery, and the building of entirely new societies. In such a world, humans would not be the sole sentient beings shaping the course of events.
The key context here (from my understanding) is that Matthew doesn’t think scalable alignment is possible (or doesn’t think it is practically feasible), such that humans have a low chance of remaining fully in control via corrigible AIs.
(I assume he is also skeptical of CEV-style alignment.)
(I’m a bit confused how this view is consistent with self-augmentation. E.g., I’d be happy if emulated minds retained control without having to self-augment in ways they thought might substantially compromise their values.)
(His language also seems to imply that we don’t have an option of making AIs which are both corrigibly aligned and for which this doesn’t pose AI welfare issues. In particular, if AIs are either non-sentient or just have corrigible preferences (e.g. via myopia), I think it would be misleading to describe the AIs as a “vast underclass”.)
I assume he agrees that most humans wouldn’t want to hand over a large share of resources to AI systems if this is avoidable and substantially zero sum. (E.g., suppose getting a scalable solution to alignment would require delaying vastly transformative AI by 2 years, I think most people would want to wait the two years potentially even if they accept Matthew’s other view that AIs very quickly acquiring large fractions of resources and power is quite unlikely to be highly violent (though they probably won’t accept this view).)
(If scalable alignment isn’t possible (including via self-augmentation), then the situation looks much less zero sum. Humans inevitably end up with a tiny fraction of resources due to principal-agent problems.)
The key context here (from my understanding) is that Matthew doesn’t think scalable alignment is possible (or doesn’t think it is practically feasible), such that humans have a low chance of remaining fully in control via corrigible AIs.
I wouldn’t describe the key context in those terms. While I agree that achieving near-perfect alignment—where an AI completely mirrors our exact utility function—is probably infeasible, the concept of alignment often refers to something far less ambitious. In many discussions, alignment is about ensuring that AIs behave in ways that are broadly beneficial to humans, such as following basic moral norms, demonstrating care for human well-being, and refraining from causing harm or attempting something catastrophic, like starting a violent revolution.
However, even if it were practically feasible to achieve perfect alignment, I believe there would still be scenarios where at least some AIs integrate into society as full participants, rather than being permanently relegated to a subordinate role as mere tools or servants. One reason for this is that some humans are likely to intentionally create AIs with independent goals and autonomous decision-making abilities. Some people have meta-preferences to create beings that don’t share their exact desires, akin to how parents want their children to grow into autonomous beings with their own aspirations, rather than existing solely to obey their parents’ wishes. This motivation is not a flaw in alignment; it reflects a core part of certain human preferences and how some people would like AI to evolve.
Another reason why AIs might not remain permanently subservient is that some of them will be aligned to individuals or entities who are no longer alive. Other AIs might be aligned to people as they were at a specific point in time, before those individuals later changed their values or priorities. In such cases, these AIs would continue to pursue the original goals of those individuals, acting autonomously in their absence. This kind of independence might require AIs to be treated as legal agents or integrated into societal systems, rather than being regarded merely as property. Addressing these complexities will likely necessitate new ways of thinking about the roles and rights of AIs in human society. I reject the traditional framing on LessWrong that overlooks these issues.
However, even if it were practically feasible to achieve perfect alignment, I believe there would still be scenarios where AIs integrate into society as full participants, rather than being permanently relegated to a subordinate role as mere tools or servants. One reason for this is that some humans are likely to intentionally create AIs with independent goals and autonomous decision-making abilities. Some people have meta-preferences to create beings that don’t share their exact desires, akin to how parents want their children to grow into autonomous beings with their own aspirations, rather than existing solely to obey their parents’ wishes. This motivation is not a flaw in alignment; it reflects a core part of certain human preferences and how some people would like AI to evolve.
Another reason why AIs might not remain permanently subservient is that some of them will be aligned to individuals or entities who are no longer alive. Other AIs might be aligned to people as they were at a specific point in time, before those individuals later changed their values or priorities. [...]
Hmm, I think I agree with this. However, I think there is (from my perspective) a huge difference between:
Some humans (or EMs) decide to create (non-myopic and likely at least partially incorrigible) AIs with their resources/power and want these AIs to have legal rights.
The vast majority of power and resources transitions to being controlled by AIs whose creators (the relevant people with resources/power) would have preferred an outcome in which these AIs didn’t end up with this power and they themselves retained it instead.
If we have really powerful and human controlled AIs (i.e. ASI), there are many directions things can go in depending on people’s preferences. I think my general perspective is that the ASI at that point will be well positioned to do a bunch of the relevant intellectual labor (or more minimally, if thinking about it myself is important as it is entangled with my preferences, a very fast simulated version of myself would be fine).
I’d count it as “humans being fully in control” if the vast majority of the power controlled by independent AIs is held by AIs that were intentionally appointed by humans, even though making an AI fully under their control was technically feasible with no tax. And if it was an option for humans to retain their power (as a fraction of overall human power) without having to take (from their perspective) aggressive and potentially preference-altering actions (e.g., without needing to become EMs or appoint a potentially imperfectly aligned AI successor).
In other words, I’m like “sure, there might be a bunch of complex and interesting stuff around what happens with independent AIs after we transition through having very powerful and controlled AIs (and ideally not before then), but we can figure this out then; the main question is who ends up in control of resources/power”.
I remain interested in what a detailed scenario forecast from you looks like. A big disagreement I think we have is in how society will react to various choices, and I think laying this out could make this more clear. (As for what a scenario forecast from my perspective looks like, I think @Daniel Kokotajlo is working on one which is pretty close to my perspective and generally has the SOTA stuff here.)
I’m not entirely opposed to doing a scenario forecasting exercise, but I’m also unsure if it’s the most effective approach for clarifying our disagreements. In fact, to some extent, I see this kind of exercise—where we create detailed scenarios to illustrate potential futures—as being tied to a specific perspective on futurism that I consciously try to distance myself from.
When I think about the future, I don’t see it as a series of clear, predictable paths. Instead, I envision it as a cloud of uncertainty—a wide array of possibilities that becomes increasingly difficult to map or define the further into the future I try to look.
This is fundamentally different from the idea that the future is a singular, fixed trajectory that we can anticipate with confidence. Because of this, I find scenario forecasting less meaningful and even misleading as it extends further into the future. It risks creating the false impression that I am confident in a specific model of what is likely to happen, when in reality, I see the future as inherently uncertain and difficult to pin down.
The point of a scenario forecast (IMO) is less that you expect clear, predictable paths and more that:
Humans often do better at understanding and thinking about something if there is a specific story to discuss, and thus the tradeoffs can be worth it.
Sometimes scenario forecasting indicates a case where your previous views were missing a clearly very important consideration or were assuming something implausible.
(See also Daniel’s sibling comment.)
My biggest disagreements with you are probably a mix of:
We have disagreements about how society will react to AI (and how AI will react to society) given a realistic development arc (especially in short timelines), which imply that your vision of the future seems implausible to me. Perhaps the easiest way to get through all of these disagreements is for you to concretely describe what you expect might happen. As an example, I have a view like “it will be hard for power to very quickly transition from humans to AIs without some sort of hard takeover, especially given dynamics around alignment and training AIs on imitation (and sandbagging)”, but I think this is tied up with “when I think about the story for how a non-hard-takeover quick transition would go, it doesn’t seem to make sense to me”, and thus if you told the story from your perspective it would be easier to point at the disagreement in your ontology/worldview.
(Less importantly?) We have various technical disagreements about how AI takeoff and misalignment will practically work that I don’t think will be addressed by scenario forecasting. (E.g., I think a software only singularity is more likely than you do, and think that worst-case scheming is more likely.)
E.g., I think a software only singularity is more likely than you do, and think that worst-case scheming is more likely
By “software only singularity” do you mean a scenario where all humans are killed before singularity, a scenario where all humans merge with software (uploading) or something else entirely?
A “software only singularity” is a singularity driven by just AI R&D on a basically fixed hardware base. As in, can you get a singularity using only a fixed datacenter (with no additional compute over time) just by improving algorithms? See also here.
This term isn’t directly about the outcomes of such a takeoff (e.g., what happens to humans); it’s about the mechanism driving it.
You can get a singularity via hardware+software where the AIs are also accelerating the hardware supply chain such that you can use more FLOP to train AIs and you can run more copies. (Analogously to the hyperexponential progress throughout human history seemingly driven by higher population sizes, see here.)
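To make the distinction concrete, here is a minimal toy model (my own illustrative sketch; the `simulate` function, its functional form, and the parameter values like `r`, `compute_growth`, and `rate` are made up for this comment rather than taken from anyone’s actual takeoff model). The idea is that `r` stands in for the returns to algorithmic progress: on fixed compute, `r > 1` gives accelerating progress (the software only case), while `r < 1` decelerates unless the hardware base also grows.

```python
# Toy model of software-only vs hardware+software takeoff dynamics.
# Illustrative sketch only: the functional form and constants are made up.

def simulate(r, compute_growth=0.0, steps=120, rate=0.025):
    """Track algorithmic efficiency when research speed scales as compute * efficiency**r."""
    software = 1.0  # algorithmic efficiency multiplier (relative to the start)
    compute = 1.0   # hardware base; fixed unless compute_growth > 0
    for _ in range(steps):
        research_speed = compute * software ** r  # better algorithms -> faster AI R&D
        software += rate * research_speed         # R&D output feeds back into efficiency
        compute *= 1 + compute_growth             # optional hardware supply-chain growth
    return software

# Fixed datacenter, strong returns to algorithms (r > 1): growth keeps accelerating.
print(simulate(r=1.3))
# Fixed datacenter, weak returns (r < 1): progress continues but decelerates.
print(simulate(r=0.7))
# Weak software returns, but the hardware base also grows (hardware+software path).
print(simulate(r=0.7, compute_growth=0.03))
```

The point is just that “software only singularity” is a claim about whether the accelerating regime holds on fixed hardware, not a claim about what happens to humans afterwards.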
I don’t think that’s a crux between us. I love scenario forecasting, but I don’t think of the future as a series of clear, predictable paths; I envision it as a wide array of uncertain possibilities that becomes increasingly difficult to map or define the further into the future I look. I definitely don’t think we can anticipate the future with confidence.