The key context here (from my understanding) is that Matthew doesn’t think scalable alignment is possible (or doesn’t think it is practically feasible), so humans have a low chance of remaining fully in control via corrigible AIs.
I wouldn’t describe the key context in those terms. While I agree that achieving near-perfect alignment—where an AI completely mirrors our exact utility function—is probably infeasible, the concept of alignment often refers to something far less ambitious. In many discussions, alignment is about ensuring that AIs behave in ways that are broadly beneficial to humans, such as following basic moral norms, demonstrating care for human well-being, and refraining from causing harm or attempting something catastrophic, like starting a violent revolution.
However, even if it were practically feasible to achieve perfect alignment, I believe there would still be scenarios where at least some AIs integrate into society as full participants, rather than being permanently relegated to a subordinate role as mere tools or servants. One reason for this is that some humans are likely to intentionally create AIs with independent goals and autonomous decision-making abilities. Some people have meta-preferences to create beings that don’t share their exact desires, akin to how parents want their children to grow into autonomous beings with their own aspirations, rather than existing solely to obey their parents’ wishes. This motivation is not a flaw in alignment; it reflects a core part of certain human preferences and how some people would like AI to evolve.
Another reason why AIs might not remain permanently subservient is that some of them will be aligned to individuals or entities who are no longer alive. Other AIs might be aligned to people as they were at a specific point in time, before those individuals later changed their values or priorities. In such cases, these AIs would continue to pursue the original goals of those individuals, acting autonomously in their absence. This kind of independence might require AIs to be treated as legal agents or integrated into societal systems, rather than being regarded merely as property. Addressing these complexities will likely necessitate new ways of thinking about the roles and rights of AIs in human society. I reject the traditional framing on LessWrong that overlooks these issues.
However, even if it were practically feasible to achieve perfect alignment, I believe there would still be scenarios where AIs integrate into society as full participants, rather than being permanently relegated to a subordinate role as mere tools or servants. One reason for this is that some humans are likely to intentionally create AIs with independent goals and autonomous decision-making abilities. Some people have meta-preferences to create beings that don’t share their exact desires, akin to how parents want their children to grow into autonomous beings with their own aspirations, rather than existing solely to obey their parents’ wishes. This motivation is not a flaw in alignment; it reflects a core part of certain human preferences and how some people would like AI to evolve.
Another reason why AIs might not remain permanently subservient is that some of them will be aligned to individuals or entities who are no longer alive. Other AIs might be aligned to people as they were at a specific point in time, before those individuals later changed their values or priorities. [...]
Hmm, I think I agree with this. However, I think there is (from my perspective) a huge difference between:
Some humans (or EMs) decide to create (non-myopic and likely at least partially incorrigible) AIs with their resources/power and want these AIs to have legal rights.
The vast majority of power and resources transition to being controlled by AIs whose creators (the relevant people with resources/power) would have preferred an outcome in which they, rather than these AIs, ended up with that power.
If we have really powerful and human-controlled AIs (i.e. ASI), there are many directions things can go in depending on people’s preferences. My general perspective is that the ASI at that point will be well positioned to do a bunch of the relevant intellectual labor (or, more minimally, if thinking about it myself is important because it’s entangled with my preferences, a very fast simulated version of myself would be fine).
I’d count it as “humans being fully in control” if the vast majority of power held by independent AIs is held by AIs that humans intentionally appointed, even though making an AI fully under their control was technically feasible with no tax. And, if it was an option for humans to retain their power (as a fraction of overall human power) without having to take (from their perspective) aggressive and potentially preference-altering actions (e.g. without needing to become EMs or appoint a potentially imperfectly aligned AI successor).
In other words, I’m like “sure, there might be a bunch of complex and interesting stuff around what happens with independent AIs after we transition through having very powerful and controlled AIs (and ideally not before then), but we can figure this out then; the main question is who ends up in control of resources/power”.
I remain interested in what a detailed scenario forecast from you looks like. A big disagreement I think we have is about how society will react to various choices, and I think laying this out could make this clearer. (As far as what a scenario forecast from my perspective looks like, I think @Daniel Kokotajlo is working on one which is pretty close to my perspective and generally has the SOTA stuff here.)
I’m not entirely opposed to doing a scenario forecasting exercise, but I’m also unsure if it’s the most effective approach for clarifying our disagreements. In fact, to some extent, I see this kind of exercise—where we create detailed scenarios to illustrate potential futures—as being tied to a specific perspective on futurism that I consciously try to distance myself from.
When I think about the future, I don’t see it as a series of clear, predictable paths. Instead, I envision it as a cloud of uncertainty—a wide array of possibilities that becomes increasingly difficult to map or define the further into the future I try to look.
This is fundamentally different from the idea that the future is a singular, fixed trajectory that we can anticipate with confidence. Because of this, I find scenario forecasting less meaningful and even misleading as it extends further into the future. It risks creating the false impression that I am confident in a specific model of what is likely to happen, when in reality, I see the future as inherently uncertain and difficult to pin down.
The point of a scenario forecast (IMO) is less that you expect clear, predictable paths and more that:
Humans often do better at understanding and thinking about something if there is a specific story to discuss, so the tradeoffs of committing to one can be worth it.
Sometimes scenario forecasting indicates a case where your previous views were missing a clearly very important consideration or were assuming something implausible.
(See also Daniel’s sibling comment.)
My biggest disagreements with you are probably a mix of:
We have disagreements about how society will react to AI (and how AI will react to society) given a realistic development arc (especially in short timelines), which make your vision of the future seem implausible to me. And perhaps the easiest way to get through all of these disagreements is for you to concretely describe what you expect might happen. As an example, I have a view like “it will be hard for power to very quickly transition from humans to AIs without some sort of hard takeover, especially given dynamics around alignment and training AIs on imitation (and sandbagging)”, but I think this is tied up with “when I think about the story for how a non-hard-takeover quick transition would go, it doesn’t seem to make sense to me”, and thus if you told the story from your perspective it would be easier to point at the disagreement in your ontology/worldview.
(Less importantly?) We have various technical disagreements about how AI takeoff and misalignment will practically work that I don’t think will be addressed by scenario forecasting. (E.g., I think software only singularity is more likely than you do, and think that worst case scheming is more likely.)
E.g., I think software only singularity is more likely than you do, and think that worst case scheming is more likely
By “software only singularity” do you mean a scenario where all humans are killed before the singularity, a scenario where all humans merge with software (uploading), or something else entirely?
A software only singularity is a singularity driven by just AI R&D on a basically fixed hardware base. As in, can you get a singularity using only a fixed datacenter (with no additional compute over time), just by improving algorithms? See also here.
This isn’t directly a claim about the outcomes that result from this.
By contrast, you can get a singularity via hardware+software, where the AIs also accelerate the hardware supply chain such that you can use more FLOP to train AIs and run more copies. (Analogously to the hyperexponential progress throughout human history seemingly driven by higher population sizes, see here.)
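To make the distinction concrete, here is a toy sketch (purely illustrative; the growth equation, the `run` helper, the returns parameter `r`, and all numbers are assumptions for exposition, not claims from this thread). It models the software-only case: fixed compute, with algorithmic efficiency feeding back into research speed.

```python
# Toy illustration (not anyone's actual model): software only singularity
# on a fixed hardware base. Effective research speed = fixed compute times
# algorithmic efficiency, and efficiency gains feed back into research speed.
# Whether this blows up depends on the assumed returns parameter r.

def run(r, fixed_compute=1.0, efficiency=1.0, dt=0.01, t_max=10.0, cap=1e12):
    """Euler-integrate d(efficiency)/dt = (fixed_compute * efficiency)**r."""
    t = 0.0
    while t < t_max and efficiency < cap:
        research_speed = fixed_compute * efficiency  # AI labor on unchanging hardware
        efficiency += dt * research_speed ** r       # software progress feeds back
        t += dt
    return t, efficiency

# r > 1: each efficiency gain buys more than proportional further progress,
# so efficiency diverges in finite time even with no new compute.
print(run(r=1.2))   # passes the cap around t ~ 5 (illustrative only)

# r < 1: the same feedback loop exists, but growth stays merely polynomial.
print(run(r=0.8))   # reaches only ~a few hundred-fold efficiency by t = 10
```

In the hardware+software version, `fixed_compute` would itself grow with AI output (the accelerating hardware supply chain), analogous to how larger populations historically sped up the growth that produced still larger populations.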
I don’t think that’s a crux between us—I love scenario forecasting but I don’t think of the future as a series of clear, predictable paths; I envision it as a wide array of uncertain possibilities that becomes increasingly difficult to map or define the further into the future I look. I definitely don’t think we can anticipate the future with confidence.