I think a big part of the issue is not just the assumptions people use, but also that your scenario doesn’t really lead to existential catastrophe in most worlds, if only because a few very augmented humans determine a lot of what the future does hold, at least under single-single alignment scenarios. A lot of AI thought has been directed towards worlds where AI does pose existential risk, and a lot of this is because of the values of the first thinkers on the topic.
More below:
https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from#GChLyapXkhuHaBewq
@the gears to ascension is there a plausible scenario in your mind where gradual disempowerment leads to death/very bad fates for all humans?
Because I’m currently struggling to understand the perspective where alignment is solved, but all humans still die/irreversibly lose control due to gradually being disempowered.
A key part of the challenge is that you must construct the scenario in a world where the single-single alignment problem (the classic alignment problem as envisioned by LW) is basically solved for all intents and purposes.
“all” humans? like, maybe no, I expect a few would survive, but the future wouldn’t be human, it’d be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did, given a huge increase in power: the crown wearing the king, drug-seeking behavior, luxury messing up people’s motivation, etc. If you solve “make an AI be entirely obedient to a single person”, then that person needs to be wise enough not to screw that up, and I trust exactly no one to even successfully use that situation to do what they want, never mind what others around them want. For an evocative caricature of the intuition here, see Rick Sanchez.
The vast majority of actual humans are already dead. The overwhelming majority of currently-living humans should expect 95%+ chance they’ll die in under a century.
If immortality is solved, it will only apply to “that distorted thing those humans turn into”. Note that this is something the stereotypical Victorian would understand completely—there may be biological similarities with today’s humans, but they’re culturally a different species.
I mean, we’re not going to the future without getting changed by it, agreed. But how quickly one has to figure out how to make good use of a big power jump seems to have a big effect on how much risk that jump carries for your ability to actually implement the preferences you’d have had if you hadn’t rushed yourself.
I believe there’s also a disagreement here where the same scenario will be considered fine by some and very bad by others (humans as happy pets comes to mind).
To be clear, I’m expecting scenarios much more clearly bad than that, like: “the universe is almost entirely populated by worker-drone AIs; there are like 5 humans who are high all the time, and not even in a way they would have signed up for; and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants, but who replaced almost all of their preferences with an obsession with growth in order to become the one who had command of the first AI, didn’t manage to break out of it using that AI, and then got more weird in rapid jumps thanks to the intense things they asked for help with.”
Like, the general pattern here being: the crucible of competition tends to beat out of you whatever it was you wanted to compete to get, and a sudden huge windfall of a type you have little experience with, one that puts you in a new realm of possibility, will tend to get massively underused and won’t end up solving the subtle problems.
Nothing like, “oh yeah humanity generally survived and will be kept around indefinitely without significant suffering”.
My main crux here is that no strong AI rights will likely be given before near-full alignment to one person is achieved, and maybe not even then. A lot of the failure modes of giving AIs power in the gradual disempowerment scenario fundamentally route through giving AIs very strong rights, but thankfully this is disincentivized by default, because granting those rights would make AIs more expensive.
The main way this changes the scenario is that the 6 humans remain broadly in control and aren’t just high all the time, and the first one probably doesn’t just replace their preferences with pure growth, because at the billionaire level status dominates, so they are likely living very rich lives with their own servants.
No guarantees about anyone else surviving though:
No strong AI rights before full alignment: There won’t be a powerful society that gives extremely productive AIs “human-like rights” (and in particular strong property rights) prior to being relatively confident that AIs are aligned to human values.
I think it’s plausible that fully AI-run entities are given the same status as companies—but I expect that the surplus they generate will remain owned by some humans throughout the relevant transition period.
I also think it’s plausible that some weak entities will give AIs these rights, but that this won’t matter because most “AI power” will be controlled by humans that care about it remaining the case as long as we don’t have full alignment.