It’s totally possible I missed it, but does this report touch on the question of whether power-seeking AIs are an existential risk, or does it just touch on the questions of whether future AIs will have misaligned goals and will be power-seeking in the first place?
In my opinion, there’s quite a big leap from “Misaligned AIs will seek power” to “Misaligned AI is an existential risk”. Let me give an analogy to help explain what I mean.
Suppose we were asking whether genetically engineered humans are an existential risk. We can ask:
Will some genetically engineered humans have misaligned goals? The answer here is almost certainly yes.
If by “misaligned” all we mean is that some of them have goals that are not identical to the goals of the rest of humanity, then the answer is obviously yes. Individuals routinely have indexical goals (such as money for themselves, status for themselves, taking care of family) that are not what the rest of humanity wants.
If by “misaligned” what we mean is that some of them are “evil” i.e., they want to cause destruction or suffering on purpose, and not merely as a means to an end, then the answer here is presumably also yes, although it’s less certain.
Will some genetically engineered humans seek power? Presumably, also yes.
After answering these questions, did we answer the original question, “Are genetically engineered humans an existential risk?” I’d argue no, because even if some genetically engineered humans have misaligned goals, and seek power, and even if they’re smarter and better coordinated than non-genetically engineered humans, it’s still highly questionable whether they’d kill all the non-genetically engineered humans in pursuit of these goals. This premise needs to be justified, and in my opinion, it’s what holds up ~the entire argument here.
I’d argue no, because even if some genetically engineered humans have misaligned goals, and seek power, and even if they’re smarter and better coordinated than non-genetically engineered humans, it’s still highly questionable whether they’d kill all the non-genetically engineered humans in pursuit of these goals.
1. Wanna spell out the reasons why? (a) They’d be resisted by good gengineered humans, (b) they might be misaligned but not in ways that make them want to kill everyone, (c) they might not be THAT much smarter, i.e. not so much smarter that they could evade the system of laws and power-distribution meant to stop small groups from killing everyone. Anything I missed?
2. Existential risk =/= everyone dead. That’s just the central example. Permanent dystopia is also an existential risk, as is sufficiently big (and unjustified, and irreversible) value drift.
I think Matthew’s view is mostly spelled out in this comment and also in a few more comments on his shortform on the EA forum.
TLDR: his view is that very powerful (and even coordinated) misaligned entities that want resources would end up with almost all the resources (e.g. >99%), but this likely wouldn’t involve violence.
Existential risk =/= everyone dead. That’s just the central example. Permanent dystopia is also an existential risk, as is sufficiently big (and unjustified, and irreversible) value drift.
I think the above situation I described (no violence but >99% of resources owned by misaligned entities) would still count as existential risk from a conventional longtermist perspective, but awkwardly the definition of existential risk depends on a notion of value, in particular what counts as “substantially curtailing potential goodness”.
Whether or not you think that humanity only getting 0.1% of resources is “substantially curtailing total goodness” depends on other philosophical views.
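To make that dependence concrete, here is a toy illustration (my own numbers, purely for the sake of the example, not anything from the report or the comments above): suppose the accessible resource pool is R = 10^50 units and humanity keeps a 0.1% share. If value is linear in resources, humanity realizes only 0.1% of the attainable value, a ~99.9% curtailment of potential, which sounds existential. If value is instead logarithmic in resources, then

$$\frac{\log_{10}(0.001\,R)}{\log_{10} R} = \frac{47}{50} = 0.94,$$

i.e. roughly 94% of the attainable (log-scaled) value is preserved, which sounds nothing like an existential catastrophe. The same 99.9% resource loss reads as anywhere from minor to catastrophic depending on the value function you bring to it.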
I think it’s worth tabooing this word in this context for this reason.
(I disagree with Matthew about the chance of violence and also about how bad it is to cede 99% of resources.)
In answer to “It’s totally possible I missed it, but does this report touch on the question of whether power-seeking AIs are an existential risk, or does it just touch on the questions of whether future AIs will have misaligned goals and will be power-seeking in the first place?”:
No, the report doesn’t directly explore whether power-seeking = existential risk
I wrote the report more in the mode of ‘many arguments for existential risk depend on power-seeking (and also other things). Let’s see what the empirical evidence for power-seeking is like (as it’s one, though not the only, prereq for a class of existential risk arguments)’.
Basically the report has a reasonably limited scope (but I think it’s still worth gathering the evidence for this more constrained thing)
Will some genetically engineered humans have misaligned goals? The answer here is almost certainly yes.
If by “misaligned” all we mean is that some of them have goals that are not identical to the goals of the rest of humanity, then the answer is obviously yes. Individuals routinely have indexical goals (such as money for themselves, status for themselves, taking care of family) that are not what the rest of humanity wants.
If by “misaligned” what we mean is that some of them are “evil” i.e., they want to cause destruction or suffering on purpose, and not merely as a means to an end, then the answer here is presumably also yes, although it’s less certain.
This is very strange reasoning. Misaligned goals mean that the entity basically doesn’t care about our existence or well-being: it gains nothing from us being alive and well relative to us being turned into paperclips. For genetically engineered humans the reverse is very likely to be true: they are going to love other humans, be friends with them, or take pride in their position in the human social hierarchy, even if they are selfish by human standards, and it is not clear why they should be selfish.