We’ll say that a state is in fact reachable if a group of humans could in principle take actions with actuators—hands, vocal cords, etc.—that could realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there’s a vast chasm between “in principle” and “in practice”. A superintelligence worthy of the name would likely be able to come up with plans that we wouldn’t in practice even be able to check exhaustively, which is exactly the sort of issue we want alignment for.
This is not a problem for my argument. I am merely showing that any state reachable by humans must also be reachable by AIs. It is fine if AIs can reach more states.
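To make the inclusion explicit (the notation here is an illustrative sketch, not part of the original exchange): write $A_H$ for the set of basic actions a group of humans can execute with their actuators, $A_{AI}$ for the set available to the AI, and $R(X)$ for the states reachable from the current state $s_0$ by some finite sequence of actions drawn from $X$. Then

$$A_H \subseteq A_{AI} \;\Longrightarrow\; R(A_H) \subseteq R(A_{AI}),$$

i.e. every humanly reachable state is AI-reachable, with no claim that the converse holds.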
Hmm, right. You need only assume that there are coherent, reachable, desirable outcomes. I’m doubtful that such an assumption holds, but most people probably aren’t.
Why?
Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it: “there’s no canonical way to scale me up”.
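As a minimal illustration of the kind of incoherence meant here (my example, not Ngo’s): a person whose pairwise preferences form a cycle,

$$A \succ B, \qquad B \succ C, \qquad C \succ A,$$

cannot be represented by any utility function $u$, since that would require $u(A) > u(B) > u(C) > u(A)$; and aggregating many such people only makes the choice of resolution procedure less canonical, not more.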
This isn’t really a problem with alignment, so there’s no need to address it here. Alignment means the transmission of a preference ordering to an action sequence. Lacking a coherent preference ordering over states of the universe (or histories, for that matter) is not an alignment problem.
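On that reading (sketched here with notation of my own), alignment asks for an action sequence that is optimal under whatever preference ordering is supplied: given a utility function $u$ representing the ordering and a map $\tau$ from action sequences to resulting states or histories, the target is

$$a^{*}_{1:n} \in \operatorname*{arg\,max}_{a_{1:n}} \, u\bigl(\tau(a_{1:n})\bigr),$$

and whether a suitable $u$ exists at all is a separate question from whether the arg max gets found and executed.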
I’d rather put it that resolving that problem is a prerequisite for the notion of “alignment problem” to be meaningful in the first place. It’s not technically a contradiction to have an “aligned” superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.
You can have an alignment problem without humans, e.g. the two strawberries problem.