We’ll say that a state is in fact reachable if a group of humans could in principle take actions with their actuators (hands, vocal cords, etc.) that would realize that state.
The main issue here is that groups of humans may in principle be capable of a great many things, but there’s a vast chasm between “in principle” and “in practice”. A superintelligence worthy of the name would likely be able to come up with plans that we couldn’t in practice even check exhaustively, which is exactly the sort of situation we want alignment for.
This is not a problem for my argument. I am merely showing that any state reachable by humans must also be reachable by AIs. It is fine if AIs can reach more states.
Hmm, right. You only need to assume that there are coherent, reachable, desirable outcomes. I’m doubtful that such an assumption holds, though most people probably aren’t.
Why?
Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it, “there’s no canonical way to scale me up”.