This is not a problem for my argument. I am merely showing that any state reachable by humans, must also be reachable by AIs. It is fine if AIs can reach more states.
Hmm, right. You only need to assume that there are coherent, reachable, desirable outcomes. I’m doubtful that such an assumption holds, but most people probably aren’t.
Why?
Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it: “there’s no canonical way to scale me up”.
This isn’t really a problem with alignment, so there’s no need to address it here. Alignment means the transmission of a preference ordering to an action sequence. Lacking a coherent preference ordering over states of the universe (or histories, for that matter) is not an alignment problem.
I’d rather put it that resolving that problem is a prerequisite for the notion of “alignment problem” to be meaningful in the first place. It’s not technically a contradiction to have an “aligned” superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.
You can have an alignment problem without humans, e.g. the two strawberries problem.