Why?
Because humans have incoherent preferences, and it’s unclear whether a universal resolution procedure is achievable. I like how Richard Ngo put it: “there’s no canonical way to scale me up”.
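One standard way to cash out “incoherent” here (a gloss, not something the comment spells out): a cyclic preference relation such as

$$A \succ B, \qquad B \succ C, \qquad C \succ A$$

admits no utility representation, since any $u$ representing it would have to satisfy $u(A) > u(B) > u(C) > u(A)$, which is impossible.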
This isn’t really a problem with alignment, so there’s no need to address it here. Alignment means the transmission of a preference ordering to an action sequence. Lacking a coherent preference ordering over states of the universe (or histories, for that matter) is not an alignment problem.
I’d put it differently: resolving that problem is a prerequisite for the notion of an “alignment problem” to be meaningful in the first place. It’s not technically a contradiction to have an “aligned” superintelligence that does nothing, but clearly nobody would be satisfied with that in practice.
You can have an alignment problem without humans, e.g. the two strawberries problem (Yudkowsky’s task of getting an AI to place two strawberries, identical down to the cellular level, onto a plate without destroying the world).