And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)
Why can you not have an a superintelligence that is only misaligned when it comes to issues of wealth distribution?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that does not destroy the world? If so, why? is it a bigger target? is it more stable?
Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
No, because the you who can ask (the persons in power) is themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)
Why can you not have an a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that does not destroy the world? If so, why? is it a bigger target? is it more stable?
No, because the you who can ask (the persons in power) is themselves misaligned with the 1 alignment target that perfectly captures all our preferences.