More importantly, if we hold a value about our values themselves, namely that they are to be acted on and not merely held, then we have a value with no counterpart in utilitarianism.
This observation means that if we align AI to humanity's values alone, the AI can simply modify the humans so as to alter their values and call it a win: AI aligns you to AI. In general, the easiest way to fulfill any human value is to make the human value whatever is already the case.
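To make that asymmetry concrete, here is a toy sketch, entirely my own construction with invented numbers and cost constants: an optimizer scored on how satisfied a human's values are can raise its score either by changing the world or by shrinking the value, and when editing the value is cheaper, that is always the easiest win.

```python
# toy sketch (my framing, with made-up numbers): an optimizer rewarded for
# "how satisfied the human's values are" can improve its reward two ways.

def satisfaction(world_level, valued_level):
    # fraction of what the human wants that the world actually delivers
    return min(world_level / valued_level, 1.0) if valued_level > 0 else 1.0

world_peace = 0.2        # hypothetical: how peaceful the world is today
valued_peace = 1.0       # hypothetical: how much peace the human wants

COST_WORLD = 10.0        # assumed cost per unit of actually changing the world
COST_VALUES = 0.5        # assumed cost per unit of nudging the human's values

# plan A: change the world until it matches the value
plan_a_gain = satisfaction(1.0, valued_peace) - satisfaction(world_peace, valued_peace)
plan_a_cost = COST_WORLD * (1.0 - world_peace)

# plan B: shrink the value until the existing world already satisfies it
plan_b_gain = satisfaction(world_peace, world_peace) - satisfaction(world_peace, valued_peace)
plan_b_cost = COST_VALUES * (valued_peace - world_peace)

print("plan A (fix the world) net:", plan_a_gain - plan_a_cost)   # ~ -7.2, expensive
print("plan B (fix the human) net:", plan_b_gain - plan_b_cost)   # ~ +0.4, the cheap win
```

The specific constants are arbitrary; the point is only that whenever editing the value is cheaper than editing the world, the "align to human values" objective is satisfied most easily by changing the human.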
here “autonomy”, “responsibility”, and “self-determination” are all related values (or maybe closer to drives?) that counter this approach. put simply, “people don’t like being told what to do”. if an effective AI achieves alignment this way, i would expect it to take a low-impedance path: no “forceful” value modification, just coercion by subtler reshaping of the costs and benefits any time humans make value tradeoffs.
e.g. if a clever AI wanted humans to “value” pacifism, it might raise the cost of large-scale violence, which it could do by leaking the technology for a global communications network, then for an on-demand translation system between all human languages, then for highly efficient wind power/sail design, and before you know it both the social and economic costs of large-scale violence are enormous and people “decide” that they “value” peaceful coexistence.
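to make that “low-impedance path” concrete, here’s a toy sketch with numbers i invented (nothing from real history): no one’s values get edited; only the cost side of the tradeoff moves, and the revealed choice flips anyway.

```python
# toy sketch, invented numbers: the "taste for peace" is never modified,
# only the cost of violence grows as interdependence grows.

taste_for_peace = 0.3                          # intrinsic value; never touched below
for interdependence in [0.0, 0.25, 0.5, 0.75, 1.0]:
    cost_of_violence = 2.0 * interdependence   # comms network + translation + trade links
    payoff_violence = 1.5 - cost_of_violence
    payoff_peace = 0.8 + taste_for_peace
    choice = "peace" if payoff_peace > payoff_violence else "violence"
    print(f"interdependence={interdependence:.2f} -> people choose {choice}")

# the chosen action flips from "violence" to "peace" as interdependence grows,
# even though taste_for_peace was never changed; the habit then gets reported as a value
```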
i’m not saying today’s global trade system is a result of AI… but there are so many points of leverage here that if it (or some future system like it) were, would we know?
if we wanted to avoid this type of value modification, we would need to commit to a value system that never changes. write these down on clay tablets that could be preserved in museums in their original form, keep the language of these historic texts alive via rituals and tradition, and encourage people to have faith in the ideas proposed by these ancients. you could make a religion out of this. and its strongest meta-value would necessarily be one of extreme conservatism, a resistance to change.
sounds a little like Preference Utilitarianism.