ARCHES distinguishes between single-agent/single-user and single-agent/multi-user alignment scenarios. Given assumptions like “everyone in society is VNM-rational”, “societal preferences should also follow VNM rationality”, and “if everyone wants a thing, society also wants the thing”, Harsanyi’s utilitarian theorem shows that the societal utility function is a linear, non-negatively weighted combination of everyone’s utilities. So, in a very narrow (and unrealistic) setting, Harsanyi’s theorem tells you how the single-multi solution is built from the single-single solutions.
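To make the shape of that result concrete, here is a minimal sketch of the aggregation it describes. The function name `social_utility`, the weights, and the utility numbers are all made up for illustration; the theorem only says that some non-negative weights exist, not what they are.

```python
from typing import Sequence


def social_utility(weights: Sequence[float], utilities: Sequence[float]) -> float:
    """Harsanyi-style aggregate: a non-negative weighted sum of individual utilities."""
    if len(weights) != len(utilities):
        raise ValueError("need exactly one weight per individual")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sum(w * u for w, u in zip(weights, utilities))


# Hypothetical society of three people choosing between outcomes A and B.
weights = [1.0, 2.0, 0.5]   # assumed weights, not derived from anything
u_A = [1.0, 0.0, 2.0]       # each person's utility for outcome A
u_B = [2.0, 1.0, 3.0]       # each person's utility for outcome B (everyone prefers B)

# With non-negative weights, unanimous preference carries over to society.
assert social_utility(weights, u_B) >= social_utility(weights, u_A)
```

The final assertion is the “if everyone wants a thing, society also wants the thing” assumption showing up in the aggregate: since every weight is non-negative, raising everyone’s utility can never lower the weighted sum.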
This obviously doesn’t actually solve either alignment problem. But it seems like an interesting parallel for what we might eventually want.