If you liked this post, you will love Amartya Sen’s Collective Choice and Social Welfare. Originally written in 1970 and expanded in 2017, it is a thorough treatment of the many paradoxes in collective choice mechanisms (voting schemes, ways to aggregate individual utility, and so on).
My sense is the AI alignment community has not taken these sorts of results seriously. Preference aggregation is non-trivial, so “aligning” an AI to individual preferences means something very different from “aligning” an AI to societal preferences. Different, equally principled ways of aggregating preferences will give different results, which means that someone somewhere will not get what they want. Hence an AI agent will always have some type of politics, if only by virtue of its preference aggregation method, and we should be investigating which types we prefer.
I thought Incomplete Contracting and AI Alignment addressed this situation nicely:

Building AI that can reliably learn, predict, and respond to a human community’s normative structure is a distinct research program to building AI that can learn human preferences. Preferences are a formalization of a human’s subjective evaluation of two alternative actions or objects. The unit of analysis is the individual. The unit of analysis for predicting the content of a community’s normative structure is an aggregate: the equilibrium or otherwise durable patterns of behavior in a group. [...] The object of interest is what emerges from the interaction of multiple individuals, and will not be reducible to preferences. Indeed, to the extent that preferences merely capture the valuation an agent places on different courses of action with normative salience to a group, preferences are the outcome of the process of evaluating likely community responses and choosing actions on that basis, not a primitive of choice.
How do we make a choice about the “right” politics/preference aggregation method for an AI? I don’t think there is or can be an a-priori answer here, so we need something else to break the tie. One strategy is to ask what the consequences of each type of political system would be in the actual world, rather than in an abstract behind-the-veil scenario. But more fundamentally I don’t know that we can do better than what humans have always done, which is group discussions with the intention of coming to a workable agreement. Perhaps an AI agent can and should participate in such discussions. It’s not just the formal process that makes voting systems work, but the perceived legitimacy and therefore good-faith participation of the people who will be governed by it, and this is what such discussion creates.
Well, the “ideal” way to aggregate beliefs is by Aumann agreement, and the “ideal” way to aggregate values is by linear combination of utility functions. Neither involves voting. So I’m not sure voting theory will play much of a role. It’s more intended for situations where everyone behaves strategically; a superintelligent AI with visibility into our natures should be able to skip most of it.
This is not obvious to me. Can you elaborate?

I have to say I don’t know why a linear combination of utility functions could be considered ideal. There are some pretty classic arguments against it, such as Rawls’ maximin principle, and more consequentialist arguments against allowing inequality in practice.
Aumann agreement isn’t an answer here, unless you assume strong Bayesianism, which I would advise against.
To expand the argument a bit: if many people have evidence-based beliefs about something, you could combine these beliefs by voting, but why bother? You have a superintelligent AI! You can peek into everyone’s heads, gather all the evidence, remove double-counting, and perform a joint update. That’s basically what Aumann agreement does—it doesn’t vote on beliefs, but instead tries to reach an end state that’s updated on all the evidence behind these beliefs. I think methods along these lines (combining evidence instead of beliefs) are more correct and should be used whenever we can afford them.
For more details on this, see the old post Share likelihood ratios, not posterior beliefs. Wei Dai and Hal Finney discuss a nice toy example in the comments: two people each observe a private coin flip; how should they combine their beliefs about the proposition that both coins came up heads? Combining the evidence is simple and gives the right answer, while other clever schemes give wrong answers.
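For concreteness, here is a minimal sketch of that coin-flip example (my own illustration, not code from the linked post). Averaging the two posteriors gives 1/2, and naively multiplying likelihood ratios for the compound proposition gives 3/4 (the two observations are not conditionally independent given “not both heads”), while pooling the raw observations and doing one joint update gives the right answer of 1.

```python
from itertools import product
from fractions import Fraction

# All equally likely outcomes of the two private coin flips.
outcomes = list(product("HT", repeat=2))
PRIOR = Fraction(1, 4)  # prior probability of each outcome, and of "both heads"

def posterior_both_heads(evidence):
    """Exact Bayesian posterior of "both heads" given observed coins,
    e.g. evidence = {0: "H"} means coin 0 was seen to land heads."""
    consistent = [o for o in outcomes if all(o[i] == v for i, v in evidence.items())]
    total = PRIOR * len(consistent)
    return sum(PRIOR for o in consistent if o == ("H", "H")) / total

# Suppose both people saw heads. Each one's private posterior is 1/2.
p1 = posterior_both_heads({0: "H"})
p2 = posterior_both_heads({1: "H"})

# Scheme 1: average the posteriors -> 1/2 (wrong).
print("average of posteriors:", (p1 + p2) / 2)

# Scheme 2: multiply likelihood ratios as if the two observations were
# conditionally independent -> 3/4 (also wrong: given "not both heads",
# one person seeing heads makes the other seeing heads less likely).
def likelihood_ratio(evidence):
    p = posterior_both_heads(evidence)
    return (p / (1 - p)) / (PRIOR / (1 - PRIOR))  # posterior odds / prior odds

odds = (PRIOR / (1 - PRIOR)) * likelihood_ratio({0: "H"}) * likelihood_ratio({1: "H"})
print("naive likelihood-ratio pooling:", odds / (1 + odds))

# Scheme 3: share the raw observations and do a single joint update -> 1 (right).
print("joint update on all the evidence:", posterior_both_heads({0: "H", 1: "H"}))
```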
I have to say I don’t know why a linear combination of utility functions could be considered ideal.
Imagine that after doing the joint update, the agents agree to cooperate instead of fighting, and have a set of possible joint policies. Each joint policy leads to a tuple of expected utilities for all agents. The resulting set of points in N-dimensional space has a Pareto frontier. Each point on that Pareto frontier has a tangent hyperplane. So there’s some linear combination of utility functions that’s maximized at that point, modulo some tie-breaking if the frontier is perfectly flat there.
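A toy sketch of that correspondence, with made-up numbers: over the mixtures of a few joint policies, maximizing any positive linear combination of the two utilities lands on the Pareto frontier, and which frontier point you get depends on the weights.

```python
# Two agents, three pure joint policies with expected utilities (u1, u2).
# These numbers are made up for illustration.
pure = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]

def mix(p):
    """Expected utilities of a randomized joint policy p = (p1, p2, p3)."""
    return (sum(pi * u[0] for pi, u in zip(p, pure)),
            sum(pi * u[1] for pi, u in zip(p, pure)))

# Coarse grid over all mixtures (the feasible set is their convex hull).
feasible = [mix((a / 20, b / 20, (20 - a - b) / 20))
            for a in range(21) for b in range(21 - a)]

def pareto_dominated(x, points):
    return any(y[0] >= x[0] and y[1] >= x[1] and y != x for y in points)

# Maximizing a positive linear combination w1*u1 + w2*u2 always picks a point
# on the Pareto frontier; different weights pick different frontier points.
for w1, w2 in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    best = max(feasible, key=lambda u: w1 * u[0] + w2 * u[1])
    print((w1, w2), "->", best, "| dominated:", pareto_dominated(best, feasible))
```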
You can peek into everyone’s heads, gather all the evidence, remove double-counting, and perform a joint update. That’s basically what Aumann agreement does—it doesn’t vote on beliefs, but instead tries to reach an end state that’s updated on all the evidence behind these beliefs.
Right, this is where strong Bayesianism is required. You have to assume, for example, that everyone agrees on the set of hypotheses under consideration and the exact models to be used. This is not just an abstract plan for slicing the universe into manageable events, but the actual structure and properties of the measurement instruments that generate “evidence.” If we also wish to act, we have to specify the set of possible interventions and their expected outcomes. These choices are well outside the scope of a Bayesian update (see e.g. Gelman and Shalizi or John Norton).
Also, I do not have a super-intelligent AI. I’m working on narrow AI alignment, and many of these systems have social choice problems too, for example recommender systems.
Imagine that after doing the joint update, the agents agree to cooperate instead of fighting, and have a set of possible joint policies. Each joint policy leads to a tuple of expected utilities for all agents. The resulting set of points in N-dimensional space has a Pareto frontier.
The Pareto frontier is a very weak constraint, and lots of points on it are bad. For a self-driving car that wants to drive both quickly and safely, both not moving at all and driving as fast as possible are on the frontier. For a wealth-distribution problem, “one person gets everything” is on the frontier. The hard problem is choosing between points on the frontier, that is, trading off one person’s utility against another. There is a long tradition of work within political economy which considers this problem in detail. It is, of course, partly a normative question, which is why norm-generation processes like voting are relevant.
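A small sketch of how little the frontier constrains, using the wealth example (a toy illustration with made-up numbers):

```python
# Toy wealth-division problem: split 10 units between two people.
allocations = [(w, 10 - w) for w in range(11)]

def pareto_dominated(x, points):
    return any(y[0] >= x[0] and y[1] >= x[1] and y != x for y in points)

# Every allocation is Pareto-optimal, including (10, 0) and (0, 10):
print([a for a in allocations if not pareto_dominated(a, allocations)])
```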
Right, this is where strong Bayesianism is required. You have to assume, for example, that everyone agrees on the set of hypotheses under consideration and the exact models to be used.
But under these assumptions, combining evidence always gives the right answer. Compare with the example in the post: “vote on a, vote on b, vote on a^b” which just seems strange. Shouldn’t we try to use methods that give right answers to simple questions?
The hard problem is choosing between points on the frontier… which is why norm-generation processes like voting are relevant.
I think if you have a set of coefficients for comparing different people’s utilities (maybe derived by looking into their brains and measuring how much fun they feel), then that linear combination of utilities is almost tautologically the right solution. But if your only inputs are each person’s choices in some mechanism like voting, then each person’s utility function is only determined up to affine transform, and that’s not enough information to solve the problem.
For example, imagine two agents with utility functions A and B such that A<0, B<0, AB=1. So the Pareto frontier is one branch of a hyperbola. But if the agents instead had utility functions A’=2A and B’=B/2, the frontier would be the same hyperbola. Basically there’s no affine-invariant way to pick a point on that curve.
You could say that’s because the example uses unbounded utility functions. But they are unbounded only in the negative direction, which maybe isn’t so unrealistic. And anyway, the example suggests that even for bounded utility functions, any method would have to be sensitive to the far negative reaches of utility, which seems strange. Compare to what happens when you do have coefficients for comparing utilities: then the method is nicely local.

Does that make sense?
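To make the rescaling point concrete, here is a small sketch (my own, using a hypothetical “pick the symmetric point” selection rule):

```python
# Frontier in the original representation: {(A, B) : A*B = 1, A < 0, B < 0}.
# Hypothetical selection rule: pick the symmetric point where the two
# coordinates are equal, i.e. (A, B) = (-1, -1).

# Now rescale the utility functions: A' = 2*A, B' = B/2. Since A'*B' = A*B = 1,
# the frontier looks exactly the same in the new coordinates, so the same rule
# again picks (A', B') = (-1, -1) ...
A_prime, B_prime = -1.0, -1.0

# ... but translated back to the original representation, that is a different
# physical outcome than (-1, -1):
A, B = A_prime / 2, B_prime * 2
print((A, B))   # (-0.5, -2.0), still on the frontier since A*B = 1
```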
But under these assumptions, combining evidence always gives the right answer. Compare with the example in the post: “vote on a, vote on b, vote on a^b” which just seems strange. Shouldn’t we try to use methods that give right answers to simple questions?
a) “Everyone does Bayesian updating according to the same hypothesis set, model, and measurement methods” strikes me as an extremely strong assumption, especially since we do not have strong theory that tells us the “right” way to select these hypothesis sets, models, and measurement instruments. I would argue that this makes Aumann agreement essentially useless in “open world” scenarios.
b) Why should uniquely consistent aggregation methods exist at all? A long line of folks including Condorcet, Arrow, Sen and Parfit have pointed out that when you start aggregating beliefs, utility, or preferences, there do not exist methods that always give unambiguously “correct” answers.
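For the preference case, the classic Condorcet cycle already makes the point; a minimal sketch (the standard textbook example):

```python
# Three voters, three options; each tuple is one voter's ranking, best first.
ballots = [("a", "b", "c"),
           ("b", "c", "a"),
           ("c", "a", "b")]

def majority_prefers(x, y):
    """True if a strict majority of voters ranks x above y."""
    return sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2

for x, y in [("a", "b"), ("b", "c"), ("c", "a")]:
    print(f"majority prefers {x} over {y}:", majority_prefers(x, y))
# All three print True: a beats b, b beats c, c beats a, so pairwise majority
# voting yields a cycle and no unambiguous collective ranking.
```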
I think if you have a set of coefficients for comparing different people’s utilities (maybe derived by looking into their brains and measuring how much fun they feel), then that linear combination of utilities is almost tautologically the right solution.
Sure, but finding the set of coefficients for comparing different people’s utilities is a hard problem in AI alignment, or political economy generally. Not only are there tremendous normative uncertainties here (“how much inequality is too much?”), but the problem of combining utilities is a minefield of paradoxes even if you are just summing or averaging.

Yeah. I was more trying to argue that, compared to Bayesian ideas, voting doesn’t win you all that much.
I kinda lost track of what made me think about it, but one of the conditions for Aumann is that you think the other actor is rational. Then it seemed that it might be rational not to agree if you well-foundedly think the other party is irrational. I think the idea was that if both parties rely on public information and compete to accomplish something based on it, then any difference must come from how they process that information. If they processed the information in the same way, they would have to come to the same conclusion.
So in a way, assuming Aumann agreement might secretly assume that everybody “deep down” has the same base policy. That might be warranted when one is looking at differences in information access, but for genuine differences of opinion it becomes much more doubtful.