Synthesising divergent preferences: an example in population ethics
This post is an example of how one could go about synthesising contradictory human preferences and meta-preferences, using a somewhat simplified version of the method described in a previous post.
The aim is to illustrate how I imagine the process going. I’ll take the example of population ethics, since it’s an area with divergent intuitions and arguments.
Over-complicated values?
One key point of this synthesis is that I am very reluctant to throw a preference away completely, preferring to keep it around in a very weakened form. The reasoning behind this decision is that 1) a weak preference will make little practical difference for most decisions, and 2) most AI catastrophes involve over-simple utility functions, so simplicity itself is something to be wary of, and there is no clear boundary between “good simplicity” and “bad simplicity”.
So we might end up losing most of human value if we go too simple, but only be a little bit inefficient if we don’t go simple enough.
Basic pieces
Human H has not thought hard about population ethics at all. In terms of theory, they have a vague preference for total utilitarianism, and they have some egalitarian preferences. They also have a lot of “good/bad” examples drawn from some knowledge of history and some fictional experiences (i.e. books, movies, and TV shows).
Note that it is wrong to use fiction as evidence of how the world works. However, using fiction as a source of values is not intrinsically wrong.
At the meta-level, they have one relevant preference: they have a preference for simple models.
Weights of the preferences
H holds these preferences with different strengths, denoted by (un-normalised) weights. Their weight on total utilitarianism is 4, their weight on egalitarianism is 2, their weight on their examples is 5, and their weight on model simplicity is 3.
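Written out as a minimal sketch (the variable names and data structure are purely illustrative, not part of the method), the raw starting weights are just:

```python
# Raw, un-normalised weights of H's preferences and meta-preference,
# using the numbers given above (names are illustrative only).
raw_weights = {
    "total_utilitarianism": 4,  # object-level theory
    "egalitarianism": 2,        # object-level theory
    "mental_examples": 5,       # good/bad examples from history and fiction
    "model_simplicity": 3,      # meta-preference for simple models
}
```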
Future arguments
The AI checks how the human would respond if presented with the mere addition argument, the repugnant conclusion, and the very repugnant conclusion.
They accept the logic of the mere addition argument, but feel emotionally against the two repugnant conclusions; most of the examples they can bring to mind count against them.
Synthesis
There are multiple ways of doing the synthesis; here is one I believe is acceptable. The model simplicity weight is 3; however, the mere addition argument only works under total-utilitarian preferences, not under egalitarian ones. The ratio of total to egalitarian is 4:2 = 2:1, so the extra weight given to total utilitarian-style arguments by the mere addition argument is 3 × (2/3) = 2.
(Notice here that egalitarianism has no problem with the repugnant conclusion at all; however, it does not accept the mere addition argument, so it provides no extra weight.)
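Spelled out as arithmetic: the simplicity weight of 3 backs the mere addition argument, but the argument only applies to the total-utilitarian fraction of H’s theoretical preferences, so the extra weight transferred is

$$3 \times \frac{4}{4+2} = 2.$$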
If we tried to do a best fit of utility theory to all of H’s mental examples, we’d end up with a utility $u$ that is mainly some form of prioritarianism with some extra complications; write this as $u_P + u_\sigma$, for $u_\sigma$ a utility function whose changes in magnitude are small compared with $u_P$.
Let $u_T$ and $u_E$ be total and egalitarian population ethics utilities. Putting this together, we could get an overall utility function:
$$u_H = (4+2)u_T + 2u_E + 5(u_P + u_\sigma).$$
The 4 on $u_T$ is its initial valuation; the 2 is the extra component it picked up from the mere addition argument.
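As a toy sketch of this combination in code (the component utilities below are purely illustrative placeholders for whatever $u_T$, $u_E$, $u_P$ and $u_\sigma$ the synthesis actually produces; only the weights come from the formula above):

```python
from typing import Dict

World = Dict[str, float]  # a toy "world", summarised by a few statistics

# Placeholder component utilities (illustrative stand-ins only).
def u_T(w: World) -> float:      # total utilitarianism: population times average welfare
    return w["population"] * w["average_welfare"]

def u_E(w: World) -> float:      # egalitarianism: penalise the spread of welfare
    return -w["welfare_spread"]

def u_P(w: World) -> float:      # prioritarianism: care about the worst-off
    return w["worst_off_welfare"]

def u_sigma(w: World) -> float:  # small residual corrections from the mental examples
    return 0.01 * w["average_welfare"]

def u_H(w: World) -> float:
    """The synthesised utility: u_H = (4 + 2) u_T + 2 u_E + 5 (u_P + u_sigma)."""
    return (4 + 2) * u_T(w) + 2 * u_E(w) + 5 * (u_P(w) + u_sigma(w))
```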
In practice
So what does maximising $u_H$ look like in practice? Well, because the weights of the three main utilities are comparable, they will push towards worlds that do well on all three: high total utility, high equality, and high utility for the worst-off in the population. Mild increases in inequality are acceptable for large increases in total utility or in the utility of the worst-off, and the other tradeoffs are similar.
What is mainly ruled out are worlds where one factor is maximised strongly, but the others are ruthlessly minimised.
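Continuing the toy sketch, with numbers invented purely for illustration, a world that maximises equality while ruthlessly sacrificing total utility and the worst-off loses badly to a more balanced world. (The placeholder components are on very different scales here, which is exactly the normalisation issue discussed below.)

```python
balanced = {"population": 100, "average_welfare": 8.0,
            "welfare_spread": 2.0, "worst_off_welfare": 5.0}
# Perfect equality, but a tiny population at low welfare.
equal_but_bleak = {"population": 2, "average_welfare": 1.0,
                   "welfare_spread": 0.0, "worst_off_welfare": 1.0}

print(u_H(balanced))         # 4821.4
print(u_H(equal_but_bleak))  # 17.05
```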
Issues
The above synthesis raises a number of issues (which is why I was keen to write it down).
Utility normalisation
First of all, there’s the question of normalising the various utility functions. A slight preference for a utility function $u$ can swamp all other preferences, if the magnitude changes in $u$ across different choices are huge. I tend to assume a min-max normalisation for any utility $u$, so that
$$E(u \mid \pi^*_u) = 1, \qquad E(u \mid \pi^*_{-u}) = 0,$$
with $\pi^*_u$ the optimal policy for $u$, and $\pi^*_{-u}$ the optimal policy for $-u$ (hence the worst policy for $u$). This min-max normalisation doesn’t have particularly nice properties, but then again, neither do any other normalisations.
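A minimal sketch of this normalisation, assuming (unrealistically) that we can just enumerate a few candidate policies and compute each one’s expected utility:

```python
from typing import Dict

Policy = str  # stand-in for a real policy object

def minmax_normalise(expected_u: Dict[Policy, float]) -> Dict[Policy, float]:
    """Rescale expected utilities so the best policy scores 1 and the worst scores 0."""
    best, worst = max(expected_u.values()), min(expected_u.values())
    if best == worst:
        # u is indifferent between all policies; any constant will do.
        return {pi: 0.0 for pi in expected_u}
    return {pi: (v - worst) / (best - worst) for pi, v in expected_u.items()}

# Toy example with made-up expected utilities for three candidate policies.
print(minmax_normalise({"pi_1": -10.0, "pi_2": 40.0, "pi_3": 15.0}))
# {'pi_1': 0.0, 'pi_2': 1.0, 'pi_3': 0.5}
```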
Factual errors and consequences
I said that fictional evidence and historical examples are fine for constructing preferences. But what if part of this evidence is based on factually wrong suppositions? For example, maybe one of the strong examples in favour of egalitarianism is H imagining themselves in very poor situations. But maybe people who are genuinely poor don’t suffer as much as H imagines they would. Or, conversely, maybe people suffer from lack of equality more than H might think.
Things like this seem as if they come down to a simple division between factual beliefs and preferences, but it’s not so easy to disentangle the two. A religious fundamentalist might have a picture of heaven in which everyone would actually be miserable. To convince them of this fact, it is not sufficient to simply point it out; the process of causing them to believe it may break many other aspects of their preferences and world-view as well. More importantly, learning some true facts will likely cause people to change the strength (the weight) with which they hold certain preferences.
This does not seem insoluble, but it is a challenge to be aware of.
Order of operations and underdefinedness
Many people construct explicit preferences by comparing their consequences with mental examples, then rejecting or accepting the preferences based on this fit. This process is very vulnerable to which examples spring to mind at the time. I showed a process that extrapolated current preferences by imagining H encountering the mere addition argument and the very repugnant conclusion. But I chose those examples because they are salient to me and to a lot of philosophers in that area, so the choice is somewhat arbitrary. The point at which H’s preferences are taken as fixed (before or after encountering which hypothetical arguments) will be important for the final result.
Similarly, I used $u_T$ and $u_E$ separately to consider H’s reactions to the mere addition argument. I could instead have used $4u_T + 2u_E$ to do so, or even $4u_T + 2u_E + 5(u_P + u_\sigma)$. For these utilities, the mere addition argument doesn’t go through at all, so it does not give any extra weight to $u_T$. Why did I do it the way I did? Because I judged that the mere addition argument sounds persuasive, even to people who should actually reject it based on some synthesis of their current preferences.
So there remain a lot of details to fill in and choices to make.
Meta-meta-preferences
And, of course, H might have higher order preferences about utility normalisation, factual errors, orders of operation, and so on. These provide an extra layer of possible complications to add on.