I think this post is mostly about how to do the reflection, consistentising, and so on.
But at the risk of oversimplifying, let’s pretend for a moment we just have some utility functions.
Then you can for sure aggregate them into a mega utility function (at least in principle). But this is very underspecified! The main reason is the question of how to weight the individual utility functions in the aggregation. (Holden has a nice discussion of Harsanyi’s aggregation theorem which goes into this in a bit more depth, but yes, we have not found it written in the universe how to weight the aggregation.)
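To make the shape of that aggregation concrete (this is just the standard Harsanyi-style form, added here for illustration): if each person’s preferences and the social preference are vNM-rational and the aggregation respects Pareto, the aggregate has to be a weighted sum of the individual utilities (up to an additive constant),

```latex
% Harsanyi-style aggregation: the theorem pins down the *form*,
% but says nothing about how to choose the weights w_i.
W(x) = \sum_{i=1}^{n} w_i \, u_i(x), \qquad w_i \ge 0 .
```

and everything contentious lives in the w_i.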
There’s also an interesting relationship (almost 1-1, aside from edge cases) between welfare optima (that is, optima of some choice of weighted aggregation of utilities, as above) and Pareto optima[1] (that is, outcomes that can’t be improved for anyone without making things worse for someone else). I think this, together with Harsanyi, tells us that some sort of Pareto-ish target would be the result of ‘the most coherent’ possible extrapolation of humanity’s goals. But it still leaves the coefficients/weighting of the aggregation wide open, which in the Pareto formulation corresponds to the position on the Pareto frontier. BTW Drexler has an interesting discussion of cooperation and conflict on the Pareto frontier.
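A minimal runnable sketch of that correspondence, with made-up numbers (the toy outcomes are mine, not from Holden’s or Drexler’s write-ups): any maximiser of a strictly positively weighted sum is Pareto optimal, and sweeping the weights moves the chosen outcome along the frontier, which is the ‘position on the Pareto frontier’ point.

```python
# Toy outcomes: payoff = (utility for agent 1, utility for agent 2).
outcomes = {
    "A": (10.0, 1.0),
    "B": (7.0, 6.0),
    "C": (4.0, 9.0),
    "D": (3.0, 3.0),  # Pareto-dominated by B
}

def pareto_optima(outcomes):
    """Outcomes that can't be improved for one agent without hurting the other."""
    optima = set()
    for name, (u1, u2) in outcomes.items():
        dominated = any(
            v1 >= u1 and v2 >= u2 and (v1 > u1 or v2 > u2)
            for other, (v1, v2) in outcomes.items()
            if other != name
        )
        if not dominated:
            optima.add(name)
    return optima

def welfare_optimum(outcomes, w1, w2):
    """Maximiser of the weighted aggregate w1*u1 + w2*u2."""
    return max(outcomes, key=lambda k: w1 * outcomes[k][0] + w2 * outcomes[k][1])

frontier = pareto_optima(outcomes)        # {'A', 'B', 'C'}
for w1 in (0.9, 0.6, 0.1):                # sweep the welfare weighting
    w2 = 1.0 - w1
    best = welfare_optimum(outcomes, w1, w2)
    assert best in frontier               # strictly positive weights => a Pareto optimum
    print(f"weights ({w1:.1f}, {w2:.1f}) -> outcome {best}")
```

The converse (reaching every Pareto optimum with some positive weighting) is the other half of the correspondence, hence the ‘aside from edge cases’ caveat.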
I have a paper+blogpost hopefully coming out soon which goes into some of this in more detail and discusses where that missing piece (the welfare weightings or ‘calibration’) comes from (descriptively, mainly; we’re not very prescriptive, unfortunately).
[1] This connection goes back, as far as I know, to the now-eponymous ABB theorem of Arrow, Barankin and Blackwell in 1953, and there’s a small lineage of follow-up research exploring the connection.
yes, it is not written in the universe how to weight the aggregation
I think it’s written, but not in behavior.
Imagine two people whose behavior is encoded by the same utility function: they both behave as if they valued chocolate as 1 and vanilla as 2. But internally, the first person feels very strongly about all of their preferences, while the second one is very even-tempered and mostly feels OK no matter what. (They’d climb the same height of stairs to get vanilla, too, because the second person cares less about vanilla but is also less bothered by climbing stairs.) Then we want to give them different weights in the aggregation, even though they have the same utility function. That means the correct weighting should be inferred from internal feelings, not only from behavior.
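To put a toy sketch under that (the numbers and the ‘internal intensity’ scales are invented for illustration): choices only identify a utility function up to positive rescaling, so the two people below are behaviourally indistinguishable, while any sum-of-felt-utilities aggregate depends on exactly the scale their behaviour doesn’t reveal.

```python
# Both people *behave* as if chocolate = 1, vanilla = 2, and a flight of
# stairs costs 0.5 in the same units.
REVEALED = {"chocolate": 1.0, "vanilla": 2.0}
STAIRS_COST = 0.5

# Hypothetical internal intensity scales (not observable from choices).
INTENSE, EVEN_TEMPERED = 10.0, 1.0

def felt(value, scale):
    # Internal valence = intensity scale * behaviourally revealed value.
    return scale * value

def choose(menu, scale):
    """Pick the felt-value maximiser; positive rescaling never changes the choice."""
    return max(menu, key=lambda option: felt(menu[option], scale))

menu = {
    "chocolate here": REVEALED["chocolate"],
    "vanilla upstairs": REVEALED["vanilla"] - STAIRS_COST,
}

# Behaviourally indistinguishable: both climb the stairs for vanilla.
assert choose(menu, INTENSE) == choose(menu, EVEN_TEMPERED) == "vanilla upstairs"

# But the felt magnitudes behind that identical behaviour differ tenfold,
# which is what a weighting based on internal feelings would respond to.
print(felt(REVEALED["vanilla"], INTENSE), felt(REVEALED["vanilla"], EVEN_TEMPERED))
```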
Another, more drastic thought experiment: imagine a box that has no behavior at all, but in fact there’s a person inside. You have to decide whether to send resources into the box. For that you need to know what’s in the box and what feelings it contains.
I swiftly edited that to read ‘we have not found it written in the universe how to weight the aggregation’, but your reply obviously beat me to it! I agree, there is plausibly some ‘actual valence magnitude’ which we ‘should’ normatively account for in aggregations.
In behavioural practice, it comes down to whatever cooperative/normative infrastructure is giving rise to the cooperative gains that push toward the Pareto frontier (there’s a toy sketch below this list), e.g.
explicit instructions/norms (fair or otherwise)
‘exchange rates’ between goods or directly on utilities
marginal production returns on given resources
starting state/allocation in dynamic economy-like scenarios (with trades)
differential bargaining power/leverage
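To make a couple of those items concrete, here is a toy sketch using the (asymmetric) Nash bargaining solution, which is my choice of illustration rather than anything the list commits to: on a frontier where 10 units of cooperative gain can be split any way, the starting/disagreement point and a bargaining-power exponent are what select the point on the frontier.

```python
TOTAL = 10.0  # toy Pareto frontier: any split (u1, u2) with u1 + u2 = 10

def bargaining_point(d1, d2, power1, steps=10000):
    """
    Asymmetric Nash bargaining on the toy frontier: maximise
    (u1 - d1)**power1 * (u2 - d2)**(1 - power1), where (d1, d2) is the
    disagreement/starting point and power1 is agent 1's bargaining power.
    """
    best, best_value = None, -1.0
    for i in range(steps + 1):
        u1 = d1 + (TOTAL - d1 - d2) * i / steps  # only individually-rational splits
        u2 = TOTAL - u1
        value = (u1 - d1) ** power1 * (u2 - d2) ** (1 - power1)
        if value > best_value:
            best, best_value = (u1, u2), value
    return best

print(bargaining_point(d1=0.0, d2=0.0, power1=0.5))  # symmetric:           ~(5.0, 5.0)
print(bargaining_point(d1=4.0, d2=0.0, power1=0.5))  # better start for 1:  ~(7.0, 3.0)
print(bargaining_point(d1=0.0, d2=0.0, power1=0.8))  # more leverage for 1: ~(8.0, 2.0)
```

Same frontier each time; only the ‘infrastructure’ parameters move the selected point.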
In discussion I have sometimes used the ‘ice cream/stabbing game’ as an example:
either you get ice cream and I get stabbed
or neither of those things
neither of us is concerned with the other’s preferences
It’s basically a really extreme version of your chocolate and vanilla case. But they’re preference-isomorphic!
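Spelling that isomorphism out with made-up numbers (a toy illustration of my own): once each person’s utilities are normalised to the 0–1 range that behaviour alone can justify, the stabbing game and a mild flavour disagreement look literally identical, which is exactly why the weighting can’t come from the preference structure by itself.

```python
def normalise(utilities):
    """Rescale one person's utilities over the outcomes to [0, 1].

    This is the most a purely behavioural (vNM-style) reading licenses:
    utilities are only identified up to positive affine transformation.
    """
    lo, hi = min(utilities.values()), max(utilities.values())
    return {outcome: (u - lo) / (hi - lo) for outcome, u in utilities.items()}

# Outcome A: 'you get ice cream and I get stabbed'; outcome B: 'neither happens'.
# The raw 'felt' magnitudes are invented for illustration.
stabbing_game = {
    "you": {"A": 1.0,     "B": 0.0},
    "me":  {"A": -1000.0, "B": 0.0},
}

# Outcome A: 'you get the last scoop'; outcome B: 'neither of us gets it'.
flavour_game = {
    "you": {"A": 2.0, "B": 1.0},  # you'd like the scoop
    "me":  {"A": 1.0, "B": 2.0},  # I'd mildly rather you didn't get it
}

for person in ("you", "me"):
    # After normalisation the two games are indistinguishable:
    # same preference structure, wildly different stakes.
    assert normalise(stabbing_game[person]) == normalise(flavour_game[person])
```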