It seems to me like max-actual would be better than max-min if it could be made to work.
That is, find a distribution over policies + weighting of utility functions such that (a) the distribution is optimal according to the weighting, (b) each utility function is weighted so that, after weighting, the difference between the utility of its preferred policy and the utility of the actual policy is 1. I think this exists by a simple fixed point argument. I’m not sure if it’s unique.
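One way to write (a) and (b) down, as a sketch only: the symbols \pi, w_i, u_i and M_i are notation introduced here rather than the comment's own, and reading (b) as a weighted gap follows the 1/(max utility - realized utility) weights used in the existence argument further down the thread.

```latex
% Assumed notation: u_i(\pi) is theory i's expected utility under the policy distribution \pi,
% and M_i = \max_p u_i(p) is the utility theory i gets from its preferred policy.
\begin{align*}
\text{(a)}\quad & \pi \in \arg\max_{\pi'} \sum_i w_i \, u_i(\pi') \\
\text{(b)}\quad & w_i \,\bigl(M_i - u_i(\pi)\bigr) = 1 \quad \text{for every } i
\end{align*}
```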
Short of that, if using mean or variance, it seems much better to use the probability distribution “Pick the preferred policy of a random theory” rather than picking a uniformly random policy.
That’s pretty much the “Mutual worth bargaining solution” https://www.lesswrong.com/posts/7kvBxG9ZmYb5rDRiq/gains-from-trade-slug-versus-galaxy-how-much-would-i-give-up
I don’t understand (a), but (b) has problems when there are policies that are actually ideal for all/most utilities—you don’t want to rule out generally optimal policies if they exist.
I don’t see how it can be the same as the mutual worth bargaining solution. That bargaining solution assumes we were given a default solution, and this proposal doesn’t (but see above, this solution doesn’t make sense).
I misunderstood your proposal.
ETA: this is based on a sign error, as was the original intuition. Everywhere below I wrote as if getting a higher utility causes your weight to decrease, but it actually causes your weight to increase. So you could do this with (actual-min), or (actual-default) as in Nash, but that’s not as appealing.
Existence proof (not totally sure it’s right):
Given a policy distribution pi, say a new policy is “admissible” if it maximizes the weighted sum of utilities, where each utility function gets weight 1/(max utility - realized utility under pi).
That map is Kakutani (it meets the conditions of Kakutani’s fixed point theorem), so there is some policy which is in its own admissible set, as desired.
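To make the admissibility map concrete, here is a minimal sketch for a finite set of pure policies. The function names `weights` and `admissible`, the matrix layout `U[i][j]`, and the two-theory example are all assumptions made for illustration, not part of the original proposal.

```python
from typing import List


def weights(U: List[List[float]], pi: List[float]) -> List[float]:
    """Weight for each theory: 1 / (max utility - realized utility under pi).

    U[i][j] is the (hypothetical) utility of theory i under pure policy j,
    and pi is a probability distribution over the pure policies.
    """
    out = []
    for row in U:
        realized = sum(p * u for p, u in zip(pi, row))
        gap = max(row) - realized
        if gap <= 1e-12:
            # A theory that already gets its best outcome has no finite weight.
            raise ValueError("weight undefined: a theory already gets its max utility")
        out.append(1.0 / gap)
    return out


def admissible(U: List[List[float]], pi: List[float], tol: float = 1e-9) -> List[int]:
    """Indices of pure policies that maximize the weighted sum of utilities."""
    w = weights(U, pi)
    n_policies = len(U[0])
    scores = [sum(w[i] * U[i][j] for i in range(len(U))) for j in range(n_policies)]
    best = max(scores)
    return [j for j, s in enumerate(scores) if s >= best - tol]


# Two theories, two pure policies, each theory prefers a different policy.
U = [[1.0, 0.0], [0.0, 1.0]]
print(weights(U, [0.5, 0.5]), admissible(U, [0.5, 0.5]))
print(weights(U, [0.9, 0.1]), admissible(U, [0.9, 0.1]))
```

In this toy case the even mixture is supported on its own admissible set, so it is a fixed point of the map. The skewed mixture also shows the issue raised in the ETA: the theory that is already doing better under pi gets the larger weight, not the smaller one.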
Proof that it’s unique:
Consider two weights w, w’, produced by this procedure, with corresponding profiles of utilities u, u’.
We know that every term of (u-u’)(w-w’) is non-positive, since w decreases whenever u increases.
But we can expand the sum as the sum of uw + u’w’ - uw’ - u’w <= 0
Since u and u’ were the utilities of the maximizing profiles, we have uw >= u’w, and u’w’ >= uw’, so the same sum is also >= 0.
Thus the sum of (u-u’)(w-w’) = 0, so every term is 0, so we have u=u’ and w=w’ (if one pair is equal the other must be as well, by construction).
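Written out with indices, the argument above looks roughly like this. The subscript i (indexing utility functions) is notation added here, and the second line is exactly the step that the ETA retracts.

```latex
% Assumed notation: u, u' are utility profiles, w, w' the matching weights, i indexes theories.
\begin{align*}
\sum_i (u_i - u'_i)(w_i - w'_i)
  &= \Bigl(\sum_i u_i w_i - \sum_i u'_i w_i\Bigr)
   + \Bigl(\sum_i u'_i w'_i - \sum_i u_i w'_i\Bigr) \ge 0, \\
(u_i - u'_i)(w_i - w'_i) &\le 0 \ \text{for every } i
  \quad \text{(relies on } w_i \text{ decreasing in } u_i\text{)}, \\
\Rightarrow\ (u_i - u'_i)(w_i - w'_i) &= 0 \ \text{for every } i .
\end{align*}
```

With weights of the form 1/(max utility - realized utility), w_i goes up when u_i goes up, so the per-term inequality points the other way and the conclusion no longer follows.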
That policy, if it exists, need not be Pareto.
The policy was constructed as optimizing a weighted sum of utilities, so it’s Pareto efficient, but the uniqueness argument and the intuition for reasonableness were based on a sign error.