ETA: this is based on a sign error, as was the original intuition. Everywhere below I wrote as if getting a higher utility causes your weight to decrease, but it actually causes your weight to increase. So you could do this with (actual - min), or (actual - default) as in Nash, but that’s not as appealing.
Existence proof (not totally sure it’s right):
Given a policy distribution pi, say a new policy is “admissible” if it maximizes the weighted sum of utilities, with weights 1/(max utility - realized utility under pi).
The map from pi to its admissible set is Kakutani (it meets the conditions of Kakutani’s fixed-point theorem), so there is some policy which is in its own admissible set, as desired.
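To make the admissibility condition concrete, here is a minimal sketch for a toy setting. It assumes a finite list of pure policies and treats a policy distribution as a mixture over them; the names `admissible_set` and `is_self_admissible`, the `utilities[k][i]` layout, and the handling of a zero gap are all hypothetical choices for illustration, not part of the argument above.

```python
from fractions import Fraction


def admissible_set(utilities, pi):
    """Indices of pure policies that maximize the weighted sum of utilities.

    utilities[k][i] is agent i's utility under pure policy k (assumed layout);
    pi[k] is the probability the distribution puts on pure policy k.
    """
    n_agents = len(utilities[0])
    # Best utility each agent could get from any single pure policy.
    max_u = [max(row[i] for row in utilities) for i in range(n_agents)]
    # Expected utility each agent actually gets under pi.
    realized = [sum(p * row[i] for p, row in zip(pi, utilities))
                for i in range(n_agents)]
    gaps = [m - r for m, r in zip(max_u, realized)]
    if any(g == 0 for g in gaps):
        # Assumption: the text does not say what happens when an agent
        # already gets its maximum, so this sketch simply refuses.
        raise ValueError("weight undefined: an agent already gets its max utility")
    # Weights 1/(max utility - realized utility under pi).
    weights = [1 / g for g in gaps]
    scores = [sum(w * u for w, u in zip(weights, row)) for row in utilities]
    best = max(scores)
    return {k for k, s in enumerate(scores) if s == best}


def is_self_admissible(utilities, pi):
    """True if every pure policy in the support of pi is admissible for pi."""
    support = {k for k, p in enumerate(pi) if p > 0}
    return support <= admissible_set(utilities, pi)


# Two agents, two pure policies: policy 0 favors agent 0, policy 1 favors agent 1.
utilities = [[1, 0], [0, 1]]
print(is_self_admissible(utilities, [Fraction(1, 2), Fraction(1, 2)]))  # True
print(is_self_admissible(utilities, [Fraction(3, 4), Fraction(1, 4)]))  # False
```

In the second call agent 0 is already better off and ends up with the larger weight (4 versus 4/3), which is the direction the ETA describes: higher realized utility raises the weight rather than lowering it.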
Proof that it’s unique:
Consider two weights w, w’, produced by this procedure, with corresponding profiles of utilities u, u’.
We know that every term of (u-u’)(w-w’) is non-positive, since w decreases whenever u increases.
But we can expand the sum as the sum of uw + u’w’ - uw’ - u’w <= 0.
Since u and u’ were the utilities of the maximizing profiles, we have uw >= u’w, and u’w’ >= uw’, so the expanded sum is also >= 0.
Thus the sum of (u-u’)(w-w’) = 0, so every term is 0, so we have u=u’ and w=w’ (if one pair is equal the other must be as well, by construction).
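Written out with an explicit index, the expansion step looks like this. The subscript i (ranging over agents) is notation added here for illustration; uw in the text is read as the sum over i of u_i w_i.

```latex
\begin{align*}
\sum_i (u_i - u'_i)(w_i - w'_i)
  &= \sum_i \left( u_i w_i + u'_i w'_i - u_i w'_i - u'_i w_i \right) \\
  &= \Big( \sum_i u_i w_i - \sum_i u'_i w_i \Big)
   + \Big( \sum_i u'_i w'_i - \sum_i u_i w'_i \Big) \;\ge\; 0 .
\end{align*}
```

Each bracket is non-negative because u maximizes the w-weighted sum and u’ maximizes the w’-weighted sum. The conclusion that the sum equals 0 then needs the termwise claim that each (u_i - u’_i)(w_i - w’_i) is non-positive, and that is the step the ETA identifies as having the wrong sign.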
That policy, if it exists, need not be Pareto.
The policy was constructed as optimizing a weighted sum of utilities, so it’s Pareto efficient, but the uniqueness argument and the intuition for reasonableness were based on a sign error.