paulfchristiano comments on Intertheoretic utility comparison

paulfchristiano 3 Jul 2018 16:27 UTC
2 points
ETA: this is based on a sign error, as was the original intuition. Everywhere below I wrote as if getting a higher utility causes your weight to decrease, but it actually causes your weight to increase. So you could do this with (actual-min), or (actual-default) as in Nash, but that’s not as appealing.
Existence proof (not totally sure it’s right):
- Given a policy distribution pi, say a new policy is “admissible” if it optimizes the weights 1/(max utility - realized utility under pi).
- That map is Kakutani, so there is some policy which is in its own admissible set, as desired.
Proof that it’s unique:
- Consider two weights w, w’, produced by this procedure, with corresponding profiles of utilities u, u’.
- We know that every term of (u-u’)(w-w’) is non-positive, since w decreases whenever u increases.
- But we can expand the sum as the sum of uw + u’w’ - uw’ - u’w <= 0.
- Since u and u’ were the utilities of the maximizing profiles, we have uw >= u’w, and u’w’ >= uw’, so that same sum is also >= 0.
- Thus the sum of (u-u’)(w-w’) = 0, so every term is 0, so we have u=u’ and w=w’ (if one pair is equal the other must be as well, by construction).
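The uniqueness steps above can be written out as one chain. This is only a sketch: the index i over the utility functions being combined and the dot-product notation are assumed here, not taken from the comment. It also takes as a premise that each weight decreases when the corresponding utility increases, which is exactly the premise the ETA says fails for weights of the form 1/(max - realized).

```latex
% Assumed notation: u = (u_i), u' = (u'_i) are utility profiles,
% w = (w_i), w' = (w'_i) the corresponding weights, u \cdot w = \sum_i u_i w_i.
\begin{align*}
\sum_i (u_i - u'_i)(w_i - w'_i)
  &= u \cdot w + u' \cdot w' - u \cdot w' - u' \cdot w \\
  &= \underbrace{(u \cdot w - u' \cdot w)}_{\ge 0 \text{ since } u \text{ maximizes under } w}
   + \underbrace{(u' \cdot w' - u \cdot w')}_{\ge 0 \text{ since } u' \text{ maximizes under } w'}
   \;\ge\; 0 .
\end{align*}
% Premise (the step affected by the sign error): each term satisfies
% (u_i - u'_i)(w_i - w'_i) \le 0, so the sum is also \le 0.
% Together: the sum is 0, hence every term is 0, hence u = u' and w = w'.
```

If the weights instead increase with utility, each term is non-negative rather than non-positive, and the two inequalities no longer squeeze the sum to zero.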
- Stuart_Armstrong 3 Jul 2018 16:35 UTC
  2 points
  Parent
  That policy, if it exists, need not be Pareto.
  - paulfchristiano 3 Jul 2018 16:50 UTC
    4 points
    Parent
    The policy was constructed as optimizing a weighted sum of utilities, so it’s Pareto efficient, but the uniqueness argument and the intuition for reasonableness were based on a sign error.