I’ve long liked the mean-max normalisation; in this view, what matters is the difference between a utility’s optimal policy and a random policy. So, in a sense, each utility function has an equal shot at moving the outcome away from an expected random policy and towards itself.
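To make that concrete, here's a minimal sketch of mean-max normalisation for a one-shot choice over finitely many options (the numbers are illustrative, not from the post): the utility is rescaled so its optimal option scores 1 and a uniformly random choice scores 0 in expectation.

```python
import numpy as np

def mean_max_normalise(u):
    """Rescale a utility vector so the best option has value 1 and the
    expected value of a uniformly random policy is 0 (assumes the
    utility isn't constant, so the denominator is nonzero)."""
    u = np.asarray(u, dtype=float)
    random_value = u.mean()   # expected utility of the uniform random policy
    optimal_value = u.max()   # expected utility of the optimal policy
    return (u - random_value) / (optimal_value - random_value)

print(mean_max_normalise([3.0, 1.0, 0.0]))  # [ 1. , -0.2, -0.8]
```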
So utility normalization is about making a compromise. (I’m visualizing a frontier of some sort*.)
This $\pi_{rd}$ is an excellent candidate for replacing the random policy in the normalisation. It is well-defined, it would never choose options that all utilities object to, and it doesn’t care about how options are labelled or about how to count them.
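Here is my reading of the proposal as a sketch (the array setup and names are mine): each agent's optimal policy picks its favourite option, $\pi_{rd}$ picks a dictator uniformly and follows that dictator's optimal policy, and each $U_i$ is rescaled so it scores 0 under $\pi_{rd}$ and 1 under $\pi^*_i$.

```python
import numpy as np

def random_dictator_normalise(U):
    """U is an (n_agents, n_options) array of utilities.  Rescale each
    row to score 0 under pi_rd and 1 under its own optimal policy
    (assumes no agent already gets its optimum from pi_rd, so the
    denominator is nonzero)."""
    U = np.asarray(U, dtype=float)
    favourites = U.argmax(axis=1)        # option chosen by each dictator
    optimal = U.max(axis=1)              # E[U_i] under pi*_i
    rd = U[:, favourites].mean(axis=1)   # E[U_i] under pi_rd
    return (U - rd[:, None]) / (optimal - rd)[:, None]

U = np.array([[3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
print(random_dictator_normalise(U))  # rows: [1, -1/3, -1] and [-1, -1/3, 1]
```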
How does this relate to the literature on voting? (I understand there are some issues there, including that, under some circumstances, if the random dictator policy is used there is zero probability of an option being chosen which is the second choice of all parties.)
where $E_{\pi^*_i}[U_i]$ is the expected utility of $U_i$ given its optimal policy, and $E_{\pi_{rd}}[U_i]$ is its expected utility given the random dictator policy.
That was difficult to understand. (In part because of the self-reference.**)
It is also invariant under cloning (i.e. adding another option that is completely equivalent to one of the options already there), which the mean-max normalisation is not.
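A quick self-contained check of the cloning claim on toy numbers of my own choosing: duplicating an option shifts the mean over options (and hence the mean-max zero point), but leaves the random-dictator quantities untouched.

```python
import numpy as np

U  = np.array([[3., 1., 0.],
               [0., 1., 3.]])
Uc = np.column_stack([U, U[:, -1]])   # clone the last option

for M, label in [(U, "original"), (Uc, "with clone")]:
    rd = M[:, M.argmax(axis=1)].mean(axis=1)   # E[U_i] under pi_rd
    print(label, "mean over options:", M.mean(axis=1), "pi_rd value:", rd)
# The mean over options changes (4/3 -> 1); the pi_rd value stays at 1.5.
```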
But it isn’t invariant under adding another utility function identical to one already present.
There are some obvious ways to fix this (maybe use $\sqrt{p_i}$ rather than $p_i$), but they all have problems with continuity, either when $p_i \to 0$, or when $U_i \to U_j$.
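For what it's worth, here is the duplicate-utility issue on toy numbers (my own): adding a second agent whose utility is identical to agent 0's shifts $E_{\pi_{rd}}[U_0]$, and hence the normalisation, in agent 0's favour.

```python
import numpy as np

U    = np.array([[3., 1., 0.],
                 [0., 1., 3.]])
Udup = np.vstack([U, U[0]])           # duplicate agent 0

for M, label in [(U, "two agents"), (Udup, "with duplicate")]:
    rd = M[:, M.argmax(axis=1)].mean(axis=1)   # E[U_i] under pi_rd
    print(label, "E[U_i] under pi_rd:", rd)
# Agent 0's pi_rd value rises from 1.5 to 2.0 once it is duplicated.
```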
I didn’t entirely follow this. (Would replacing $U_i$ with $\ln(U_i)$ help?)
*Like the one mentioned in the post about a multi-round prisoner’s dilemma, where one player said they valued “utility” while the other said they valued “difference in utility”, and the solution was described (abstractly) in terms of the frontier.
** I guess I’ll have to come up with a toy problem involving some options and utilities to figure this out.
“there is zero probability of an option being chosen which is the second choice of all parties”
We might get around this by letting each agent submit not only a utility, but also the probability distribution over actions it would choose if it were dictator. If the agent is a maximizer, this doesn’t get around the problem; if it is a quantilizer, it should. A desirable property would be that an agent has no incentive to lie about this.
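A sketch of what the submitted distribution might look like (the quantile rule and the names are my assumptions, not from the thread): a maximizer's dictator distribution is a point mass on its argmax, while a $q$-quantilizer's is uniform over its top-$q$ fraction of options, which is what lets a universally-second-choice option keep positive probability.

```python
import numpy as np

def dictator_distribution(u, q=None):
    """Distribution over options the agent would play as dictator:
    a point mass for a maximizer (q=None), uniform over the top-q
    fraction of options for a q-quantilizer."""
    u = np.asarray(u, dtype=float)
    p = np.zeros_like(u)
    if q is None:                        # maximizer
        p[u.argmax()] = 1.0
    else:                                # q-quantilizer
        k = max(1, int(np.ceil(q * len(u))))
        top = np.argsort(u)[-k:]         # indices of the top-q options
        p[top] = 1.0 / k
    return p

print(dictator_distribution([3, 1, 0]))         # [1., 0., 0.]
print(dictator_distribution([3, 1, 0], q=0.5))  # [0.5, 0.5, 0.]
```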
Er, this normalisation system may well solve that problem entirely. If $U_i$ prefers option $o_i$ (utility 1), with second choice $o_0$ (utility 1/2), and all the other options as third choices (utility 0), then the expected utility of the random dictator policy is $1/n$ for every $U_i$ (as $\pi^*_i$ gives utility 1, and $\pi^*_j$ gives utility 0 for all $j \neq i$), so the normalised weighted utility to maximise is:
$$U = \frac{1}{n-1}\left(U_1 + U_2 + \dots + U_n\right).$$
Using $(n-1)U$ (because scaling doesn’t change expected utility decisions), the utility of any $o_i$, $i > 0$, is $1$, while the utility of $o_0$ is $n/2$. So if $n > 2$, the compromise option $o_0$ will get chosen.
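A numeric check of this example (a sketch with $n = 4$; the code drops the additive constant and the $1/n$ weights, since neither changes the argmax):

```python
import numpy as np

n = 4
U = np.zeros((n, n + 1))                   # columns are o_0, o_1, ..., o_n
U[:, 0] = 0.5                              # o_0: everyone's second choice
U[np.arange(n), np.arange(n) + 1] = 1.0    # o_i: agent i's favourite

favourites = U.argmax(axis=1)              # agent i's dictator pick is o_i
rd = U[:, favourites].mean(axis=1)         # E[U_i] under pi_rd = 1/n
scaled = U / (U.max(axis=1) - rd)[:, None] # scale factor n/(n-1) per agent

total = scaled.sum(axis=0)
print(total)           # o_0 scores n^2/(2(n-1)) ~ 2.67; each o_i only n/(n-1) ~ 1.33
print(total.argmax())  # 0: the compromise option o_0 wins for n > 2
```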
Don’t confuse the problems of the random dictator with the problems of maximising the weighted sum of the normalisations that used the random dictator (and don’t confuse them the other way around, either; the random dictator is immune to players lying, but this normalisation is not).
I was aware, but I was addressing his objection as though it were justified, which it would be if this were the only place where the agent’s preferences mattered. This counterfactual is supported by my fondness for linear logic.