Conclusion: Stuart’s solution is flawed because it fails to blackmail pirates appropriately.
Thoughts:
Eliezer’s solution matched my intuitions about how negotiation ‘should’ work.
Analyzing Stuart’s solution and accompanying diagram changed my mind.
Stuart’s solution does Pareto-dominate Eliezer’s.
There is no incentive for either player to deviate from Stuart’s solution.
Unfortunately, ‘no incentive to deviate’ is not sufficient for creating stable compliance even among perfectly rational agents, let alone among even slightly noisy agents.
When the other agent receives the same payoff whether it gives me low utility or high utility, the expected behaviour of a rational opponent is essentially undefined. It’s entirely arbitrary.
A sane best practice would be to assume that, of all the outcomes with equal utility (to them), the other agent will choose the action that screws me over the most.
At very best, we are granting the other agent the power to punish me for free on a whim; for most instrumental purposes this is a bad thing.
Consider a decision algorithm that, when evaluating the desirability of outcomes, first sorts by its own utility and then reverse-sorts by utility-for-other. In honour of the Pirate game I will call agents implementing that algorithm “pirates”. (The most apt alternative name would be ‘assholes’.)
Pirates are rational agents in the same sense as usually used for game theory purposes. They simply have defined behaviour in the place where ‘rational’ was previously undefined.
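A minimal sketch of that tie-breaking rule, with made-up payoff numbers (the function name and utilities are mine, purely for illustration):

def pirate_choice(outcomes):
    """Pick the outcome a 'pirate' prefers.

    outcomes: iterable of (own_utility, other_utility) pairs.
    A pirate maximises its own utility and, among outcomes it is
    indifferent between, takes the one that is worst for the other agent.
    """
    return max(outcomes, key=lambda o: (o[0], -o[1]))

# Both of the first two outcomes pay the pirate 5, so it takes the one
# that leaves the other agent with 0 rather than 10.
print(pirate_choice([(5, 10), (5, 0), (3, 100)]))  # -> (5, 0)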
Eliezer’s prescribed negative incentive for each degree of departure from ‘fair’ ensures that pirates behave themselves, even if the punishment factor is tiny.
Eliezer’s punishment policy also applies (and is necessary) when dealing with what we could call “petty sadists”: agents whose utility functions actually contain a small negative term for the utility granted to the other.
Usually considering things like petty sadism and ‘pirates’ is beyond the scope of a decision theory problem, and it would be inappropriate to mention them. But when a proposed solution offers literally zero incentive to grant the payoff, these considerations become relevant. Even the slightest amount of noise in an agent, the communication channel, or a utility function can flip the behaviour. “Epsilon” stops being negligible when you compare it to ‘zero’.
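To illustrate why even an epsilon-sized punishment is enough, here is a toy sketch (numbers and function names are mine, not from the original problem): once departing from the fair point costs the pirate anything at all, the tie is broken in favour of the fair outcome.

def pirate_choice(outcomes):
    """Maximise own utility; break ties by minimising the other's."""
    return max(outcomes, key=lambda o: (o[0], -o[1]))

EPSILON = 0.01  # an arbitrarily small punishment rate per unit of unfairness

def with_punishment(outcomes, my_fair_share):
    """Reduce the pirate's payoff in proportion to how far below the
    'fair' point it pushes me (a tiny punishment in the spirit of
    Eliezer's proposal)."""
    return [(pirate_u - EPSILON * max(0.0, my_fair_share - my_u), my_u)
            for pirate_u, my_u in outcomes]

# The pirate gets 10 either way; the outcomes differ only in what I get.
raw = [(10.0, 10.0), (10.0, 2.0)]       # (pirate's utility, my utility)
print(pirate_choice(raw))                       # -> (10.0, 2.0): it screws me over for free
print(pirate_choice(with_punishment(raw, 10)))  # -> (10.0, 10.0): the fair outcome now wins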
Using Eliezer’s punishment solution instead of Stuart’s seems to be pure blackmail.
While I reject many cases of blackmail with unshakable stubbornness, one of the clearest exceptions is the case where complying costs me nothing at all and the blackmail costs nothing or next to nothing for the blackmailer.
In the limit of sufficiently intelligent agents with perfect exchange of decision-algorithm source code (utility-function source code not required), rational agents implementing Eliezer’s punishment-for-unfairness system will arrive at punishment factors approaching zero, and the final decision will approach Stuart’s Pareto-dominant solution.
When there is less mutual trust in the decision algorithms of the other agents, or less trust in the communication process, a greater degree of punishment for unfairness is desirable.
Punishing unfairness is the ‘training wheels’ of cooperation between agents with different ideas of fairness.
My intuition is more along the lines of:
Suppose there’s a population of agents you might meet, and the two of you can only bargain by simultaneously stating two acceptable-bargain regions and then the Pareto-optimal point on the intersection of both regions is picked. I would intuitively expect this to be the result of two adapted Masquerade algorithms facing each other.
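A rough sketch of that bargaining protocol over a finite set of outcomes (the regions and payoffs here are invented for illustration; this is not the Masquerade algorithm itself):

def pareto_optimal(points):
    """Keep only the points not Pareto-dominated by another point."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]

def bargain(region_a, region_b):
    """Each agent declares a set of acceptable outcomes; pick a
    Pareto-optimal point of the intersection, or fail if there is none."""
    overlap = set(region_a) & set(region_b)
    if not overlap:
        return None                             # no acceptable bargain
    return max(pareto_optimal(list(overlap)))   # arbitrary tie-break among Pareto-optimal points

# Each outcome is (utility_to_A, utility_to_B).
region_a = {(5, 5), (6, 4), (4, 6)}      # splits A declares acceptable
region_b = {(5, 5), (4, 6), (3, 7)}      # splits B declares acceptable
print(bargain(region_a, region_b))       # -> (5, 5)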
Most agents think the fair point is N and will refuse to go below it unless you do worse, but some might accept an exploitive point N’. The slope down from N has to be steep enough that having a few N’-accepting agents will not provide a sufficient incentive to skew your perfectly-fair point away from N, so that the global solution is stable. If there’s no cost to destroying value for all the N-agents, adding a single exploitable N’-agent will lead each bargaining agent to have an individual incentive to adopt this new N’-definition of fairness. But when two N’-agents meet (one reflected), their intersection destroys huge amounts of value. So the global equilibrium is not very Nash-stable.
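A toy calculation (numbers mine, not from the thread) of the incentive being described: demanding the exploitive point N’ pays off only if the gain against the few agents who cave outweighs what the slope down from N costs you against everyone else.

# Toy model: a fraction `eps` of the population accepts N'; the rest
# demand N and impose `penalty` on anyone who asks for more than N.
U_FAIR = 5.0        # what I get at the fair point N
U_EXPLOIT = 6.0     # what I get at N' when the other agent caves

def gain_from_demanding_Nprime(eps, penalty):
    """Expected payoff of demanding N' minus expected payoff of demanding N."""
    demand_Nprime = eps * U_EXPLOIT + (1 - eps) * (U_FAIR - penalty)
    demand_N = U_FAIR
    return demand_Nprime - demand_N

print(gain_from_demanding_Nprime(eps=0.01, penalty=0.0))  # > 0: flat slope, deviation pays
print(gain_from_demanding_Nprime(eps=0.01, penalty=0.5))  # < 0: steep slope, deviation does not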
Then I would expect this group argument to individualize over agents facing probability distributions of other agents.
I’m not getting what you’re going for here. If these agents actually change their definition of fairness based on other agents’ definitions then they are trivially exploitable. Are there two separate behaviors here: you want unexploitability in a single encounter, but you still want these agents to be able to adapt their definition of “fairness” based on the population as a whole?
If these agents actually change their definition of fairness based on other agents’ definitions then they are trivially exploitable.
I’m not sure that is trivial. What is trivial is that some kinds of willingness to change their definition of fairness make them exploitable. However, this doesn’t hold for all kinds of willingness to change a fairness definition. Some agents may change their definition of fairness in their favour for the purpose of exploiting agents vulnerable to that tactic, while not being willing to change their definition of fairness when it harms them. The only ‘exploit’ here is ‘prevent them from exploiting me and force them to use their default definition of fair’.
Ah, that clears this up a bit. I think I just didn’t notice when N’ switched from representing an exploitive agent to an exploitable one. Either that, or I have a different association for exploitive agent than what EY intended. (namely, one which attempts to exploit)