Eliezer Yudkowsky comments on Cooperating with agents with different ideas of fairness, while resisting exploitation

Eliezer Yudkowsky 17 Sep 2013 20:34 UTC
6 points

Using Eliezer’s punishment solution instead of Stuart’s seems to be pure blackmail.

In the limit of sufficiently intelligent agents with perfect exchange of decision algorithm source code (utility-function source code not required), rational agents implementing Eliezer’s punishment-for-unfairness system will arrive at punishment factors approaching zero, and the final decision will approach Stuart’s Pareto-dominant solution.

When there is less mutual trust in the decision algorithms of the other agents, or less trust in the communication process, then a greater amount of punishment for unfairness is desirable.

My intuition is more along the lines of:

Suppose there’s a population of agents you might meet, and the two of you can only bargain by simultaneously stating two acceptable-bargain regions and then the Pareto-optimal point on the intersection of both regions is picked. I would intuitively expect this to be the result of two adapted Masquerade algorithms facing each other.

Most agents think the fair point is N and will refuse to go below it unless you do worse too, but some might accept an exploitive point of N’. The slope down from N has to be steep enough that having a few N’-accepting agents will not provide a sufficient incentive to skew your perfectly-fair point away from N, so that the global solution is stable. If there’s no cost to destroying value for all the N-agents, adding a single exploitable N’-agent will lead each bargaining agent to have an individual incentive to adopt this new N’-definition of fairness. But when two N’-agents meet (one reflected) their intersection destroys huge amounts of value. So without such a cost the global equilibrium is not very Nash-stable.

Then I would expect this group argument to individualize over agents facing probability distributions of other agents.
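
A minimal numeric sketch of the slope argument above, under assumptions that are not in the comment itself: a pie of size 1, a fair point N = 0.5, an exploitive point N’ = 0.7, N-agents whose response to an over-demand lowers the demander's expected payoff linearly (the `slope` parameter), and exploitable agents who simply concede any demand up to N’. All names and numbers are hypothetical and only meant to show how steep the slope would have to be in this toy model.

```python
FAIR = 0.5           # assumed fair share N
EXPLOIT_POINT = 0.7  # assumed exploitive share N'


def payoff_vs_fair_agent(demand, slope):
    """Expected payoff of demanding `demand` from an N-agent.

    Assumption: every unit demanded above FAIR costs the demander
    `slope` units of expected payoff, floored at zero.
    """
    excess = max(0.0, demand - FAIR)
    return max(0.0, FAIR - slope * excess)


def expected_payoff(demand, frac_exploitable, slope):
    """Expected payoff against a mixed population.

    Assumption: exploitable agents concede any demand up to
    EXPLOIT_POINT and refuse anything larger (payoff 0).
    Only demands in [FAIR, EXPLOIT_POINT] are meaningful here.
    """
    vs_fair = payoff_vs_fair_agent(demand, slope)
    vs_exploitable = demand if demand <= EXPLOIT_POINT else 0.0
    return (1 - frac_exploitable) * vs_fair + frac_exploitable * vs_exploitable


def deviation_gain(frac_exploitable, slope):
    """Gain from demanding N' instead of N; positive means N is unstable."""
    return (expected_payoff(EXPLOIT_POINT, frac_exploitable, slope)
            - expected_payoff(FAIR, frac_exploitable, slope))


def min_stabilizing_slope(frac_exploitable):
    """Slope at which the deviation gain is exactly zero in this model."""
    return frac_exploitable / (1 - frac_exploitable)


if __name__ == "__main__":
    for slope in (0.0, min_stabilizing_slope(0.1), 1.0):
        print(f"slope={slope:.3f}  gain={deviation_gain(0.1, slope):+.3f}")
```

In this toy model, with no slope any positive fraction of exploitable agents makes demanding N’ individually profitable, while a slope above f / (1 - f), where f is the exploitable fraction, removes the incentive. The sketch only covers the individual incentive to deviate; it does not model the value destroyed when two N’-demanding agents meet.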
- wanderingsoul 17 Sep 2013 23:57 UTC
  2 points
  Parent
  I’m not getting what you’re going for here. If these agents actually change their definition of fairness based on other agents’ definitions then they are trivially exploitable. Are there two separate behaviors here: you want unexploitability in a single encounter, but you still want these agents to be able to adapt their definition of “fairness” based on the population as a whole?
  - wedrifid 18 Sep 2013 3:26 UTC
    2 points
    Parent
    
    If these agents actually change their definition of fairness based on other agents’ definitions then they are trivially exploitable.
    
    I’m not sure that is trivial. What is trivial is that some kinds of willingness to change their definition of fairness make them exploitable. However, this doesn’t hold for all kinds of willingness to change the fairness definition. Some agents may change their definition of fairness in their own favour, for the purpose of exploiting agents vulnerable to this tactic, while being unwilling to change their definition of fairness when it harms them. The only ‘exploit’ here is ‘prevent them from exploiting me and force them to use their default definition of fair’.
    - wanderingsoul 18 Sep 2013 3:55 UTC
      1 point
      Parent
      Ah, that clears this up a bit. I think I just didn’t notice when N’ switched from representing an exploitive agent to an exploitable one. Either that, or I have a different association for exploitive agent than what EY intended. (namely, one which attempts to exploit)