wedrifid comments on Open Thread: February 2010, part 2

wedrifid 19 Feb 2010 2:14 UTC

You don’t consider someone cooperating and trustworthy if you know that its future plan is to turn you into paperclips.

Paperclip maximizers do cooperate in the single-shot PD. I am not sure I understand you, but I don’t think I care about single-shot.

I am not sure I understand you

It requires some background in the more technical conception of ‘cooperation’, but the cornerstone of cooperation is doing things that benefit each other’s utility, so that each of you gets more of what you want than if you had each tried to maximize without considering the other agent. I believe you are using ‘cooperation’ to describe a situation where the other agent can be expected to do at least some things that benefit you, without requiring any action on your part, because you have similar goals.

but I don’t think I care about single-shot.

The single-shot true Prisoner’s Dilemma is more or less the pinnacle of cooperation. Multiple shots just make it easier to cooperate. If you don’t care about the single-shot PD, you may be sacrificing human lives. Strategy: “give him the paperclips if you think he’ll save the lives if and only if he expects you to give him the paperclips, and you think he will guess your decision correctly”.
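A minimal sketch of the reasoning above, assuming hypothetical payoffs for a one-shot true PD between humans (measured in billions of lives saved) and a paperclip maximizer (measured in paperclips). Only the ordering of the numbers matters, and the names PAYOFFS, best_response_human, best_response_clipper and choose_human_move are illustrative, not from any source. It shows that defection is each side's best response in isolation, yet mutual cooperation beats mutual defection for both, and that the quoted conditional strategy picks cooperation only when the other side's move tracks yours.

```python
# Hypothetical one-shot "true" Prisoner's Dilemma payoffs (assumed numbers).
# Key: (human_move, clipper_move) -> (billions_of_lives_saved, paperclips)
PAYOFFS = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}

def best_response_human(clipper_move):
    """Human move maximizing lives saved, holding the clipper's move fixed."""
    return max("CD", key=lambda m: PAYOFFS[(m, clipper_move)][0])

def best_response_clipper(human_move):
    """Clipper move maximizing paperclips, holding the human move fixed."""
    return max("CD", key=lambda m: PAYOFFS[(human_move, m)][1])

def choose_human_move(predicted_clipper_reply):
    """Pick the human move that does best, given a model of how the clipper's
    move depends on its (assumed correct) prediction of the human move."""
    return max("CD", key=lambda m: PAYOFFS[(m, predicted_clipper_reply(m))][0])

# Hypothetical models of the clipper.
mirror = lambda m: m          # cooperates iff it predicts we cooperate
always_defect = lambda m: "D"  # defects no matter what we do

print([best_response_human(m) for m in "CD"])    # defection dominates for humans
print([best_response_clipper(m) for m in "CD"])  # and for the clipper
print(PAYOFFS[("C", "C")], PAYOFFS[("D", "D")])  # yet (C, C) beats (D, D) for both
print(choose_human_move(mirror))                 # cooperate with a mirroring clipper
print(choose_human_move(always_defect))          # defect against an unconditional defector
```

The point of choose_human_move is that the human is not picking a move against a fixed opponent move; it picks a move against a model of the opponent's reply, which is what makes cooperation rational in the single-shot case.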
- DanielVarga 20 Feb 2010 3:32 UTC
  You are right; I used the word ‘cooperation’ in the informal sense of ‘does not want to destroy me’. I fully admit that this concept is hard to formalize, but if my definition says ‘noncooperating’ and the game-theoretic definition says ‘cooperating’, I prefer my definition. :) One possible problem I see with this game-theoretic framework is that in real life, the agents themselves set up the situation in which cooperation or defection occurs. For example: the PM navigates humanity into a PD situation where our minimal payoff is ‘all humans dead’ and our maximal payoff is ‘half of humanity dead’, and then it cooperates.
  
  I bumped into a question when I tried to make sense of all this. I looked up the definition of PM on the wiki. The entry is nicely written, but I couldn’t find the answer to a very obvious question: how soon does the PM want to see results in its PMing project? There is no mention of time-based discounting. Can I assume that PMing is a very long-term project, where the PM has a set deadline, say, 10 billion years from now, and its actual utility function is the number of paperclips at the exact moment of the deadline?