Just as the wise FAI will ignore threats of torture, so too the wise paperclipper will ignore threats to destroy paperclips, and listen attentively to offers to make new ones.
Of course classical causal decision theorists get the living daylights exploited out of them, but I think everyone on this website knows better than to two-box on Newcomb by now.
> Just as the wise FAI will ignore threats of torture, so too the wise paperclipper will ignore threats to destroy paperclips, and listen attentively to offers to make new ones.
Point taken: just selecting two options of different value isn’t enough; the deal needs more appeal than that. But there is also no baseline for categorizing deals into hurt and profit: an offer of 100 paperclips may equally be stated as a threat to make 900 fewer paperclips than you could. Positive-sum is only a heuristic for a necessary condition.
At the same time, the appropriate deal must be within your power to offer; this very possibility is the handicap that leads to the other side rejecting smaller offers, including the threats.
There does seem to be an obvious baseline: the outcome where each party just goes about its own business without trying to strategically influence, threaten, or cooperate with the other in any way. In other words, the outcome where we build as many paperclips as we would if the other side weren’t a paperclip maximizer. (Caveat: I haven’t thought through whether it’s possible to define this rigorously.)
So the reason I say an FAI seems to have a negotiation disadvantage is that a UFAI can reduce the FAI’s utility much further below baseline than vice versa. In human terms, it’s as if two sides each hold hostages, but one side holds 100 and the other holds 1. In human negotiations, clearly the side that holds more hostages has an advantage. It would be a great result if that turned out not to be the case for SI, but I think there’s a large burden of proof to overcome.
> There does seem to be an obvious baseline: the outcome where each party just goes about its own business without trying to strategically influence, threaten, or cooperate with the other in any way. In other words, the outcome where we build as many paperclips as we would if the other side weren’t a paperclip maximizer.
You could define this rigorously in a special case: for example, assuming that both agents are just creatures, we could take how the first one behaves given that the second one disappears. But this is not a statement about reality as it is, so why would it be taken as a baseline for reality?
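As a toy illustration of the “disappearance” construction (everything here, including the payoff function and its constants, is invented for the sketch): the baseline is just agent1’s optimum computed with agent2 deleted from the model.

```python
def best_response(agent2_present):
    """agent1 picks how many paperclips to build, maximizing an assumed
    payoff: value of clips kept, minus a quadratic construction cost.
    If agent2 is present, it destroys half of what gets built."""
    def payoff(x):
        kept = x * (0.5 if agent2_present else 1.0)
        return kept - 0.0005 * x * x
    return max(range(1001), key=payoff)

baseline = best_response(agent2_present=False)  # agent2 "disappears": builds 1000
actual = best_response(agent2_present=True)     # under interference: builds 500
```

The hard part, per the caveat above, is defining the baseline when both agents remain in the world and are strategic, rather than surgically removing one of them.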
It seems to be an anthropomorphic intuition to see “do nothing” as a “default” strategy. Decision-theoretically, it doesn’t seem to be a relevant concept.
> So the reason I say an FAI seems to have a negotiation disadvantage is that a UFAI can reduce the FAI’s utility much further below baseline than vice versa.
The utilities are not comparable. Bargaining works off the best available option, not some fixed exchange rate. The reason agent2 can refuse agent1’s small offer is that this counterfactual strategy is expected to cause agent1 to make an even better offer. Otherwise, every little bit helps; ceteris paribus, it doesn’t matter by how much. One expected paperclip is better than zero expected paperclips.
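The refusal logic can be sketched as a one-shot split with a pre-commitment (the `outcome` function, the pie of 100, and the thresholds are all invented for this example):

```python
def outcome(threshold, pie=100):
    """agent2 pre-commits to reject any offer below `threshold`;
    agent1 then best-responds by offering the minimum agent2 accepts."""
    if threshold > pie:
        return (0, 0)  # infeasible demand: no deal, both get nothing
    offer = threshold  # smallest accepted offer maximizes agent1's share
    return (pie - offer, offer)

# Accepting anything ("one expected paperclip beats zero") nets agent2
# a single paperclip; a credible commitment to refuse less than 40
# causes agent1 to offer 40 instead.
print(outcome(1))   # (99, 1)
print(outcome(40))  # (60, 40)
```

The commitment only pays because of its counterfactual effect on agent1’s offer; push the threshold past the pie and agent2 gets nothing at all.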
> In human negotiations, clearly the side that holds more hostages has an advantage.
It’s not clear at all, if it’s a one-shot game with no consequences other than those implied by the setup and no sympathy distorting the payoff conditions. In that case, you should drop the “hostages” setting and return to paperclips, since stating it that way confuses intuition. In actual human negotiations those conditions don’t hold, and efficient decision theory doesn’t get applied.
> But this is not a statement about reality as it is, so why would it be taken as a baseline for reality?
It’s a statement about what reality would be, after doing some counterfactual surgery on it. I don’t see why that disqualifies it from being used as a baseline. I’m not entirely sure why it does qualify as a baseline, except that intuitively it seems obvious. If your intuitions disagree, I’ll accept that, and I’ll let you know when I have more results to report.
> every little bit helps; ceteris paribus, it doesn’t matter by how much
> It’s a statement about what reality would be, after doing some counterfactual surgery on it. I don’t see why that disqualifies it from being used as a baseline. I’m not entirely sure why it does qualify as a baseline, except that intuitively it seems obvious. If your intuitions disagree, I’ll accept that.
It does intuitively feel like a baseline, as is appropriate for the special place taken by inaction in human decision-making. But I don’t see what singles out this particular concept from the set of all other counterfactuals you could’ve considered, in the context of a formal decision-making problem. This doubt applies to both the concepts of “inaction” and of “baseline”.
This isn’t the case, for example, in Shapley Value.
That’s not a choice with “all else equal”. A better outcome, all else equal, is trivially a case of a better outcome.
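The Shapley-value point above can be made concrete: each agent’s share is its average marginal contribution over all orders in which the coalition could assemble, so *how much* an agent adds, not merely whether it adds anything, determines its payoff. A minimal sketch, with a hypothetical two-agent characteristic function:

```python
from itertools import permutations

def shapley(players, v):
    """Shapley value: each player's average marginal contribution,
    taken over every order in which the coalition could form."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: t / len(orders) for p, t in totals.items()}

# Hypothetical game: agent1 alone produces 60 paperclips' worth of
# value, agent2 alone produces nothing, together they produce 100.
def v(S):
    if len(S) == 2:
        return 100
    return 60 if "agent1" in S else 0

print(shapley(["agent1", "agent2"], v))  # {'agent1': 80.0, 'agent2': 20.0}
```

Here agent2’s share tracks the size of its marginal contribution, not just the bare fact that its participation is better than nothing.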