Eliezer Yudkowsky comments on Simulate and Defer To More Rational Selves

Eliezer Yudkowsky 17 Sep 2014 18:11 UTC
20 points
Rational agents cannot be successfully blackmailed by other agents that simulate them accurately, and especially not by figments of their own imagination.
- skeptical_lurker 17 Sep 2014 21:10 UTC
  3 points
  Are you implying that rational agents can be successfully blackmailed by other agents that simulate them inaccurately? (This does seem plausible to me, and is an interesting rare example of accurate knowledge posing a hazard.)
  - Armok_GoB 7 Oct 2014 22:20 UTC
    2 points
    Well, that’s quite obvious. Just imagine the blackmailer is a really stupid human with a big gun who’d fall for blackmail in a variety of awful ways and has a bad case of typical mind fallacy, and who, if anything goes against their expectations, gets angry and just shoots the victim before thinking through the consequences.
    - skeptical_lurker 8 Oct 2014 2:24 UTC
      3 points
      It’s kinda obvious, but deeply counter-intuitive: I mean, it’s a situation where stupidity is a decisive advantage!
      - Lumifer 8 Oct 2014 14:57 UTC
        7 points
        
        it’s a situation where stupidity is a decisive advantage!
        
        Not quite stupidity, but irrationality. And it is well-known that (credible) irrationality can be a big advantage in negotiations and other game theory scenarios. Essentially, if I’m irrational then you cannot simulate me accurately and cannot predict what I will do, which means that your risk aversion pushes you towards safe choices that limit your downside at the cost of your upside. And if it’s a zero-sum game, I get this upside.
        
        Of course, I need to be credible in showing my irrationality.
        
        The reason such a strategy is not used more often is that (a) often there is the option to walk away, which many people take when faced with an irrational counterparty; and (b) when two irrational counterparties meet, bad things happen :-)
        gjm 8 Oct 2014 23:41 UTC
        3 points
        There are instances where (arguably) irrationality confers a big game-theoretic advantage even though you’re predictable.
        
        For instance, suppose you’re leading a nuclear superpower. If you can make it credibly clear that you really truly would be happy to launch World War Three if the other guys don’t back down, then they probably will. Not because they can’t predict your actions, but because they can.
        
        In this sort of case it’s either debatable whether it’s really irrationality, or debatable whether it’s really a game-theoretic advantage. If you can really be sure that the other guys will back down, then maybe it’s not irrationality because you never have to blow up the world. If you can’t, then maybe you don’t have a game-theoretic advantage after all because if you play this game often enough then the other guys call your bluff, you push the big red button, and everyone dies.
        
        [EDITED to add: I think this sort of case is nearer to the example discussed upthread than the sort where unpredictability is key.]
        Lumifer 9 Oct 2014 0:14 UTC
        0 points
        
        For instance, suppose you’re leading a nuclear superpower. If you can make it credibly clear that you really truly would be happy to launch World War Three
        
        That’s more like sheer bloodymindedness X-) not irrationality.
        
        then the other guys call your bluff, you push the big red button, and everyone dies.
        
        Yeah, it’s called the game of chicken and that’s a slightly different thing.
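gjm's worry about playing this game "often enough" can be put in numbers. The following is a minimal sketch, not anything proposed in the thread: it assumes each confrontation is independent, that the other side calls the bluff with a fixed probability `p_call_bluff` (a hypothetical parameter), and that a committed player then always pushes the button.

```python
def survival_probability(p_call_bluff: float, rounds: int) -> float:
    """Chance that nobody pushes the button over `rounds` confrontations.

    Assumptions (illustrative only): rounds are independent, the opponent
    calls the bluff with the same probability every round, and a called
    bluff always ends in catastrophe.
    """
    if not 0.0 <= p_call_bluff <= 1.0:
        raise ValueError("p_call_bluff must be a probability")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    # Survival requires the opponent to back down in every single round.
    return (1.0 - p_call_bluff) ** rounds


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        print(n, survival_probability(0.01, n))
```

Under these assumptions, even a small per-round chance of being called compounds: the commitment looks like an advantage in any single round and like a liability over a long enough horizon.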
- dankane 17 Sep 2014 19:14 UTC
  3 points
  I think you mean that rational agents cannot be successfully blackmailed by other agents when it is common knowledge that those agents can simulate them accurately and will only use blackmail if they predict it to be successful. All of this, of course, holds only in the absence of mitigating circumstances (including, for example, the theoretical likelihood of other agents that reward you for counterfactually giving in to blackmail under these circumstances).
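The conditions in this restatement can be made concrete with a toy model. This is only a sketch under stated assumptions: the payoff numbers are invented for illustration, the blackmailer threatens only when it predicts payment, it always carries out a refused threat, and an "inaccurate" simulator is modelled crudely as one that always expects the victim to pay.

```python
# Hypothetical victim payoffs; the numbers are illustrative assumptions.
PAYOFF_NO_BLACKMAIL = 0
PAYOFF_PAY = -10         # victim gives in to the demand
PAYOFF_PUNISHED = -100   # victim refuses and the threat is carried out


def always_pay() -> bool:
    """Victim policy: give in whenever threatened."""
    return True


def never_pay() -> bool:
    """Victim policy: refuse whenever threatened."""
    return False


def victim_outcome(victim_policy, predicts_accurately: bool = True) -> int:
    """Victim's payoff against a blackmailer who threatens only when it
    predicts that the victim will pay."""
    would_pay = victim_policy()  # the victim's actual disposition
    # Assumption: an accurate simulation reproduces the real disposition;
    # an inaccurate one is modelled as always expecting payment.
    predicted_to_pay = would_pay if predicts_accurately else True
    if not predicted_to_pay:
        return PAYOFF_NO_BLACKMAIL  # blackmail is never attempted
    return PAYOFF_PAY if would_pay else PAYOFF_PUNISHED


if __name__ == "__main__":
    for policy in (always_pay, never_pay):
        for accurate in (True, False):
            print(policy.__name__, accurate, victim_outcome(policy, accurate))
```

With these assumed numbers, `never_pay` does best against an accurate simulator (it is never threatened) and worst against the inaccurate one, which is the asymmetry skeptical_lurker asks about upthread.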
- Philip_W 16 Jun 2015 5:51 UTC
  1 point
  That doesn’t seem true. How can the victim know for sure that the blackmailer is simulating them accurately or being rational?
  
  Suppose you get mugged in an alley by random thugs. Which of these outcomes seems most likely:
  1. You give them the money, they leave.
  2. You lecture them about counterfactual reasoning, they leave.
  3. You lecture them about counterfactual reasoning, they stab you.
  Any agent capable of appearing irrational to a rational agent can blackmail that rational agent. This decreases the probability of agents which appear irrational being irrational, but not necessarily to the point that you can dismiss them.
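The mugging example can be read as a simple expected-loss comparison. The sketch below is a naive one-shot calculation and deliberately ignores the policy-level arguments made elsewhere in the thread; the function names and all numbers are hypothetical.

```python
def expected_loss_if_refusing(p_carries_out: float, cost_of_harm: float) -> float:
    """Expected loss from refusing, if the threat is carried out with
    probability `p_carries_out`."""
    return p_carries_out * cost_of_harm


def should_pay(p_carries_out: float, cost_of_paying: float, cost_of_harm: float) -> bool:
    """Naive one-shot rule: pay when paying costs less than the expected
    loss from refusing."""
    return cost_of_paying < expected_loss_if_refusing(p_carries_out, cost_of_harm)


def indifference_probability(cost_of_paying: float, cost_of_harm: float) -> float:
    """Probability of the threat being carried out at which paying and
    refusing have equal expected loss."""
    if cost_of_harm <= 0:
        raise ValueError("cost_of_harm must be positive")
    return cost_of_paying / cost_of_harm


if __name__ == "__main__":
    # Illustrative numbers only: losing a wallet versus being stabbed.
    print(should_pay(0.05, 100, 10_000))
    print(indifference_probability(100, 10_000))
```

On this reading, "not necessarily to the point that you can dismiss them" means the probability that an apparently irrational agent really is irrational only has to exceed the ratio of the two costs, which can be very small when the threatened harm is severe.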
- Decius 18 Sep 2014 1:26 UTC
  1 point
  Why not? Are rational agents generally immune to blackmail, or is it not strictly advantageous to be able to simulate another agent accurately?
  - Tintinnabulation 16 Dec 2014 20:42 UTC
    0 points
    I think it basically comes down to this: if the rational agent recognizes that the rational thing to do is to NOT buckle under blackmail, regardless of what the rational agent simulating them threatens, then the blackmailer’s simulation of the blackmailee will also not respond to that pressure, and so it’s pointless to go to the effort of pressuring them in the first place. However, if the blackmailer is irrational, their simulation of the blackmailee will be irrational, and thus they will carry through with the threat. This means that the blackmailee’s simulation of the blackmailer as rational is itself inaccurate, as the simulation does not correspond to reality. If the blackmailee is irrational, their simulation of the blackmailer will be irrational, and thus they will concede to the blackmailer’s demands. Yet each party acts as if their simulation of the other were correct, until actual, photon-transmitted information about the world can impress itself into their cognitive function. So, no-one gets what they want. The best choice for a rational agent here is just to ignore the good professor. On the other hand, you can’t argue with results. And there’s a simulation of Quirrel s-quirreled away in your brain, whispering.
    - Decius 25 Dec 2014 6:02 UTC
      0 points
      It looks like you are saying that both rational and irrational agents model competitors as behaving in the same way they do.
      
      Is that why you think that an irrational simulation of a rational agent must be wrong, and why a rational simulation of an irrational agent must be wrong? I suggest that an irrational agent can correctly model even a perfectly rational one.
- johnlawrenceaspden 23 Mar 2016 17:43 UTC
  0 points
  sorry