Pascal’s Mugging is generally considered to be a failure of proper reasoning, although Eliezer admitted in 2007 to not having a solution to the problem, i.e. an argument demonstrating how a perfect reasoner would avoid being mugged.
If rejecting Pascalian muggers is the correct conclusion, then to get an excellent reasoner to submit to those cases of it that would be convenient for us, we would have to distort its reasoning to place a blind spot in the places where we don’t want it to go. This does not sound to me like a winning strategy for making safe AI. The argument that Eliezer gave in point 24 of his List of Lethalities applies here as well:
You’re trying to take a system implicitly trained on lots of arithmetic problems until its machinery started to reflect the common coherent core of arithmetic, and get it to say that as a special case 222 + 222 = 555.
I would suggest that Pascal’s Mugging is mostly a question of values (obviously there are facts about the world that are relevant too, such as what paying would imply and how often people would try to exploit it), so I disagree with calling it a “failure of proper reasoning”. Of course, someone may end up paying in Pascal’s Mugging as a result of fallacious reasoning, but I disagree that fallacious reasoning is the only reason why someone might pay.
Then we disagree. Taking Eliezer’s original example:
“Give me five dollars, or I’ll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people.”
I do not pay this individual. I consider it an error to pay this individual, no matter how the details are varied.
My reasoning: If my strategy pays out in such situations, then anyone knowing this can take all of my wealth by saying the magic sentence to me. This is a losing strategy. It is as losing as wandering through bad neighbourhoods looking like a naive and wealthy tourist.
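A toy simulation makes the exploitation concrete (a sketch; the wealth, demand size, and mugger behaviour are invented assumptions): a policy known to pay attracts muggers until nothing is left, while a policy known to refuse is simply not targeted.

```python
# Toy model: compare a policy that pays Pascalian muggers with one that
# refuses. Muggers here are opportunistic: they keep targeting any agent
# known to pay. All numbers are illustrative assumptions.

def run_muggings(wealth, pays, demand=5, attempts=10_000):
    """Return remaining wealth after opportunistic muggers make their demands."""
    for _ in range(attempts):
        if not pays:        # a known refuser stops attracting muggers
            break
        wealth -= min(demand, wealth)
        if wealth == 0:     # nothing left to take
            break
    return wealth

payer_wealth = run_muggings(wealth=1_000, pays=True)
refuser_wealth = run_muggings(wealth=1_000, pays=False)
print(payer_wealth, refuser_wealth)   # the paying policy is drained to 0
```

The point is not the particular numbers but the structure: the paying policy’s losses scale with the number of agents willing to say the magic sentence, which the policy itself invites.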
Compare the situation in which one has discovered numbers like 3^^^^3 embedded in the laws of physics, with practically testable consequences. Well then, that would just be how things are. Nature is not an agent strategically designing its fundamental laws in order to take something from us. But a Pascal’s Mugger is. Their offer cannot be considered in isolation from our own strategy of responding to such offers. A correct solution must be derived from thinking about adversarial games and TDT-like theories.
The argument “Solomonoff probabilities of numbers decrease far slower than those numbers can increase”, which Eliezer did not have a refutation of in 2007, ignores the recursive relationship between players’ strategies.
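The force of that Solomonoff argument can be shown with a toy calculation (a sketch; a real Solomonoff prior is uncomputable, so the length of the expression naming the payoff stands in here for program complexity): the description length barely grows while the payoff it names tetrates, so the naive expected-value terms diverge.

```python
# Illustrative stand-in for the Solomonoff argument. A claimed payoff of
# 2^^k (Knuth up-arrow: a tower of k twos) is weighted by a prior of
# 2**(-L), where L = len("2^^k") is a crude proxy for description length.
# Both modelling choices are assumptions for illustration only.

def tetrate(base, height):
    """Compute base^^height: a tower of `height` copies of `base`."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

for k in range(1, 6):
    expr = f"2^^{k}"
    penalty_bits = len(expr)          # prior ~ 2**(-len(expr)): 4 bits throughout
    payoff = tetrate(2, k)            # 2, 4, 16, 65536, 2**65536, ...
    # log2 of the naive expected-value term payoff * 2**(-penalty_bits)
    # (exact here, since every payoff is a power of two):
    ev_log2 = payoff.bit_length() - 1 - penalty_bits
    print(expr, ev_log2)
```

The description stays four symbols long while the weighted payoff term explodes past 2^65532 by k = 5; no fixed complexity penalty of this exponential form can keep up with tetrational growth.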
The argument “but can you be 3^^^3 sure of that?” fails, because my possible errors of reasoning could go in any direction. There is no reason to privilege the hypothesis that the mugger spoke the truth.
“Use your powers from outside the matrix to give yourself five dollars. You don’t need mine.”
Is it an overall adversarial environment if the mugging only takes place once, and you know it can only ever take place once?
From the point of view of choosing strategies rather than individual actions, there is no such thing as “just once”.
Yes, quite obviously.
I’d suggest that such an agent is just extremely risk-averse. On the other hand, there are agents that are extremely risk-loving, and those people “feel crazy” to me; some proportion of them haven’t really thought through the risks, but others just have different values.
I’m not clear what risk aversion has to do with it. I believe (but do not have a mathematical proof) that an agent that simply shuts up and multiplies (i.e. is risk-neutral), and properly accounts for the game theory, refuses to pay.
Isn’t the whole issue that shutting up and multiplying causes people to pay the mugger?
Shutting up and multiplying causes naive decision theorists to pay the mugger, just as naive decision-theoretic hitchhikers get left in the desert by drivers who can see that they won’t repay their help, and Omega can offer enormous amounts to naive decision theorists in Newcomb’s Problem and never have to pay.
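The hitchhiker parallel can be put in the same policy-level terms with a toy model (the payoff numbers are invented for illustration): when the driver predicts the agent’s policy, the act-level reasoner who would later refuse to repay never gets rescued at all, so committing to the repaying policy strictly wins.

```python
# Toy Parfit's hitchhiker: the driver predicts whether the agent's policy
# repays, and only rescues predicted repayers. Illustrative payoffs:
# rescue is worth 1000, repayment costs 100, dying in the desert is 0.

def hitchhiker_outcome(policy_repays):
    """Value received by an agent whose repayment policy the driver can read."""
    if not policy_repays:   # transparent predictor: refusers are never rescued
        return 0            # left in the desert
    return 1000 - 100       # rescued, then honours the repayment

naive = hitchhiker_outcome(policy_repays=False)    # act-level reasoner
policy = hitchhiker_outcome(policy_repays=True)    # strategy chooser
print(naive, policy)   # 0 900
```

The same structure applies to the mugger and to Omega: once the counterparty conditions on your policy rather than your individual act, evaluating acts in isolation gives the wrong answer.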
Choosing a strategy, if done properly, refuses to pay the mugger, as it refuses all other attempts at blackmail. Come to think of it: Eliezer has argued somewhere (perhaps in the context of Roko’s basilisk) that the correct way to handle blackmail is the invariant strategy of not paying, and Pascal’s Mugging is an example of blackmail, so the conundrum he posed in 2007 should be easily solved by his current self.
Folk wisdom knows this. “Never play poker with strangers.” “Never take a strange bet from a stranger.” Damon Runyon gave a more colourful version.