I just read and liked “Pascal’s mugging.” It was written a few years ago, and the wiki is pretty spare. What’s the state of the art on this problem?
I haven’t seen much response to it. There’s a reply in Analysis by Baumann who takes a cheap out by saying simply that one cannot provide the probability in advance, that it’s ‘extremely implausible’.
I have an unfinished essay where I argue that, as presented, the problem is asking for a uniform distribution over an infinity, so you cannot give the probability in advance; but I haven’t yet come up with a convincing argument for why your probability should scale down in proportion as the mugger’s offer scales up.
That is: it’s easy to show that scaling disproportionately leads to another mugging. If you scale superlinearly, then the mugging can be broken up into an ensemble of offers that together add up to a mugging. If you scale sublinearly, you will refuse sensible offers that are broken up the same way.
But I haven’t come up with any deeper justification for scaling linearly other than ‘this apparently arbitrary numeric procedure avoids three problems’. I’ve sort of given up on it, as you can see from the parlous state of my essay.
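A minimal numeric sketch of the ‘scale down in proportion’ rule itself (it does not attempt to formalize the superlinear/sublinear splitting arguments above); the constant c, the penalty form P(offer of size H is genuine) = c / H^a, and all sample sizes are illustrative assumptions:

```python
# Illustrative sketch only: how the expected value of a single offer of claimed
# size H behaves when the probability of the offer being genuine is penalized
# as c / H**a.  a = 1 is the "scale down in proportion" rule; c and the sample
# sizes of H are arbitrary assumptions, not values from the discussion.

def expected_value(H, a, c=1e-6):
    p = min(1.0, c / H**a)   # probability the offer is genuine, penalized by its size
    return p * H             # expected payoff (or expected harm) of the offer

for a, label in [(0.5, "sublinear penalty"), (1.0, "linear penalty"), (2.0, "superlinear penalty")]:
    evs = [expected_value(10.0**k, a) for k in (3, 9, 27, 81)]
    print(f"{label:>20}: " + ", ".join(f"{ev:.3g}" for ev in evs))

# Linear penalty: the expected value is the same constant c for every H, so no single
# offer, however large, can move the agent by more than a fixed amount.
# Sublinear penalty: the expected value grows without bound as H grows -- the original mugging.
# Superlinear penalty: huge offers are worth almost nothing individually, which is where
# the worry above about splitting them into an ensemble of smaller offers comes from.
```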
Thanks. Here’s my fresh and uneducated opinion.
I see four kinds of answers to the mugging:
1. We’re boned
2. Some kind of hack in the decision-making process
3. Some kind of hack in the mathematics
4. “Head on.” That is, prove that the expected disutility of a given threat is bounded independent of the size of the threat.
Here’s my analysis in the sense of 4; tell me if I’m making a common mistake. We are worried that P(agent can do H amount of harm | agent threatens to do H amount of harm) times H can be arbitrarily large. As Tarleton pointed out in the 2007 post, any details beyond H about the scenario we’re being threatened with are a distraction (right? That actually doesn’t seem to be the implicit assumption of your draft, or of Hanson’s comment, etc.)
By Bayes the quantity in question is the same as
P(threat | ability)/P(threat) x P(ability) x H
Our hope is that we can prove this quantity is actually bounded independent of H (but of course not independent of the agent making the threat). I’ll leave aside the fact that the probability that such a proof contains a mistake is certainly bounded below.
P(threaten H) is the probability that a certain computer program (the agent making the threat) will give a certain output (the threat). My feeling about this number is that it is medium-sized if H has low complexity (such as 3^^^3) and tiny if H has high complexity (such as some of the numbers within 10% of 3^^^3). That is, complex threats have more credibility, since P(threaten H) sits in the denominator above. I’m comforted by the fact that, by the definition of complexity, it would take a long time for an agent to articulate his complex threat. So let’s assume P(threaten H) is medium-sized, as in the original version where H = 3^^^3 x the value of a human not being tortured.
It seems like wishful thinking that P(threat | ability) should shrink with H. Let’s assume this is also medium-sized and does not depend on H.
So I think the question boils down to how fast P(agent can do H amount of harm) shrinks with H. If it’s O(1/H) we’re OK, and if it’s larger we’re boned.
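A small sketch of this analysis; the likelihood-ratio factor and the two candidate tails for P(agent can do H amount of harm) below are placeholder assumptions chosen only to show the bounded-versus-unbounded contrast:

```python
# Expected harm of a threat of size H, decomposed as in the comment above:
#   P(ability | threat) x H = [P(threat | ability) / P(threat)] x P(ability) x H
# All numbers here are illustrative assumptions, not estimates made in the discussion.

p_threat_given_ability = 0.5   # assumed "medium-sized" and independent of H
p_threat = 0.01                # P(threaten H), also assumed medium-sized
likelihood_ratio = p_threat_given_ability / p_threat

def p_ability(H, tail):
    """Two candidate tails for P(agent can do H amount of harm)."""
    if tail == "O(1/H)":
        return min(1.0, 1e-3 / H)
    if tail == "O(1/sqrt(H))":
        return min(1.0, 1e-3 / H**0.5)
    raise ValueError(tail)

for tail in ("O(1/H)", "O(1/sqrt(H))"):
    expected_harm = [likelihood_ratio * p_ability(10.0**k, tail) * 10.0**k for k in (6, 12, 24, 48)]
    print(f"{tail:>13}: " + ", ".join(f"{eh:.3g}" for eh in expected_harm))

# With an O(1/H) tail the expected harm stays bounded (here, constant) as H grows;
# with any heavier tail, such as O(1/sqrt(H)), it grows without bound -- "we're boned".
```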
As long as we’re all chipping in, here’s my take:
(1) Even if the correct answer is to hand over the money, we should expect to feel an intuitive sense that doing so is the wrong answer. A credible threat to inflict that much disutility would never have happened in the ancestral environment, but false threats to do so have happened rather often. That being the case, the following is probably rationalization rather than rationality:
(2) Consider the proposition that, at some point in my life, someone will try to Pascal’s-mug me and actually back their threats up. In this case, I would still expect to receive a much larger number of false threats over the course of my lifetime. If I hand over all my money to the first mugger without proper verification, I won’t be able to pay up when the real threat comes around.
I think that your (2) is a proof that handing over the money is the wrong answer. My understanding is that the problem is whether this means that any AI that runs on the basic package we sometimes envision hazily (a prior, an unbounded utility function, and an algorithm for choosing based somehow on multiplying the former by the latter) is boned.
I thought that my (2) was a proof that a prior-and-utility system will correctly decide to investigate the claim to see whether it’s credible.
But what a prior-and-utility system means by “credible” is that the expected disutility is large. If a blackmailer can, at finite cost to itself, put our AI in a situation with arbitrarily high expected disutility, then our AI is boned.
Ah, you’re worried about a blackmailer that can actually follow up on that threat. I would point out that humans usually pay ransoms, so it’s not exactly making a different decision than we would in the same situation.
Or, the AI might anticipate the problem and self-modify in advance to never submit to threats.
I’m worried about a blackmailer that can with positive probability follow up on that threat.
Yes, humans behave in the same way, at least according to economists. We pay ransoms when the probability of the threat being carried out, times the disutility that would result if it were carried out, exceeds the ransom. The difference is that for human-scale threats, this expected disutility does seem to be bounded.
That could mean one of at least two things: either the AI starts to work according to the rules of a (hitherto unconceived?) non-prior-and-utility system, or the AI calibrates its prior and its utility function so that it doesn’t submit to (some) threats. I think the question is whether something like the second idea can work.
No, see, that’s different.
If you’re dealing with a blackmailer that might be able to carry out their threats, then you investigate whether they can or not. The blackmailer themselves might assist you with this, since it’s in their interest to show that their threat is credible.
Allow me to demonstrate: Give $100 to the EFF or I’ll blow up the sun. Do you now assign a higher expected utility to giving $100 to the EFF, or to giving the same $100 instead to SIAI? If I blew up the moon as a warning shot, would that change your mind?
The result of such an investigation might raise or lower P(threat can be carried out). This doesn’t change the shape of the question: can a blackmailer issue a threat with P(threat can be carried out) x U(threat is carried out) > H, for all H? Can it do so at cost to itself that is bounded independent of H?
I refuse. According to economists, I have just revealed a preference:
P(Pavitra can blow up the sun) x U(Sun) < U($100)
Yes. Now I have revealed
P(Pavitra can blow up the sun | Pavitra has blown up the moon) x U(Sun) > U($100)
My point is that U($100) is partially dependent on P(Mallory can blow up the sun) x U(Sun), for all values of Mallory and Sun such that Mallory is demanding $100 not to blow up the sun. If P(M_1 can heliocide) is large enough to matter, there’s a very good chance that P(M_2 can heliocide) is too. Credible threats do not occur in a vacuum.
I don’t understand your points, can you expand them?
In my inequalities, P and U denoted my subjective probabilities and utilities, in case that wasn’t clear.
The fact that probability and utility are subjective was perfectly clear to me.
I don’t know what else to say except to reiterate my original point, which I don’t feel you’re addressing:
It’s not even clear to me that you disagree with me. I am proposing a formulation (not a solution!) of Pascal’s mugging problem: if a mugger can issue threats of arbitrarily high expected disutility, then a priors-and-utilities AI is boned. (A little more precisely: then the mugger can extract an arbitrarily large amount of utils from the P-and-U AI.) Are you saying that this statement is false, or just that it leaves out an essential aspect of Pascal’s mugging? Or something else?
I’m saying that this statement is false. The mugger needs also to somehow persuade the AI of the nonexistence of other muggers of similar credibility.
In the real world, muggers usually accomplish this by raising their own credibility beyond the “omg i can blow up the sun” level, such as by brandishing a weapon.
OK, let me be a little more careful. The expected disutility the AI associates with a threat is
EU(threat) = P(threat will be carried out) x U(threat will be carried out) + P(threat will not be carried out) x U(threat will not be carried out)
I think that the existence of other muggers with bigger weapons, or just of other dangers and opportunities generally, is accounted for in the second summand.
Now does the formulation look OK to you?
That formulation seems to fail to distinguish (ransom paid)&(threat not carried out) from (ransom not paid)&(threat not carried out).
There are two courses of action being considered: pay ransom or don’t pay ransom.
EU(pay ransom) = P(no later real threat) * U(sun safe) + P(later real threat) * U(sun explodes)
EU(don’t pay ransom) = P(threat fake) * ( P(no later real threat) + P(later real threat) * P(later real threat correctly identified as real | later real threat) ) * U(sun safe) + ( P(threat real) + P(threat fake) * P(later real threat) * P(later real threat incorrectly identified as fake | later real threat) ) * U(sun explodes)
That’s completely unreadable. I need symbolic abbreviations.
R=EU(pay ransom); r=EU(don’t pay ransom)
S=U(sun safe); s=U(sun explodes)
T=P(threat real); t=P(threat fake)
L=P(later real threat); M=P(no later real threat)
i=P(later real threat correctly identified as real | later real threat)
j=P(later real threat incorrectly identified as fake | later real threat)
Then:
R = MS + Ls
r = t(M + Li)S + (T + tLj)s
(p.s.: We really need a preview feature.)
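For concreteness, a minimal sketch of that comparison in code; every number below is an illustrative placeholder rather than a value anyone in the thread endorsed, and the model keeps the premise above that paying now leaves nothing with which to pay a later real threat:

```python
# Pay-vs-don't-pay comparison using the abbreviations defined above.
# All numeric values are placeholders for illustration.

S, s = 0.0, -1e12       # U(sun safe), U(sun explodes)
T, t = 1e-9, 1 - 1e-9   # P(threat real), P(threat fake)
L = 1e-6                # P(later real threat)
M = 1 - L               # P(no later real threat)
i = 0.9                 # P(later real threat correctly identified as real | later real threat)
j = 1 - i               # P(later real threat incorrectly identified as fake | later real threat)

R = M * S + L * s                               # EU(pay ransom): nothing left to pay a later real threat
r = t * (M + L * i) * S + (T + t * L * j) * s   # EU(don't pay ransom)

print(f"EU(pay ransom)       = {R:.6g}")
print(f"EU(don't pay ransom) = {r:.6g}")
print("pay the ransom" if R > r else "don't pay the ransom")
```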
Why so much focus on future threats to the sun? Are you going to argue, by analogy with the prisoner’s dilemma, that the iterated Pascal’s mugging is easier to solve than the one-shot Pascal’s mugging?
I thought that, either by definition or as a simplifying assumption, EU(ransom paid & threat not carried out) = current utility minus the size of the ransom, and that EU(ransom not paid & threat not carried out) = current utility.
My primary thesis is that the iterated Pascal’s mugging is much more likely to approximate any given real-world situation than the one-shot Pascal’s mugging, and that focusing on the latter is likely to lead by availability heuristic bias to people making bad decisions on important issues.
My primary thesis is that if you have programmed a purported god-like and friendly AI that you know will do poorly in one-shot Pascal’s mugging, then you should not turn it on. Even if you know it will do well in other variations on Pascal’s mugging.
My secondary thesis comes from Polya: “If there’s a problem that you can’t solve, then there’s a simpler problem that you can solve. Find it!” Solutions to, failed solutions to, and ideas about the one-shot Pascal’s mugging will illuminate features of the iterated Pascal’s mugging, and of many real-world situations as well.
(“One-shot”, “iterated”...If these are even good names!)
I’m not persuaded that paying the ransom is doing poorly on the one-shot. And if it predictably does the wrong thing, in what sense is it Friendly?
Forget it. I’m just weirded out that you would respond to “here’s a tentative formalization of a simple version of Pascal’s mugging” with “even thinking about it is dangerous.” I don’t agree and I don’t understand the mindset.
I don’t mean to say that thinking about the one-shot is dangerous, only that grossly overemphasizing it relative to the iterated might be.
I hear about the one-shot all the time and the iterated not at all; I think the iterated is more likely to come up than the one-shot, and easier to solve; so all in all I think it’s completely reasonable for me to want to emphasize the iterated.
Granted! And tell me more.
The iterated has an easy-to-accept-intuitively solution: don’t just randomly accept blackmail from anyone who offers it, but rather investigate first to see if they constitute a credible threat.
The one-shot Pascal’s Mugging, like most one-shot games discussed in game theory, has a harder-to-stomach dominant strategy: pay the ransom, because the mere claim, considered as Bayesian evidence, makes the threat much more probable than the reciprocal of its utility-magnitude.
I don’t quite follow this. Assuming we’re using one of the universal priors based on Turing machine enumerations, then an agent which consists of ‘3^^^3 threat + no ability’ is much shorter and much more likely than an agent which consists of ‘~.10*3^^^3 threat + ability’. The more complex the threat, the less space there is for executing it.
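A crude, purely illustrative sketch of the length-based-prior intuition here: compressed length is only a rough stand-in for program length, and the agent ‘descriptions’ are invented strings, so nothing below is an actual Solomonoff prior.

```python
import zlib, random

# Under a length-based universal prior, a description of length n bits gets weight
# on the order of 2**(-n).  Compressed length is used below as a very rough proxy
# for description length; the agent descriptions themselves are made up.

def approx_bits(description: str) -> int:
    return 8 * len(zlib.compress(description.encode()))

random.seed(0)
simple_harm = "3^^^3"                                                  # low-complexity harm figure
messy_harm = "".join(random.choice("0123456789") for _ in range(100))  # ~random number: high complexity

for name, harm in [("simple harm figure", simple_harm), ("messy harm figure", messy_harm)]:
    threat_only    = f"threaten {harm}; no ability"
    threat_ability = f"threaten {harm}; plus the machinery actually able to inflict {harm} harm"
    print(f"{name}: prior weight ~ 2^-{approx_bits(threat_only)} (threat only) "
          f"vs ~ 2^-{approx_bits(threat_ability)} (threat + ability)")

# Longer descriptions get exponentially smaller weight: the agent that merely threatens
# is favored over the one that also carries the ability to act, and a high-complexity
# harm figure makes either description much less probable a priori.
```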
If I disagree, it’s for a very minor reason, and with only a little confidence. (P(threat) is short for P(threat|no information about ability).) But you’re saying the case for P(threaten H) being bounded below (and its reciprocal being bounded above) is even stronger than I thought, right?
Another way to argue that P(threaten H) should be medium-sized: at least in real life, muggings have a time-limit. There are finitely many threats of a hundred words or less, and so our prior probability that we will one day receive such a threat is bounded below.
Another way to argue that the real issue is P(ability H): our AI might single you out and compute P(gwern will do H harm) = P(gwern will do H harm | gwern can do H harm) x P(gwern can do H harm). It seems like you have an interest in convincing the AI that P(gwern can do H harm) x H is bounded above.
While raking, I think I finally thought of a proof that the before-offer probability can’t be known.
The question is basically ‘what fraction of all Turing machines making an offer (which is accepted) will then output a certain result?’
We could rewrite this as ‘what is the probability that a random Turing machine will output a certain result?’
We could then devise a rewriting of all those Turing machines into Turing machines that halt or not when their offer is accepted (e.g. halting might = delivering, not halting = welshing on the deal; this is like Rice’s theorem).
Now we are asking ‘what fraction of all these Turing machines will halt?’
However, this is asking ‘what is Chaitin’s constant for this rewritten set of Turing machines?’ and that is uncomputable!
Since Turing machine-based agents are a subset of all agents that might try to employ Pascal’s Mugging (even if we won’t grant that agents must be Turing machines), the probability is at least partially uncomputable. A decision procedure which entails uncomputability is unacceptable, so we reject giving the probability in advance, and so our probability must be contingent on the offer’s details (like its payoff).
Thoughts?
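A toy illustration of the halting-fraction question, using an invented micro-machine rather than real Turing machines (the instruction set, program length, and step caps are arbitrary assumptions): step-capped simulation can only ever give a lower bound on the fraction that halt, which is the uncomputability point.

```python
import random

# Toy micro-machine: a single counter plus a short program of simple instructions.
# This only illustrates the "what fraction of these machines halt?" question; it is
# not an enumeration of real Turing machines.

INSTRUCTIONS = ("INC", "DEC", "JMPNZ", "HALT")

def random_program(length, rng):
    prog = []
    for _ in range(length):
        op = rng.choice(INSTRUCTIONS)
        arg = rng.randrange(length) if op == "JMPNZ" else None
        prog.append((op, arg))
    return prog

def halts_within(prog, step_cap):
    """True if the program halts within step_cap steps (a lower-bound test only)."""
    counter, pc = 0, 0
    for _ in range(step_cap):
        if pc >= len(prog):
            return True              # fell off the end: halt
        op, arg = prog[pc]
        if op == "HALT":
            return True
        if op == "INC":
            counter += 1
        elif op == "DEC":
            counter = max(0, counter - 1)
        elif op == "JMPNZ" and counter != 0:
            pc = arg
            continue
        pc += 1
    return False                     # has not halted *yet*; we cannot tell whether it ever will

rng = random.Random(0)
programs = [random_program(8, rng) for _ in range(10_000)]
for cap in (10, 100, 1000):
    frac = sum(halts_within(p, cap) for p in programs) / len(programs)
    print(f"step cap {cap:>4}: observed halting fraction >= {frac:.3f}")

# The observed fraction can only rise as the cap grows; the true limiting fraction
# (the analogue of Chaitin's constant for this toy machine) cannot be computed this way.
```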
I think Nesov is right: you’ve basically (re)discovered that the universal prior is uncomputable, and thought that this result is related to Pascal’s Mugging because you made the discovery while thinking about Pascal’s Mugging. Pascal’s Mugging seems to be more about the utility function having to be bounded in some way.
You might be interested in this thread, where I talked about how a computable decision process might be able to use an uncomputable prior:
http://groups.google.com/group/one-logic/browse_frm/thread/b499a90ef9e5fd84/2193ca2c204a55d8?#2193ca2c204a55d8
It seems to be an argument against the possibility of making any decision, and hence not a valid argument about this particular decision. Under the same assumptions, you could in principle formalize any situation in this way. (The problem boils down to the uncomputability of the universal prior itself.)
Besides, not making the decision is not an option, so you have to fall back on some default decision when you don’t know how to choose; but where does this default come from?
I take it as an argument against making perfect decisions. If perfection is uncomputable, then any computable agent is not perfect in some way.
The question is: what imperfection do we want our agent to have? This might be the deep justification for choosing to scale probability by utility that I was looking for. Scaling linearly corresponds to being willing to lose a fixed amount to mugging, scaling superlinearly corresponds to not being willing to lose any genuine offer, and scaling sublinearly corresponds to not being willing to ever be fooled. Or something like that. The details need some work.
In order to make a decision, we do not always need an exact probability: sometimes just knowing that a probability is less than, say, 0.5 is enough to determine the correct decision. So, even though an exact probability p may be incomputable, that doesn’t mean that the truth value of the statement “p<0.1” can not be computed (for some particular case). And that computation may be all we need.
That said, I’m not sure exactly how to interpret “A decision procedure which entails uncomputability is unacceptable.” Unacceptable to whom? Do decision procedures have to be deterministic? To be algorithms? To be recursive? To be guaranteed to terminate in a finite time? To be guaranteed to terminate in a bounded time? To be guaranteed to terminate by the deadline for making a decision?
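Returning to the point that a bound can be enough: a minimal sketch of deciding from bounds alone (the function and the threshold structure are illustrative, not anything proposed in the thread):

```python
def decide_with_bounds(p_low, p_high, harm, ransom):
    """Decide whether to pay, using only bounds on the probability that the threat is real."""
    if p_high * harm < ransom:
        return "refuse"         # even the most pessimistic admissible probability says paying costs more
    if p_low * harm > ransom:
        return "pay"            # even the most optimistic admissible probability says the threat dominates
    return "undetermined"       # bounds too loose: a tighter bound or more evidence is needed

# Knowing only that p < 0.1 settles the matter when 0.1 x harm is still below the ransom:
print(decide_with_bounds(0.0, 0.1, harm=500, ransom=100))    # refuse
print(decide_with_bounds(0.0, 0.1, harm=5000, ransom=100))   # undetermined: the mugger scaled the harm up
```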
Alright, so you compute away and determine that the upper bound on Chaitin’s constant for your needed formalism is 0.01. The mugger then multiplies his offering by 100, and proceeds to mug you, no? (After all, you don’t know whether the right probability is 0.01 or actually some smaller number.)
This is pretty intuitive to me: a decision procedure which cannot be computed cannot make decisions, and a decision procedure which cannot make decisions cannot do anything. I mean, do you have any reason to think that the optimal, correct decision theory is uncomputable?
I have no idea whether we are even talking about the same problem. (Probably not, since my thinking did not arise from raking.) But you do seem to be suggesting that the multiplication by 100 does not alter the upper bound on the probability. As I read the wiki article on “Pascal’s Mugging”, Robin Hanson suggests that it does. Assuming, of course, that by “his offering” you mean the amount of disutility he threatens. And the multiplication by 100 also affects the number (in this example 0.01) which I need to know whether p is less than, which strikes me as the real point.
This whole subject seems bizarre to me. Are we assuming that this mugger has Omega-like psy powers? Why? If not, how does my upper bound calculation and its timing have an effect on his “offer”? I seem to have walked into the middle of a conversation with no way from the context to guess what went before.