Why don’t simulations matter? Uploads are people, as real as real can be.
The issue isn’t “status as a morally worthwhile person” but strategic position.
If there is a meaningful substrate/simulation dichotomy, with all the normal hidden assumptions that go with it holding true (the substrate supports the simulation and has read/write editing powers over it, etc., etc.), then the substrate is obviously the thing that matters, because it actually does matter. Winning in one or another simulation but losing in the substrate is still losing, because the simulation depends on the substrate.
You can close your eyes and imagine a pleasant situation and that might be pleasant for a while (if you don’t “notice the pixels”). But if your physical brain is in a materially bad position (lack of nutrients or physical trauma or some such) that you could have avoided by pausing your daydreams to spend some clock cycles on world modeling and planning, then not to have done so would be a tragedy.
If I had the option (and freedom from resource constraints), I’d move my “wirehead happy time” into a simulation because I could fly and magically learn kung fu and generally have an amusing time. It would be just as “internally real” an experience if I’m focused on the experience alone… but until I’m floating in a limitless unconstrained choir of sexy robotic elohim, I want to make sure that someone is diligently minding the substrate :-P
And I do not want my substrate-minding-sister to do something dumb like not achieve her goals (which I share) just because someone threatens to imagine her experiencing some pain. And she’ll know I can see through the dumb philosophy and not resent her for not rescuing me from coercive “voodoo torture” if I’m any kind of realistic copy of her :-)
It’s true that a sim and an original could agree that what happens to the original is all that matters. Such a sim will sacrifice itself willingly if doing so implies a greater chance that the original gets what it wants.
But the essay in the OP stipulates what the RAI values, and those values mean there is no strategic value in caring only about what the original gets. Both the sim and the original just want to see computation C completed, and C is something the RAI could accomplish in either possible world. Even if the RAI is a sim, its computation of C would be completely real: there would still be a real hardware substrate somewhere that ran through the computation of C, even if that hardware were simultaneously creating an AI who thinks that it’s 2040.
This means that the possibility that the sim will be aborted really is a threat to something that the RAI, in both instantiations, values. So, the RAI ought to weigh this possibility.
::laughing with pleasure::
Yes, in that particular contrived example the boundaries between daydream and real accomplishment are potentially blurred if the difficult accomplishment is to have successfully dreamed a particular thing.
But while dreams within dreams are fun to play with, I don’t think a coherent theory of simulationist metaphysics can ignore the fact that computation in a substrate is a physical process. Rolf’s RAI might have an incoherent theory of computation, but I suspect any coherent theory would have to take into account energy costs and computational reversibility, and would come out to some conclusion of the sort that “computing C” ultimately has a physical meaning, something along the lines of either “storing the output of process C in a particular configuration in a particular medium” or perhaps “erasing information from a particular medium in a C-outcome-indicating manner”.
If we simulate Conway’s Life running a Turing machine that performs the RAI’s calculation, it seems reasonable that the RAI would count the calculation as happening both in the Conway’s Life simulation and in our computer, but to me this just highlights the enormous distance between Rolf’s simulated and real RAI.
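(To make the “computation in a substrate is a physical process” point concrete with something runnable: here is a minimal Python sketch of a Conway’s Life step, purely my own toy and not anything from Rolf’s scenario. Every generation of the simulated grid is also a perfectly ordinary computation, with real energy costs, on whatever physical machine runs the loop.)

```python
from itertools import product

def life_step(live_cells):
    """Advance a Conway's Life world (a set of (x, y) live cells) by one generation."""
    neighbour_counts = {}
    for (x, y) in live_cells:
        for dx, dy in product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                cell = (x + dx, y + dy)
                neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    # Standard rules: a cell is alive next tick with exactly 3 live neighbours,
    # or with 2 live neighbours if it is already alive.
    return {cell for cell, n in neighbour_counts.items()
            if n == 3 or (n == 2 and cell in live_cells)}

# A glider: whatever it "computes" inside the Life world is simultaneously
# being computed, step by physical step, by the substrate machine running this.
world = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    world = life_step(world)
```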
Maybe if the RAI were OK with the computational medium being in some other quantum narrative that branched off from its own many years previously, then it would be amenable to inter-narrative trade after hearing from the ambassador of the version of Rolf who is actually capable of running the simulation? Basically, it would have to be willing to say “I’ll spare you here, at the expense of a less than optimal computational result, if copies of you that won in other narratives run my computation for me over there.”
But this feels like the corner case of a corner case to me in terms of robust solutions to rogue computers. The RAI would require a very specific sort of goal that’s amenable to a very specific sort of trade. And a straight trade does not require the RAI to be ignorant about whether it is really in control of the substrate universe or sitting in a simulated sandbox universe: it just requires (1) a setup where the trade is credible and (2) an RAI that actually counts very distant quantum narratives as “real enough to trade for”.
Finally, actually propitiating every possible such RAI gets into busy beaver territory in terms of infeasible computational costs in the positive futures where a computationally focused paperclipping monster turns out not to have eaten the world.
It would be kind of ironic if we end up with a positive singularity… and then spend all our resources simulating “everything” that could have happened in the disaster scenarios...
Let me see if I understand this. An agent cannot tell for sure whether she is real or a simulation, but she is about to find out, because she now faces a decision whose consequences differ depending on whether she is real. She must choose action C or D.
If she chooses C and is real she loses 10 utils
If she chooses D and is real she gains 50 utils
If she chooses C and is a sim she gains 10 utils
If she chooses D and is a sim she loses 100 utils
She believes that there is a 50% chance she is a sim. She knows that the only reason the sim was created was to coerce the real agent into playing C. So what should she do?
Your answer seems to be that she should systematically discount all simulated utilities. For example, she should do the math as if:
If she chooses C and is a sim she gains 0.01 utils
If she chooses D and is a sim she loses 0.1 utils
That is, what happens in the simulation just isn’t as important as what happens in real life. The agent should maximize the sum (real utility + 0.001 * simulated utility).
Note: the 50% probability and the 0.001 discounting factor were just pulled out of the air in this example.
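Just to make the arithmetic explicit, here is a quick Python sketch using exactly those made-up numbers (the function name, and the reading of your formula as an expected value with the simulated payoffs discounted, are my own assumptions):

```python
def expected_utility(action, p_sim=0.5, sim_discount=1.0):
    """Expected utility of an action when the agent thinks she is a sim
    with probability p_sim, optionally discounting simulated payoffs."""
    real_payoff = {"C": -10, "D": 50}   # payoffs if she is real
    sim_payoff = {"C": 10, "D": -100}   # payoffs if she is a sim
    return ((1 - p_sim) * real_payoff[action]
            + p_sim * sim_discount * sim_payoff[action])

# Taking simulated utils at face value, the threat works: C beats D.
print(expected_utility("C"), expected_utility("D"))    # 0.0 -25.0

# Discounting simulated utils by 0.001, the threat fails: D beats C.
print(expected_utility("C", sim_discount=0.001),       # -4.995
      expected_utility("D", sim_discount=0.001))       # 24.95
```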
If this is what you are saying, then it is interesting that your suggestion (reality is more important than a sim) has some formal similarities to time discounting (now is more important than later) and also to Nash bargaining (powerful is more important than powerless).
Cool!
That actually might be a cooler thing than I said, but I appreciate your generous misinterpretation! I had to google the Nash bargaining game and I still don’t entirely understand your analogy there. If you could expand on that bit, I’d be interested :-)
What I was trying to say was simply that there is a difference between something like “solipsistic benefits” and “accomplishment benefits”. Solipsistic benefits are unaffected by transposition into a perfect simulation despite the fact that someone in the substrate can change the contents of the simulation at whim. But so long as I’m living in a world full of real and meaningful constraints, within which I make tradeoffs and pursue limited goals, I find it generally more sane to pay attention to accomplishment benefits.
There have been two recent postings on bargaining that are worth looking at. The important thing from my point of view is this:
In any situation in which two individuals are playing an iterated game with common knowledge of each other’s payoffs, rational cooperative play calls for both players to pretend that they are playing a modified game with the same choices but different payoffs. The payoff matrix they both should use will be some linear combination of the two players’ payoff matrices. The second linked posting above expresses this combination as “U1 + µU2”. The factor µ acts as a kind of exchange rate, converting one player’s utility units into the “equivalent amount” of the other player’s utility units. Thus, the Nash bargaining solution is in some sense a utilitarian solution: both players agree to try for the same objective, which is to maximize total utility.
But wait! Those were scare quotes around “equivalent amount” in the previous paragraph, and I have not yet justified my claim that µ is in any sense a ‘fair’ (more scare quotes) exchange rate. Or, to make this point in a different way, I have not yet said what is ‘fair’ about the particular µ that arises in Nash bargaining. So here, without proof, is why this particular µ is fair: it is a “market exchange rate”. It is the rate at which player-1 utility gets converted into player-2 utility if the bargained solution is shifted a little bit along the Pareto frontier.
Hmmm. That didn’t come out as clearly as I had hoped it would. Sorry about that. But my main point in the grandparent (the thing I thought was ‘cool’) is that Nash bargaining, discounting of future utility, and discounting of simulations relative to reality are all handled by taking a linear combination of utilities which initially have different units (and so are not comparable without conversion).
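Here is a rough numerical sketch of the Nash-bargaining part (Python; the split-a-pie setup, the particular utility functions, the (0, 0) disagreement point, and all the names are just invented for illustration). It maximizes the Nash product, estimates µ as the local exchange rate along the Pareto frontier at that point, and then checks that maximizing U1 + µU2 lands on roughly the same split:

```python
import math

def u1(x):                       # player 1: linear utility in her share x of a unit pie
    return x

def u2(x):                       # player 2: risk-averse utility in his share 1 - x
    return math.sqrt(1 - x)

xs = [i / 100000 for i in range(100001)]   # candidate splits

# 1. Nash bargaining solution: maximize the product of utility gains
#    over the disagreement point (taken to be (0, 0) here).
x_nash = max(xs, key=lambda x: u1(x) * u2(x))

# 2. The implied exchange rate mu: how many player-1 utils one player-2 util
#    trades for as the solution slides a little along the Pareto frontier
#    (finite-difference estimate of the frontier slope at the Nash point).
eps = 1e-6
mu = -(u1(x_nash + eps) - u1(x_nash)) / (u2(x_nash + eps) - u2(x_nash))

# 3. Maximizing the linear combination U1 + mu * U2 recovers the same split.
x_util = max(xs, key=lambda x: u1(x) + mu * u2(x))

print(x_nash, mu, x_util)        # roughly 0.6667, 1.155, 0.6667
```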
Would I be far wrong if I referred to your ‘solipsistic benefits’ as “pleasures” and referred to your ‘accomplishment benefits’ as “power”? And, assuming I am on track so far, do I have it right that it is possible to manufacture fake pleasures (which you may as well grab, because they are as good as the ‘real thing’), but that fake power is worthless?
I would say that, as traditionally understood, raw pleasure is the closest thing we have to a clean solipsistic benefit and power is clearly an accomplishment benefit. But I wouldn’t expect either example to be identical with the conceptual categories, because there are other things that also fit there. In the real world, I don’t think being really good friends with someone is about power, but it seems like an accomplishment to me.
But there is a deeper response available, because a value’s status as a solipsistic or accomplishment benefit changes depending on the conception of simulation we use: when we imagine simulations that represent things more complex than a single agent in a physically stable context, what counts as “accomplishment” can change.
One of the critical features of a simulation (at least with “all the normal hidden assumptions” of simulation) is that a simulation’s elements are arbitrarily manipulable from the substrate that the simulation is implemented in. From the substrate you can just edit anything. You can change the laws of physics. You can implement a set of laws of physics, and defy them by re-editing particular structures in “impossible” ways at each timestep. Generally, direct tweaks to uploaded mind-like processes are assumed not to happen in thought experiments, but it doesn’t have to be this way (and a strong version of Descartes’s evil demon could probably edit neuron states to make you believe that circles have corners if it wanted). We can imagine the editing powers of the substrate yoked to an agent-like process embedded in the simulation and, basically, the agent would get “magic powers”.
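(A toy sketch of what those editing powers amount to, entirely made up just to illustrate the mechanism: a tiny simulated world with its own gravity, plus a substrate-level hook that simply overwrites whatever the in-simulation “laws” computed, every single timestep.)

```python
def in_sim_physics(state, dt=1.0):
    """The simulation's own 'law of physics': a ball falling under constant gravity."""
    return {
        "height": state["height"] + state["velocity"] * dt,
        "velocity": state["velocity"] - 9.8 * dt,
    }

def substrate_edit(state):
    """An edit made from the substrate, which the in-simulation laws cannot forbid:
    pin the ball at height 100 no matter what 'physics' just said."""
    edited = dict(state)
    edited["height"] = 100.0
    return edited

state = {"height": 100.0, "velocity": 0.0}
for tick in range(5):
    state = in_sim_physics(state)   # what the simulated laws say should happen
    state = substrate_edit(state)   # what the substrate decrees instead
    print(tick, state)
```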
In Eliezer’s Thou Art Physics, he asked: “If the laws of physics did not control us, how could we possibly control ourselves? If we were not in reality, where could we be?”
As near as I can tell, basically all human values appear to have come into existence in the face of a breathtakingly nuanced system of mechanical constraint and physical limitation. So depending on (1) what you mean by “power”, and (2) what a given context’s limits are, then power could also be nothing but a solipsistic benefit.
Part of me is embarrassed to be talking about this stuff on a mere blog, but one of the deep philosophical problems with the singularity appears to be figuring out what the hell there is to do after the sexy robotic elohim pull us into their choir of post-scarcity femto-mechanical computronium. If it happens in a dumb way (or heck, maybe if it happens in the best possible way?) the singularity may be the end of the honest pursuit of accomplishment benefits. Forever.
If pursuit of accomplishment benefits is genuinely good (as opposed to simply growing out of cognitive dissonance about entropy and stuff) then it’s probably important to know that before we try to push a button we suspect will take such pursuits away from us.
You are essentially arguing about the moral value of the consequences of actions in reality versus in simulations. The technique gets its strength from the moral value assigned to consequences in simulations being sufficient to act as an incentive for altering actions in reality. The extent to which control in reality can be negotiated is given by the extent of that conditional moral value placed on simulations. Factoring in the low probability of the threat, not much can be bought in reality, but some.