Let me see if I understand this. An agent cannot tell for sure whether she is real or a simulation. But she is about to find out, because she now faces a decision which has different consequences if she is real. She must choose action C or D:
If she chooses C and is real, she loses 10 utils.
If she chooses D and is real, she gains 50 utils.
If she chooses C and is a sim, she gains 10 utils.
If she chooses D and is a sim, she loses 100 utils.
She believes that there is a 50% chance she is a sim. She knows that the only reason the sim was created was to coerce the real agent into playing C. So what should she do?
Your answer seems to be that she should systematically discount all simulated utilities. For example, she should do the math as if:
If she chooses C and is a sim, she gains 0.01 utils.
If she chooses D and is a sim, she loses 0.1 utils.
That is, what happens in the simulation just isn’t as important as what happens in real life. The agent should maximize the sum (real utility + 0.001 * simulated utility).
Note: the 50% probability and the 0.001 discounting factor were just pulled out of the air in this example.
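To make the arithmetic concrete, here is a minimal sketch of that calculation (assuming the agent simply maximizes expected utility, with the simulated payoff scaled by the discount factor; all the numbers are the arbitrary ones above):

```python
# Minimal sketch: the agent maximizes p_real * U_real + p_sim * k * U_sim,
# where k is the discount factor applied to simulated utilities.

P_SIM = 0.5       # subjective probability of being the sim (pulled out of the air)
DISCOUNT = 0.001  # discount factor on simulated utilities (also pulled out of the air)

# payoffs: action -> (utility if real, utility if sim)
PAYOFFS = {"C": (-10, +10), "D": (+50, -100)}

def expected_utility(action, discount):
    u_real, u_sim = PAYOFFS[action]
    return (1 - P_SIM) * u_real + P_SIM * discount * u_sim

for action in ("C", "D"):
    print(action,
          "undiscounted:", expected_utility(action, discount=1.0),
          "discounted:", expected_utility(action, discount=DISCOUNT))

# Undiscounted: C = 0.0, D = -25.0       -> the threat works and she plays C.
# Discounted:   C ≈ -4.995, D ≈ 24.95    -> the threat fails and she plays D.
```

So with the discount in place, the simulator’s threat carries almost no weight and D comes out far ahead.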
If this is what you are saying, then it is interesting that your suggestion (reality is more important than a sim) has some formal similarities to time discounting (now is more important than later) and also to Nash bargaining (powerful is more important than powerless).
Cool!
That actually might be a cooler thing than I said, but I appreciate your generous misinterpretation! I had to google for the Nash bargaining game and I still don’t entirely understand your analogy there. If you could expand on that bit, I’d be interested :-)
What I was trying to say was simply that there is a difference between something like “solipsistic benefits” and “accomplishment benefits”. Solipsistic benefits are unaffected by transposition into a perfect simulation despite the fact that someone in the substrate can change the contents of the simulation at whim. But so long as I’m living in a world full of real and meaningful constraints, within which I make tradeoffs and pursue limited goals, I find it generally more sane to pay attention to accomplishment benefits.
There have been two recent postings on bargaining that are worth looking at. The important thing from my point of view is this:
In any situation in which two individuals are playing an iterated game with common knowledge of each other’s payoffs, rational cooperative play calls for both players to pretend that they are playing a modified game with the same choices, but different payoffs. The payoff matrix they both should use will be some linear combination of the payoff matrices of the two players. The second linked posting above expresses this combination as “U1 + µU2”. The factor µ acts as a kind of exchange rate, converting one player’s utility units into the “equivalent amount” of the other player’s utility units. Thus, the Nash bargaining solution is in some sense a utilitarian solution: both players agree to try for the same objective, which is to maximize total utility.
But wait! Those were scare quotes around “equivalent amount” in the previous paragraph, because I have not yet justified my claim that µ is in any sense a ‘fair’ (more scare quotes) exchange rate. Or, to make the point a different way, I have not yet said what is ‘fair’ about the particular µ that arises in Nash bargaining. So here, without proof, is why this particular µ is fair: it is a “market exchange rate”. It is the rate at which player 1’s utility gets traded for player 2’s utility if the bargained solution is shifted a little bit along the Pareto frontier.
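Here is a toy numerical sketch of that claim (the particular frontier u2 = sqrt(1 − u1) and the disagreement point (0, 0) are my own arbitrary choices, not anything from the linked posts): the Nash solution maximizes the product of the two players’ gains, µ can be read off as the slope of the Pareto frontier at that point, and maximizing U1 + µU2 lands on the same point.

```python
# Toy illustration of mu as a market exchange rate, on the concave Pareto
# frontier u2 = sqrt(1 - u1) with disagreement point (0, 0).
import numpy as np

u1 = np.linspace(0.0, 1.0, 100_001)
u2 = np.sqrt(1.0 - u1)

# Nash bargaining solution: maximize the product of gains over the disagreement point.
i_nash = np.argmax(u1 * u2)

# mu = |du1/du2| along the frontier at the solution (finite-difference estimate).
mu = abs((u1[i_nash + 1] - u1[i_nash - 1]) / (u2[i_nash + 1] - u2[i_nash - 1]))

# Maximizing the combined "utilitarian" objective U1 + mu * U2 over the same
# frontier picks out the same point.
i_linear = np.argmax(u1 + mu * u2)

print("Nash point:      ", u1[i_nash], u2[i_nash])      # ~ (0.667, 0.577)
print("exchange rate mu:", mu)                          # ~ 1.155
print("U1 + mu*U2 point:", u1[i_linear], u2[i_linear])  # same point, up to grid error
```

The coincidence of the two maximizers is exactly the “market rate” property: shifting a little along the frontier trades the two utilities at rate µ.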
Hmmm. That didn’t come out as clearly as I had hoped it would. Sorry about that. But my main point in the grandparent, the thing I thought was ‘cool’, is that Nash bargaining, discounting of future utility, and discounting of simulations relative to reality are all handled by taking a linear combination of utilities which initially have different units (and are not comparable without conversion).
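Writing the three cases side by side (δ and k are just my labels for the other two conversion factors, and the time-discounting case is in its simplest two-period form):

Nash bargaining: maximize U1 + µ·U2
time discounting: maximize U_now + δ·U_later
simulation vs. reality: maximize U_real + k·U_sim

In each case two utilities measured in incomparable units are made comparable by a single conversion factor, and the agent then maximizes the combined total.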
Would I be far wrong if I referred to your ‘solipsistic benefits’ as “pleasures” and referred to your ‘accomplishment benefits’ as “power”? And, assuming I am on track so far, do I have it right that it is possible to manufacture fake pleasures (which you may as well grab, because they are as good as the ‘real thing’), but that fake power is worthless?
I would say that, as traditionally understood, raw pleasure is the closest thing we have to a clean solipsistic benefit and power is clearly an accomplishment benefit. But I wouldn’t expect either example to be identical with the conceptual categories because there are other things that also fit there. In the real world, I don’t think being a really good friend with someone is about power, but it seems like an accomplishment to me.
But there is a deeper response available, because a value’s status as a solipsistic or accomplishment benefit changes depending on the conception of simulation we use: when we imagine simulations that represent things more complex than a single agent in a physically stable context, what counts as “accomplishment” can change.
One of the critical features of a simulation (at least with “all the normal hidden assumptions” of simulation) is that a simulation’s elements are arbitrarily manipulable from the substrate that the simulation is implemented in. From the substrate you can just edit anything. You can change the laws of physics. You can implement a set of laws of physics, and defy them by re-editing particular structures in “impossible” ways at each timestep. Generally, direct tweaks to uploaded mind-like processes are assumed not to happen in thought experiments, but it doesn’t have to be this way (and a strong version of Descartes’s evil demon could probably edit neuron states to make you believe that circles have corners if it wanted). We can imagine the editing powers of the substrate yoked to an agent-like process embedded in the simulation and, basically, the agent would get “magic powers”.
In Eliezer’s Thou Art Physics, he asked: “If the laws of physics did not control us, how could we possibly control ourselves? If we were not in reality, where could we be?”
As near as I can tell, basically all human values appear to have come into existence in the face of a breathtakingly nuanced system of mechanical constraint and physical limitation. So depending on (1) what you mean by “power” and (2) what a given context’s limits are, power could also be nothing but a solipsistic benefit.
Part of me is embarrassed to be talking about this stuff on a mere blog, but one of the deep philosophical problems with the singularity appears to be figuring out what the hell there is to do after the sexy robotic elohim pull us into their choir of post-scarcity femto-mechanical computronium. If it happens in a dumb way (or heck, maybe if it happens in the best possible way?) the singularity may be the end of the honest pursuit of accomplishment benefits. Forever.
If pursuit of accomplishment benefits is genuinely good (as opposed to simply growing out of cognitive dissonance about entropy and stuff) then it’s probably important to know that before we try to push a button we suspect will take such pursuits away from us.