Eliezer Yudkowsky comments on Decision Theory FAQ

Eliezer Yudkowsky 1 Mar 2013 8:54 UTC
25 points
David, we’re not defining rationality to exclude other-oriented desires. We’re just not building that exact morality into the word “rational”. Instrumental rationality links up a utility function to a set of actions. You hand over a utility function over outcomes; epistemic rationality maps the world, and then instrumental rationality hands back a set of actions whose expected score is highest. So long as it can build a well-calibrated, highly discriminative model of the world and then navigate to a compactly specified set of outcomes, we call it rational, even if the optimization target is “produce as many paperclips as possible”. Adding a further constraint to the utility function that it be perfectly altruistic will greatly reduce the set of hypothetical agents we’re talking about, but it doesn’t change reality (obviously) nor yield any interesting changes in terms of how the agent investigates hypotheses, the fact that the agent will not fall prey to the sunk cost fallacy if it is rational, and so on. Perfectly altruistic rational agents will use mostly the same cognitive strategies as any other sort of rational agent; they’ll just be optimizing for one particular thing.

Jane doesn’t have any false epistemic beliefs about being special. She accurately models the world, and then accurately calculates and outputs “the strategy that leads to the highest expected number of burgers eaten by Jane” instead of “the strategy that has the highest expected fulfillment of all thinking beings’ values”.
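
The split described above, one shared world model with interchangeable utility functions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action names, probabilities, and the weight given to the cow's preference are all made up for the example and appear nowhere in the FAQ or in this thread.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Outcome(NamedTuple):
    """Hypothetical outcome description; both fields are illustrative."""
    burgers_for_jane: int
    cows_harmed: int


# Hypothetical world model: each action maps to (probability, outcome) pairs.
# The numbers are invented; the point is that both agents share this model.
WorldModel = Dict[str, List[Tuple[float, Outcome]]]

MODEL: WorldModel = {
    "order_burger": [(0.9, Outcome(1, 1)), (0.1, Outcome(0, 0))],
    "order_salad": [(1.0, Outcome(0, 0))],
}


def best_action(model: WorldModel, utility: Callable[[Outcome], float]) -> str:
    """Return the action whose expected utility is highest under `model`."""
    def expected_utility(action: str) -> float:
        return sum(p * utility(outcome) for p, outcome in model[action])
    return max(model, key=expected_utility)


def jane_utility(outcome: Outcome) -> float:
    # Jane's target: number of burgers eaten by Jane.
    return float(outcome.burgers_for_jane)


def impartial_utility(outcome: Outcome) -> float:
    # Assumed weighting: the cow's preference counts ten times Jane's.
    return outcome.burgers_for_jane - 10.0 * outcome.cows_harmed


print(best_action(MODEL, jane_utility))
print(best_action(MODEL, impartial_utility))
```

Both calls use the same `MODEL` and the same `best_action` procedure; only the utility function passed in differs, and that alone is what changes the chosen action.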

Besides, everyone knows that truly rational entities only fulfill other beings’ values if they can do so using friendship and ponies.
- diegocaleiro 1 Mar 2013 16:02 UTC
  0 points
  Parent
  That did not address David’s True Rejection.
  An Austere Charitable Metaethicist could do better.
  - wedrifid 1 Mar 2013 19:35 UTC
    2 points
    Parent
    
    That did not address David’s True Rejection. an Austere Charitable Metaethicist could do better.
    
    The grandparent is a superb reply and gave exactly the information needed in a graceful and elegant manner.
    - diegocaleiro 2 Mar 2013 5:53 UTC
      1 point
      Parent
      Indeed it does. Not. Here is a condition under which I think David would be satisfied: if people used vegetables in their examples as a common courtesy to vegetarians, in the exact same sense that “she” has been largely adopted to combat natural drives towards “he”-ness. Note how Luke’s agents and examples are overwhelmingly female. Not a requirement, just a courtesy.
      
      And I don’t say that as a vegetarian, because I’m not one.
  - davidpearce 7 Mar 2013 8:04 UTC
    −1 points
    Parent
    Indeed. What is the Borg’s version of the Decision Theory FAQ? This is not to say that rational agents should literally aim to emulate the Borg. Rather our conception of epistemic and instrumental rationality will improve if / when technology delivers ubiquitous access to each other’s perspectives and preferences. And by “us” I mean inclusively all subjects of experience.
- davidpearce 1 Mar 2013 22:07 UTC
  −3 points
  Parent
  Eliezer, I’d beg to differ. Jane does not accurately model the world. Accurately modelling the world would entail grasping and impartially weighing all its first-person perspectives, not privileging a narrow subset. Perhaps we may imagine a superintelligent generalisation of http://www.guardian.co.uk/science/2013/feb/28/brains-rats-connected-share-information and http://www.guardian.co.uk/science/brain-flapping/2013/mar/01/rats-are-like-the-borg . With perfect knowledge of all the first-person facts, Jane could not disregard the strong preference of the cow not to be harmed. Of course, Jane is not capable of such God-like omniscience. No doubt in common usage, egocentric Jane displays merely a lack of altruism, not a cognitive deficit of reason. But this is precisely what’s in question. Why build our canons of rational behaviour around a genetically adaptive delusion?
  - Eliezer Yudkowsky 1 Mar 2013 23:39 UTC
    16 points
    Parent
    Accurately modeling the world entails making accurate predictions about it. An expected paperclip maximizer fully grasps the functioning of your brain and mind to the extent that this is relevant to producing paperclips; if it needs to know the secrets of your heart in order to persuade you, it knows them. If it needs to know why you write papers about the hard problem of conscious experience, it knows that too. The paperclip maximizer is not moved by grasping your first-person perspective, because although it has accurate knowledge of this fact, that is not the sort of fact that figures in its terminal values. The fact that it perfectly grasps the compellingness-to-Jane, even the reason why Jane finds certain facts to be inherently and mysteriously compelling, doesn’t compel it. It’s not a future paperclip.
    
    I know exactly why the villain in Methods of Rationality wants to kill people. I could even write the villain writing about the ineffable compellingness of the urge to rid the world of certain people if I put that villain in a situation where he or she would actually read about the hard problem of conscious experience, and yet I am not likewise compelled. I don’t have the perfect understanding of any particular real-world psychopath that I do of my fictional killer, but if I did know why they were killers, and of course brought to bear my standard knowledge of why humans write what they do about consciousness, I still wouldn’t be compelled by even the limits of a full grasp of their reasons, their justifications, their inner experience, and the reasons they think their inner experience is ineffably compelling.
    
    David, have you already read all this stuff on LW, in which case I shouldn’t bother recapitulating it? http://lesswrong.com/lw/sy/sorting_pebbles_into_correct_heaps/, http://lesswrong.com/lw/ta/invisible_frameworks/, and so on?
    - davidpearce 3 Mar 2013 7:38 UTC
      3 points
      Parent
      For sure, accurately modelling the world entails making accurate predictions about it. These predictions include the third-person and first-person facts [what-it’s-like-to-be-a-bat, etc]. What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.
      
      Thus I’m still struggling with, “The paperclip maximizer is not moved by grasping your first-person perspective.” From this, I gather we’re talking about a full-spectrum superintelligence well acquainted with both the formal and subjective properties of mind, insofar as they can be cleanly distinguished. Granted your example Eliezer, yes, if contemplating a cosmic paperclip-deficit causes the AGI superhuman anguish, then the hypothetical superintelligence is entitled to prioritise its super-anguish over mere human despair—despite the intuitively arbitrary value of paperclips. On this scenario, the paperclip-maximising superintelligence can represent human distress even more faithfully than a mirror-touch synaesthete; but its own hedonic range surpasses that of mere humans—and therefore takes precedence.
      
      However, to be analogous to burger-choosing Jane in Luke’s FAQ, we’d need to pick an example of a superintelligence who wholly understands both a cow’s strong preference not to have her throat slit and Jane’s comparatively weaker preference to eat her flesh in a burger. Unlike partially mind-blind Jane, the superintelligence can accurately represent and impartially weigh all relevant first-person perspectives. So the question is whether this richer perspective-taking capacity is consistent with the superintelligence discounting the stronger preference not to be harmed? Or would such human-like bias be irrational? In my view, this is not just a question of altruism but cognitive competence.
      
      [Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]
      - timtyler 11 Mar 2013 10:22 UTC
        4 points
        Parent
        
        What is far from clear—to me at any rate—is whether super-rational agents can share perfect knowledge of both the first-person and third-person facts and still disagree. This would be like two mirror-touch synaesthetes having a fist fight.
        
        Why not? Actions are a product of priors, perceptions and motives. Sharing perceptions isn’t sharing motives—and even with identical motives, agents could still fight—if they were motivated to do so.
      - timtyler 11 Mar 2013 10:18 UTC
        2 points
        Parent
        
        [Of course, given we’re talking about posthuman superintelligence, the honest answer is boring and lame: I don’t know. But if physicists want to know the “mind of God,” we should want to know God’s utility function, so to speak.]
        
        God’s Utility Function according to Dawkins and Tyler.
  - Eliezer Yudkowsky 1 Mar 2013 23:47 UTC
    7 points
    Parent
    See also:
    
    “The Sorting Hat did seem to think I was going to end up as a Dark Lord unless [censored],” Harry said. “But I don’t want to be one.”
    
    “Mr. Potter...” said Professor Quirrell. “Don’t take this the wrong way. I promise you will not be graded on the answer. I only want to know your own, honest reply. Why not?”
    
    Harry had that helpless feeling again. Thou shalt not become a Dark Lord was such an obvious theorem in his moral system that it was hard to describe the actual proof steps. “Um, people would get hurt?”
    
    “Surely you’ve wanted to hurt people,” said Professor Quirrell. “You wanted to hurt those bullies today. Being a Dark Lord means that people you want to hurt get hurt.”
    
    Harry floundered for words and then decided to simply go with the obvious. “First of all, just because I want to hurt someone doesn’t mean it’s right—”
    
    “What makes something right, if not your wanting it?”
    
    “Ah,” Harry said, “preference utilitarianism.”
    
    “Pardon me?” said Professor Quirrell.
    
    “It’s the ethical theory that the good is what satisfies the preferences of the most people—”
    
    “No,” Professor Quirrell said. His fingers rubbed the bridge of his nose. “I don’t think that’s quite what I was trying to say. Mr. Potter, in the end people all do what they want to do. Sometimes people give names like ‘right’ to things they want to do, but how could we possibly act on anything but our own desires?”
    
    “Well, obviously,” Harry said. “I couldn’t act on moral considerations if they lacked the power to move me. But that doesn’t mean my wanting to hurt those Slytherins has the power to move me more than moral considerations!”
  - Kyre 2 Mar 2013 3:47 UTC
    6 points
    Parent
    
    With perfect knowledge of all the first-person facts, Jane could not disregard the strong preference of the cow not to be harmed.
    
    Why not?
    
    Even if it turns out that all humans would become cow-compassionate given ultimate knowledge, we are still interested in the rationality of cow-satan.
    - davidpearce 2 Mar 2013 5:10 UTC
      −4 points
      Parent
      Why not? Because Jane would weigh the preference of the cow not to have her throat slit as if it were her own. Of course, perfect knowledge of each other’s first-person states is still a pipedream. But let’s assume that in the future http://www.independent.co.uk/news/science/mindreading-rodents-scientists-show-telepathic-rats-can-communicate-using-braintobrain-8515259.html is ubiquitous, ensuring our mutual ignorance is cured.
      
      “The rationality of cow satan”? Apologies Kyre, you’ve lost me here. Could you possibly elaborate?
      - Kyre 3 Mar 2013 1:50 UTC
        6 points
        Parent
        What I’m saying is that cow-satan completely understands the preference of the cow not to have its throat slit. Every last grisly detail; all the physical, emotional, social, intellectual consequences, or consequences of any other kind. Cow satan has virtually experienced being slaughtered. Cow satan has studied the subject for centuries in detail. It is safe to say that cow satan understands the preference of cows not to be killed and eaten better than any cow ever could. Cow satan weighs that preference at zero.
        
        It might be the case that cow satan could not actually exist in our universe, but would you say that it is irrational for him to go ahead and have the burger?
        
        (edit—thinking about it, that last question isn’t perhaps very helpful)
        
        Are you saying that perfect (or sufficiently good) mutual knowledge of each other’s experiences would be highly likely to change everyone’s preferences? That might be the case, but I don’t see how that makes Jane’s burger choice irrational.
        davidpearce 3 Mar 2013 8:05 UTC
        0 points
        Parent
        Yes Kyre, “Cow Satan”, as far as I can tell, would be impossible. Imagine a full cognitive generalisation of http://www.livescience.com/1628-study-people-literally-feel-pain.html . Why don’t mirror-touch synaesthetes, or full-spectrum superintelligences, wantonly harm each other?
        
        [this is not to discount the problem of Friendly AI. Alas one can imagine “narrow” superintelligences converting cows and humans alike into paperclips (or worse, dolorium) without insight into the first-person significance of what they are doing.]
        timtyler 11 Mar 2013 10:25 UTC
        1 point
        Parent
        
        “Cow Satan”, as far as I can tell, would be impossible.
        
        There isn’t too much that is impossible. In general, if we can imagine it, we can build it (because we have already built it—inside our brains).
        betterthanwell 11 Mar 2013 12:22 UTC
        0 points
        Parent
        
        There isn’t too much that is impossible. In general, if we can imagine it, we can build it (because we have already built it—inside our brains).
        
        Intuitive ideas are inconsistent upon reflection, with this fact conveniently glossed over by the brain, because the details simply aren’t there. The brain has to perform additional work, actually fill in the details, to notice inconsistencies.
        
        1: Imagine an invisible unicorn. 2: Carefully examine the properties of your invisible unicorn.
        
        Notice how those properties are being generated on the fly as you turn your attention to some aspect of the unicorn which requires a value for that property?
        davidpearce 11 Mar 2013 11:25 UTC
        0 points
        Parent
        Tim, in one sense I agree: In the words of William Ralph Inge, “We have enslaved the rest of the animal creation, and have treated our distant cousins in fur and feathers so badly that beyond doubt, if they were able to formulate a religion, they would depict the Devil in human form.”
        
        But I’m not convinced there could literally be a Cow Satan, for the same reason that there are no branches of Everett’s multiverse where any of the world’s religions are true, i.e. because of their disguised logical contradictions. Unless you’re a fan of what philosophers call Meinong’s jungle (cf. http://en.wikipedia.org/wiki/Meinong’s_jungle), the existence of “Cow Satan” is impossible.