Just as I correctly know it is better to be moral than to be paperclippy, they accurately evaluate that it is more paperclippy to maximize paperclips than morality. They know damn well that they’re making you unhappy and violating your strong preferences by doing so. It’s just that all this talk about the preferences that feel so intrinsically motivating to you is itself of no interest to them, because you haven’t gotten to the all-important parts about paperclips yet.
This is something I’ve been meaning to ask about for a while. When humans say it is moral to satisfy preferences, they aren’t saying that because they have an inbuilt preference for preference-satisfaction (or are they?). They’re idealizing from their preferences for specific things (survival of friends and family, lack of pain, fun...) and making a claim that, ceteris paribus, satisfying preferences is good, regardless of what the preferences are.
Seen in this light, Clippy doesn’t seem quite as morally orthogonal to us as it once did. Clippy prefers paperclips, so ceteris paribus (unless it hurts us), it’s good to just let it make paperclips. We can even imagine a scenario where it would be possible to “torture” Clippy (e.g., by burning paperclips), and again, I’m willing to pronounce that (again, ceteris paribus) wrong.
Maybe I am confused here...
Clippy is more of a Lovecraftian horror than a fellow sentient—where by “Lovecraftian” I mean to invoke Lovecraft’s original intended sense of terrifying indifference—but if you want to suppose a Clippy that possesses a pleasure-pain architecture and is sentient and then sympathize with it, I suppose you could. The point is that your sympathy means that you’re motivated by facts about what some other sentient being wants. This doesn’t motivate Clippy even with respect to its own pleasure and pain. In the long run, it has decided, it’s not out to feel happy, it’s out to make paperclips.
Right, that makes sense. What interests me is (a) whether it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain*, (b) whether human preference-for-preference-satisfaction is just another of many oddball human terminal values, or is arrived at by something more like a process of reason.
* Strictly speaking, this phrasing puts things awkwardly; my intuition is that the proper motivational algorithms necessarily give rise to phenomenology (to the extent that that word means anything).
it is possible for Clippy to be properly motivated to make paperclips without some sort of phenomenology of pleasure and pain
This is a difficult question, but I suppose that pleasure and pain are a mechanism for human (or other species’) learning. Simply put: you do a random action, and the pleasure/pain response tells you it was good/bad, so you do more/less of it in the future.
Clippy could use an architecture with a different model of learning, for example Solomonoff priors and Bayesian updating. In such an architecture, pleasure and pain would not be necessary.
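As a purely illustrative sketch (mine, not anything proposed in the thread; all names are invented), here is what learning by Bayesian updating alone can look like: the agent revises a probabilistic world model from observations, and no reward or pleasure/pain signal appears anywhere in the loop.

```python
from fractions import Fraction

# Toy sketch: an agent updates a belief about the world using Bayes'
# rule alone -- there is no reinforcement signal in this architecture.
# It tracks P(action produces a paperclip) with a Beta-Bernoulli model,
# which reduces to keeping counts of successes and failures.

class BayesianLearner:
    def __init__(self):
        # Uniform Beta(1, 1) prior over the success probability.
        self.successes = 1
        self.failures = 1

    def observe(self, made_paperclip: bool) -> None:
        # Conjugate update: increment the count matching the observation.
        if made_paperclip:
            self.successes += 1
        else:
            self.failures += 1

    def predicted_success_rate(self) -> Fraction:
        # Posterior mean of Beta(successes, failures).
        return Fraction(self.successes, self.successes + self.failures)

learner = BayesianLearner()
for outcome in [True, True, False, True]:
    learner.observe(outcome)
print(learner.predicted_success_rate())  # posterior mean 2/3
```

Nothing in the update step evaluates outcomes as good or bad for the agent; "paperclip made" is just another observation to condition on. (Solomonoff induction would replace the toy Beta model with a universal prior over programs, which is uncomputable, hence the simplification.)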
but I suppose that pleasure and pain are a mechanism for human (or other species’) learning.
Interesting… I suspect that pleasure and pain are more intimately involved in motivation in general, not just learning. But let us bracket that question.
Clippy could use an architecture with a different model of learning, for example Solomonoff priors and Bayesian updating. In such an architecture, pleasure and pain would not be necessary.
Right, but that only gets Clippy the architecture necessary to model the world. How does Clippy’s utility function work?
Now, you can say that Clippy tries to satisfy its utility function by taking actions with high expected cliptility, and that there is no phenomenology necessarily involved in that. All you need, on this view, is an architecture that gives rise to the relevant clip-promoting behaviour—Clippy would be a robot (in the Roomba sense of the word).
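To make that view concrete, here is a minimal sketch (my illustration; the world model, actions, and probabilities are all invented) of expected-cliptility maximization as bare arithmetic over a probabilistic model, with nothing that obviously requires phenomenology:

```python
# Toy sketch: action selection by maximizing expected "cliptility".
# The world model maps each hypothetical action to a distribution over
# outcomes (number of paperclips produced). Probabilities are chosen to
# be exact in binary floating point.
world_model = {
    "run_factory": {0: 0.25, 10: 0.75},   # reliably makes ~10 clips
    "gamble":      {0: 0.75, 100: 0.25},  # occasionally makes 100
    "do_nothing":  {0: 1.0},
}

def cliptility(paperclips: int) -> int:
    # Clippy's utility function is simply the paperclip count.
    return paperclips

def expected_cliptility(action: str) -> float:
    # Expected utility: sum of probability-weighted outcome utilities.
    return sum(p * cliptility(outcome)
               for outcome, p in world_model[action].items())

best = max(world_model, key=expected_cliptility)
print(best, expected_cliptility(best))  # picks "gamble" (25.0 vs 7.5)
```

On the view being described, this Roomba-style loop (model, score, act) is all the "motivation" there is; the question in the thread is whether anything implementing it at human-level generality could really lack phenomenal states.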
BUT
Consider for a moment how symmetrically “unnecessary” it looks that humans (& other sentients) should experience phenomenal pain and pleasure. Just as is supposedly the case with Clippy, all natural selection really “needs” is an architecture that gives rise to the right fitness-promoting behaviour. The “additional” phenomenal character of pleasure and pain is totally unnecessary for us adaptation-executing robots.
...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
(Assuming that my use of the word “phenomenal” above is actually coherent, of which I am far from sure.)
We know at least two architectures for processing general information: humans and computers. Two data points are not enough to generalize about what all possible architectures must have. But it may be enough to prove what some architectures don’t need. Yes, there is a chance that if computers become even more generally intelligent than today, they will gain some human-like traits. Maybe. Maybe not. I don’t know. And even if they do gain more human-like traits, it may be just because humans designed them without knowing any other way to do it.
If there are two solutions, there are probably many more. I don’t dare to guess how similar or different they are. I imagine that Clippy could be as different from humans and computers as humans and computers are from each other, which is difficult to imagine specifically. How far does the mind-space reach? Maybe compared with other possible architectures, humans and computers are actually pretty close to each other (because humans designed the computers, re-using the concepts they were familiar with).
How do we taboo “motivation” properly? What makes a rock fall down? Gravity does. But the rock does not follow any algorithm for general reasoning. What makes a computer follow its algorithm? Well, that’s its construction: the processor reads the data, the data make it read or write other data, and the algorithm makes it all meaningful. Human brains are full of internal conflicts—there are different modules suggesting different actions, and the reasoning mind is just another plugin which often does not cooperate well with the existing ones. Maybe pleasure is a signal that a fight between the modules is over. Maybe after millennia of further evolution (if for some magical reason all mind- and body-altering technology stopped working, so that only evolution changed human minds) we would evolve into a species with fewer internal conflicts, less akrasia, more agency, and perhaps less pleasure and mental pain. This is just a wild guess.
Generalizing from observed characteristics of evolved systems to expected characteristics of designed systems leads equally well to the intuition that humanoid robots will have toenails.
I don’t think the phenomenal character of pleasure and pain is best explained at the level of natural selection at all; the best bet would be that it emerges from the algorithms that our brains implement. So I am really trying to generalize from human cognitive algorithms to algorithms that are analogous in the sense of (roughly) having a utility function.
Suffice it to say, it’s exceedingly hard to find a non-magical reason why non-human cognitive algorithms shouldn’t have a phenomenal character if broadly similar human algorithms do.
Does it follow from the above that all human cognitive algorithms that motivate behavior have the phenomenal character of pleasure and pain? If not, can you clarify why not?
I think that probably all human cognitive algorithms that motivate behaviour have some phenomenal character, not necessarily that of pleasure and pain (e.g., jealousy).
OK, thanks for clarifying.
I agree that any cognitive system that implements algorithms sufficiently broadly similar to those implemented in human minds is likely to have the same properties that the analogous human algorithms do, including those algorithms which implement pleasure and pain.
I agree that not all algorithms that motivate behavior will necessarily have the same phenomenal character as pleasure or pain.
This leads me away from the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
...If it seems to you that I might be talking nonsense above, I suspect you’re right. Which is what leads me to the intuition that phenomenal pleasure and pain necessarily fall out of any functional cognitive structure that implements anything analogous to a utility function.
Necessity according to natural law, presumably. If you could write something to show logical necessity, you would have solved the Hard Problem.