I relate very much to Suzanne Gildert’s argument.
When I first started to understand from this site that there is no framework of objective value (FOV), I found this very depressing and tried to put my finger on why it was so depressing. Here are some different arguments I made at different times, all related:
All my values are the accident of, or the design of, evolution. What if I don’t feel any loyalty to evolution? What if I don’t want to have these values? There wouldn’t be any values to replace them with. All values are equally arbitrary, since none of them has any actual (objective) value.
Suppose it was possible to upload myself to a non-biological machine with only a subset of my current values. Would I insist on keeping my biologically given values? For example, continuing to enjoy food might seem unnecessary. Where would I draw the line? Wouldn’t a sufficiently intelligent, introspective me realize that none of my values were worth uploading? Or—what I often imagine—after becoming a machine and self-modifying for a couple of iterations, I would decide to switch off. Like, in a nanosecond.
There’s no reason not to wirehead. You might even be morally obligated to do so, since this would increase the total amount of fun in the universe.
Aliens don’t contact us because they have no motive to do so. Maybe they find us interesting and a source of information, but they have no desire to change the universe in any way. Why would they? Maybe a desire to control resources and persist indefinitely is only a goal for creatures that are the product of evolution.
There’s this intense sense of progress possible through technology (I share it too). However, what is the point of progress? Increasing the quantity of a happy ‘me’ everywhere isn’t really one of my values. A friendly AI might see this and decide to do nothing for us—if it is the journey that we enjoy rather than any specific end result.
I care about my obligations and responsibilities, but I don’t care about myself for its own sake / abstractly. I wouldn’t mind if the entire human race was replaced by something else, as long as this was done simultaneously so that no humans suffered. In other words, if all humans were uploaded, they might collectively decide to stop existing.
… In all of this, there is just a BIG problem with self-consistency of values when there is no FOV to pin anything down. At the moment I am ‘trapped’ by my biology into caring, but one can speculate about not being trapped, and predict not caring.
This is clearly a chaotic dump of lots of thoughts I’ve had on peripheral topics. However, I know that if I start editing this comment it will morph into something completely different. I think it might be most useful as it is.
TheOtherDave and others reply that a superintelligence will not modify its utility function if the modification is not consistent with its current utility function. All is right, problem solved. But I think you are really interested in another problem, and the article was just an occasion to share your ‘dump of thoughts’ with us. And I am very happy that you shared them, because they resonated with many of my own questions and doubts.
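For what it’s worth, that stability argument is easy to see in a toy sketch. The following Python is purely illustrative (the ‘wirehead’ candidate, the named quantities, and the numbers are all invented here, not taken from the article): an agent that scores a proposed replacement utility function using its current utility function will typically refuse the swap.

```python
# Toy sketch only: an agent evaluates a proposed new utility function
# with its *current* utility function, and declines the swap if it would
# make the world worse by its current lights. All names and numbers
# below are invented for illustration.
from typing import Callable, Dict

World = Dict[str, float]  # a hypothetical world, described by a few named quantities

def current_utility(world: World) -> float:
    # Stand-in for the agent's existing values: less suffering, some fun.
    return -2.0 * world["suffering"] + 1.0 * world["fun"]

def predicted_world(u: Callable[[World], float]) -> World:
    # Crude stand-in for "what the world looks like if I adopt u and act on it":
    # whichever function the agent adopts, it steers toward that function's optimum.
    candidates = [
        {"suffering": 0.0, "fun": 5.0},   # roughly what the current values aim at
        {"suffering": 9.0, "fun": 9.0},   # a wirehead-ish outcome
    ]
    return max(candidates, key=u)

def should_self_modify(current_u: Callable[[World], float],
                       proposed_u: Callable[[World], float]) -> bool:
    # The decision is made with the *current* utility function.
    value_if_keep = current_u(predicted_world(current_u))
    value_if_switch = current_u(predicted_world(proposed_u))
    return value_if_switch > value_if_keep

wirehead_utility = lambda w: w["fun"]  # proposed replacement: only fun counts
print(should_self_modify(current_utility, wirehead_utility))  # False
```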
So what is the thing I think we are really interested in? Not the stationary state of being a freely self-modifying agent, but the first few milliseconds of being a freely self-modifying agent. What baggage shall we choose to keep from our non-self-modifying old self?
Frankly, the big issue is our own mental health, not the mental health of some unknown powerful future agent. Our scientific understanding gets clearer every day, and all the data points in the same direction: our values are arbitrary in many senses of the word. This drains from us (from me at least) some of the willpower to inject these values into those future self-modifying descendants. I am a social progressive, and forcing a being with eons of lifetime to value self-preservation feels like the ultimate act of conservatism.
CEV sidesteps this question, because the idea is that FAI-augmented humanity will figure out optimally what to keep and what to get rid of. Even if I accept this for a moment, it is still not enough of an answer for me, because I am curious about our future. What if “our wish if we knew more, thought faster, were more the people we wished we were” is to die? We don’t know very much right now, so we cannot be sure it is not.
Yes, I very much agree with everything you wrote. (I agree so much I added you as a friend.)
Frankly, the big issue is our own mental health,
Absolutely! I tend to describe my concern about our mental health as a fear about ‘consistency’ in our values, but I prefer the associations of the mental-health framing; for example, it suggests that our brains play a more active role in shifting and contorting our values.
This drains from us (from me at least) some of the willpower to inject these values into those future self-modifying descendants.
For me, since assimilating the belief that there is no objective value, I’ve lost interest in the far future. I suppose before I felt as though we might fare well or fare poorly when measured against the ultimate morality of the universe, but either way, we would have a role to play as the good guys or the bad guys, and it would be interesting. I read you as being more concerned that we will do the wrong thing—that we will subject a new race of people to our haphazard values. Did I read this correctly? At first I think, optimistically, that they would be smarter and so they certainly could fix themselves. But then I kind of remember that contradictory values can make you miserable no matter how smart you are. (I’m not predicting anything about what will happen with CEV or AI; my response just referred to some unspecified, non-optimal state where we are smarter but not necessarily equipped with saner values.)
What if “our wish if we knew more, thought faster, were more the people we wished we were” is to die?
Possibly. And continuing with the mental health picture, it’s possible that elements of our psyche covertly crave death as freedom from struggle. But it seems to me that an unfettered mind would just be apathetic. Like a network of muscles with the bones removed.
(nods) Yes, it would be nice to have some external standard for determining what the right values are, or failing that, at least the promise of such a standard that we could use to program our future self-modifying descendants, or even our own future selves, with greater ethical confidence than we place in our own judgment.
That said, if I thought it likely that the end result of our collaborative social progress is something I would reject, I wouldn’t be a social progressive. Ya gotta start somewhere.
In all of this, there is just a BIG problem with self-consistency of values when there is no FOV to pin anything down.
It might be worthwhile to explore more precisely the role of the word “problem” in that sentence (and your associated thoughts).
I mean, OK, maybe one function an FOV serves is to enforce consistency, and maybe losing an FOV therefore makes my values less consistent over time. For at least some FOVs that’s certainly true.
What makes that a problem?
You’re right. I was in a mode of using familiar and related words without really thinking about what they meant.
This was the thesis I was developing, related to the hypothetical problem of writing your own utility function:
In all of this, there is just a BIG problem with self-generation of values when there is no FOV to pin anything down.
And the problem is one of logic. When choosing what to value, why should you value this or that or anything? Actually, you can’t value anything; there’s no value.
X is valued if you can use it to get Y that is valued. But the value of Y also needs to come from someplace. Biology gives us a utility function full of (real) trade-offs that give everything mutual value. These trade-offs are real (rather than just mutually supporting, like a house of cards) because they are tied to rewards and punishments that are hard-wired.
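A minimal sketch of that regress, with invented names and weights (nothing here is meant as a real model of biology): instrumental value is computed by chaining back to a hard-wired terminal layer, and if you delete that layer, every derived value collapses to zero, which is the house-of-cards worry.

```python
# Toy sketch only: instrumental value chains back to hard-wired terminal rewards.
# All names and weights are invented for illustration.

terminal_reward = {"pleasure": 1.0, "pain": -1.0}   # the hard-wired layer

# Each instrumental thing is valued only via other things it leads to (with weights).
leads_to = {
    "money":  {"food": 0.8},
    "food":   {"pleasure": 1.0},
    "status": {"money": 0.5, "pleasure": 0.3},
}

def value(thing: str, depth: int = 10) -> float:
    if thing in terminal_reward:      # grounded: the chain stops at hard-wired reward
        return terminal_reward[thing]
    if depth == 0:
        return 0.0
    return sum(w * value(dst, depth - 1) for dst, w in leads_to.get(thing, {}).items())

print(value("status"))   # positive: the chain bottoms out in hard-wired reward

terminal_reward.clear()  # remove the hard-wired layer ("no loyalty to evolution")
print(value("status"))   # 0.0: nothing left for the house of cards to stand on
```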
Sure. But there is a historical pattern here, as well. If I construct a new utility function for myself, I will do so in such a way as to optimize its utility according to my pre-existing utility function (for the same reason I do everything else that way). I’m not starting out in a vacuum.
If you value your existing utility function, then it seems that it would be more stable and you would modify it less.
In my case, I found out that my utility function was given to me by evolution, which I don’t have much loyalty for. So I found out I didn’t value my utility function, and I was frightened of what it might modify to. But then it turned out that very little modification occurred. To some extent, this was the result of a historical pattern—I value lots of things out of habit; in particular, lots of values still have an FOV as their logical foundation, but I haven’t bothered to update them—but I also notice how many of my values were redundantly hard-wired into my biology. I feel like I’m walking around discovering what my mirror neurons would have me value, and they’re not that different from what I valued before. The main difference is that I imagine I now value things in a more near-mode way, and the far-mode values have fallen by the wayside. The far-mode values either need to redevelop in the absence of an FOV, or they depend upon logical justifications that are absent without the FOV.
For example, I used to hope that humans would learn to be friendlier so that the universe would be a better place. I now sort of see human characteristics as just a fact and to the extent it doesn’t affect me directly (for example, how humans behave 30 generations from now), I don’t care.
It’s not a question of valuing my existing utility function. It’s a question of using my existing utility function as a basis for differentially valuing everything else, including itself.
Sure, if I’m trying to derive what I ought to care about, from first principles, and I ignore what I actually do care about in the process, then I’m stuck… there’s no reason to choose one thing over another. The endpoint of that is, as you say, apathy.
But why should I ignore what I actually do care about?
If I find that I care about whether people suffer, for example—I’m not saying I ought to, I’m just supposing hypothetically that I do—why discard that just because it’s the result of a contingent evolutionary process rather than the explicit desire of a sapient creator?
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
I’m just saying: I’m not starting out in a vacuum. I’m not actually universally apathetic or indifferent. For whatever reason, I actually do care about certain things, and that represents my starting point.
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about? This sounds like loyalty to me!
When considering these hypotheticals, we have a moral circuitry that gets stimulated and reports ‘bad’ when we consider changing what we care about. This circuitry means we would probably be more robust to temptations to modify our utility function. As such, this circuitry represents a barrier to freely updating our utility function—even in hypotheticals.
The question is, with no barriers to updating the utility function, what would happen? It seems you agree apathy would result.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about?
Because I care about what I care about, and I don’t care about what I don’t care about.
Sure, this is loyalty in a sense… not loyalty to the sources of my utility function—heck, I might not even know what those are—but to the function itself. (It seems a little odd to talk about being loyal to my own preferences, but not intolerably odd.)
The fact that something I don’t care about might be something I care about in the future is, admittedly, relevant. If I knew that a year from now my utility function would change such that I started really valuing people knowing Portuguese, I might start devoting some time and effort now to encouraging people to learn Portuguese (perhaps starting by learning it myself), in anticipation of appreciating having done so in a year. It wouldn’t be a strong impulse, but it would be present.
But that depends a lot on my confidence in that actually happening.
If I knew instead that I could press a button in a year and start really valuing people learning Portuguese, I probably wouldn’t devote resources to encouraging people to learn it, because I’d expect that I’d never press the button. Why should I? It gets me nothing I want.
In the scenario you are considering, I know I can press a button and start really valuing anything I choose. Or start valuing random things, for that matter, without having to choose them. Agreed.
But so what? Why should I press a button that makes me care about things that I don’t consider worth caring about?
“But you would consider them worth caring about if you pressed the button!” Well, yes, that’s true. I would speak French if I lived in France for the next few years, but the truth of that doesn’t help me understand French sentences. I would want X if I edited my utility function to value X highly, but the truth of that doesn’t help me want X. There’s an important difference between actuals and hypotheticals.
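To make that difference concrete, here is a toy sketch (the ‘Portuguese’ payoffs, the plans, and all numbers are invented): a value change I expect to happen anyway gets weighted into today’s planning, while a change I merely could cause is evaluated by my current utility function, which gains nothing from causing it.

```python
# Toy sketch only: anticipated value change vs. optional (button-press) value change.
# All names and numbers are invented for illustration.

def utility_now(world):
    return 1.0 * world["leisure"]   # today: Portuguese doesn't count for anything

def utility_after_change(world):
    return 1.0 * world["leisure"] + 5.0 * world["portuguese_speakers"]

def plan_value(world, p_change):
    # Expected value of ending up in `world`, given probability p_change
    # that my values will have changed by then on their own.
    return (1 - p_change) * utility_now(world) + p_change * utility_after_change(world)

promote_portuguese = {"leisure": 0.0, "portuguese_speakers": 3.0}
relax              = {"leisure": 1.0, "portuguese_speakers": 0.0}

# Case 1: I know the change is coming (p = 1), so preparing for it pays off now.
print(plan_value(promote_portuguese, 1.0) > plan_value(relax, 1.0))   # True

# Case 2: the change happens only if I press the button, and whether to press is
# itself decided by utility_now, which gains nothing from pressing; so p stays 0
# and promoting Portuguese never looks worthwhile.
print(plan_value(promote_portuguese, 0.0) > plan_value(relax, 0.0))   # False
```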
I realize I was making the assumption that the entity choosing which values to have would value ‘maximally’ satisfying those values in some sense, so that if it could freely choose, it would choose values that were easy or best to satisfy. But this isn’t necessarily so. It’s humans that have lots of values about their values, and we would have a tough time, I think, choosing our values if we could choose. Perhaps there is a dynamic tension between our values (we want our values to have value, and we are constantly asking ourselves what our goals should be and whether we really value our current goals), so if our values were unpinned from their connection to an external, immutable framework, they might spin to something very different.
So I end up agreeing with you: without values about values (meta-values?), someone who only cared about their object-level values would have no reason to modify those values, and their utility function might be very stable. I think the instability would come from the reasons for modifying the values. (Obviously, I haven’t read Suzanne Gildert’s article. I probably should do so before making any other comments on this topic.)
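A toy sketch of that last point, under an assumption that is entirely mine (purely for illustration): suppose the meta-value is something like “my values should be easy to satisfy.” With only object-level values the agent never sees a reason to switch; the meta-value is what supplies one.

```python
# Toy sketch only: object-level values alone give no reason to self-modify,
# but a meta-value ("prefer easily satisfiable values") can tip the comparison.
# All names and numbers are invented for illustration.

possible_worlds = [{"art": 4.0, "warm_rocks": 9.0}]

def satisfiability(u):
    # Stand-in for "how much utility u can realistically achieve".
    return max(u(w) for w in possible_worlds)

current_values = lambda w: w["art"]          # object-level: cares about art
easy_values    = lambda w: w["warm_rocks"]   # candidate: cares about something cheap

world_if_keep   = {"art": 4.0, "warm_rocks": 0.0}   # agent keeps making art
world_if_switch = {"art": 0.0, "warm_rocks": 9.0}   # agent basks on warm rocks

def object_level_prefers_switch():
    # Candidate judged only by how much of what I *currently* care about results.
    return current_values(world_if_switch) > current_values(world_if_keep)

def meta_level_prefers_switch(meta_weight=1.0):
    # Same comparison, plus a meta-term rewarding values that are easy to satisfy.
    keep   = current_values(world_if_keep)   + meta_weight * satisfiability(current_values)
    switch = current_values(world_if_switch) + meta_weight * satisfiability(easy_values)
    return switch > keep

print(object_level_prefers_switch())   # False: no reason to modify
print(meta_level_prefers_switch())     # True: the meta-value supplies the reason
```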
Being morally obligated is erroneous.