You’re right. I was in a mode of using familiar and related words without really thinking about what they meant.
This was the thesis I was developing, related to the hypothetical problem of writing your own utility function:
In all of this, there is just a BIG problem with self-generation of values when there is no FOV to pin anything down.
And the problem is one of logic. When choosing what to value, why should you value this or that or anything? Actually, you can’t value anything; there’s no value.
X is valued if you can use it to get Y that is valued. But the value of Y also needs to come from someplace. Biology gives us a utility function full of (real) trade-offs that give everything mutual value. These trade-offs are real (rather than just mutually supporting, like a house of cards) because they are tied to rewards and punishments that are hard-wired.
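To make that regress concrete, here is a minimal toy sketch (Python, with made-up items and numbers, purely illustrative): instrumental value gets passed along the “X gets me Y” chains, and everything has to bottom out in whatever rewards are hard-wired. Remove the hard-wired layer and every value collapses to zero.

```python
# Hard-wired rewards supplied by biology (hypothetical numbers).
TERMINAL_REWARDS = {"warmth": 1.0, "food": 2.0}

# "X is valued because it gets you Y": what each item leads to (made-up chains).
LEADS_TO = {
    "job": ["money"],
    "money": ["food", "shelter"],
    "shelter": ["warmth"],
}

def value(item, seen=frozenset()):
    """Value of an item = its hard-wired reward plus the value of what it leads to."""
    if item in seen:  # a purely circular chain is a house of cards: nothing added
        return 0.0
    direct = TERMINAL_REWARDS.get(item, 0.0)
    downstream = sum(value(y, seen | {item}) for y in LEADS_TO.get(item, ()))
    return direct + downstream

print(value("job"))  # 3.0, grounded via money -> food and shelter -> warmth

# With TERMINAL_REWARDS = {}, every chain bottoms out in nothing and every
# value comes back 0.0: no FOV, no hard-wiring, no value.
```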
Sure. But there is a historical pattern here, as well. If I construct a new utility function for myself, I will do so in such a way as to optimize its utility according to my pre-existing utility function (for the same reason I do everything else that way). I’m not starting out in a vacuum.
If you value your existing utility function, then it seems that it would be more stable and you would modify it less.
In my case, I found out that my utility function was given to me by evolution, which I don’t feel much loyalty toward. So I found out I didn’t value my utility function, and I was frightened of what it might modify into. But then it turned out that very little modification occurred. To some extent that was the result of a historical pattern: I value lots of things out of habit, and in particular lots of my values still have an FOV as their logical foundation that I haven’t bothered to update. But I also noticed how many of my values were redundantly hard-wired into my biology. I feel like I’m walking around discovering what my mirror neurons would have me value, and it’s not that different from what I valued before. The main difference is that I imagine I now value things in a more near-mode way, and the far-mode values have fallen by the wayside. Those far-mode values either need to redevelop in the absence of an FOV, or they depend on logical justifications that are absent without the FOV.
For example, I used to hope that humans would learn to be friendlier so that the universe would be a better place. I now sort of see human characteristics as just a fact, and to the extent that they don’t affect me directly (for example, how humans behave 30 generations from now), I don’t care.
It’s not a question of valuing my existing utility function. It’s a question of using my existing utility function as a basis for differentially valuing everything else, including itself.
Sure, if I’m trying to derive what I ought to care about, from first principles, and I ignore what I actually do care about in the process, then I’m stuck… there’s no reason to choose one thing over another. The endpoint of that is, as you say, apathy.
But why should I ignore what I actually do care about?
If I find that I care about whether people suffer, for example (I’m not saying I ought to, I’m just supposing hypothetically that I do), why discard that just because it’s the result of a contingent evolutionary process rather than the explicit desire of a sapient creator?
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
I’m just saying: I’m not starting out in a vacuum. I’m not actually universally apathetic or indifferent. For whatever reason, I actually do care about certain things, and that represents my starting point.
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about? This sounds like loyalty to me!
When considering these hypotheticals, we have moral circuitry that gets stimulated and reports ‘bad’ when we contemplate changing what we care about. That circuitry probably makes us more robust to temptations to modify our utility function, and so it acts as a barrier to freely updating the utility function, even in hypotheticals.
The question is, with no barriers to updating the utility function, what would happen? It seems you agree apathy would result.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about?
Because I care about what I care about, and I don’t care about what I don’t care about.
Sure, this is loyalty in a sense… not loyalty to the sources of my utility function—heck, I might not even know what those are—but to the function itself. (It seems a little odd to talk about being loyal to my own preferences, but not intolerably odd.)
The fact that something I don’t care about might be something I care about in the future is, admittedly, relevant. If I knew that a year from now my utility function would change such that I started really valuing people knowing Portuguese, I might start devoting some time and effort now to encouraging people to learn Portuguese (perhaps starting by learning it myself), in anticipation of appreciating having done so in a year. It wouldn’t be a strong impulse, but it would be present.
But that depends a lot on my confidence in that actually happening.
If I knew instead that I could press a button in a year and start really valuing people learning Portuguese, I probably wouldn’t devote resources to encouraging people to learn it, because I’d expect that I’d never press the button. Why should I? It gets me nothing I want.
In the scenario you are considering, I know I can press a button and start really valuing anything I choose. Or start valuing random things, for that matter, without having to choose them. Agreed.
But so what? Why should I press a button that makes me care about things that I don’t consider worth caring about?
“But you would consider them worth caring about if you pressed the button!” Well, yes, that’s true. I would speak French if I lived in France for the next few years, but the truth of that doesn’t help me understand French sentences. I would want X if I edited my utility function to value X highly, but the truth of that doesn’t help me want X. There’s an important difference between actuals and hypotheticals.
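To put the same point in toy form (again a purely illustrative Python sketch, with made-up numbers): the decision about whether to press the button is scored by the utility function I have now, so a change that only pays off under the new function never looks worth making.

```python
# Current preferences: I care about reducing suffering, not about Portuguese.
current_utility = {"less_suffering": 1.0, "people_know_portuguese": 0.0}

# Hypothetical outcomes of each choice.
outcomes = {
    "do_nothing":   {"less_suffering": 1, "people_know_portuguese": 0},
    "press_button": {"less_suffering": 1, "people_know_portuguese": 5},
}

def score(outcome, utility):
    """Score an outcome under a given utility function."""
    return sum(utility.get(k, 0.0) * v for k, v in outcome.items())

# The choice is evaluated with the utility function I have *now*:
print(score(outcomes["do_nothing"], current_utility))    # 1.0
print(score(outcomes["press_button"], current_utility))  # 1.0 -- pressing gains nothing I currently want

# Only the post-button me would rate pressing highly:
new_utility = {"less_suffering": 1.0, "people_know_portuguese": 1.0}
print(score(outcomes["press_button"], new_utility))      # 6.0, but that's the hypothetical me
```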
I realize I was assuming that the entity choosing which values to have would want to ‘maximally’ satisfy those values in some sense, so that if it could choose freely it would choose values that were easy or best to satisfy. But this isn’t necessarily so. It’s humans that have lots of values about their values, and I think we would have a tough time choosing our values if we could. Perhaps there is a dynamic tension between our values (we want our values to have value, and we are constantly asking ourselves what our goals should be and whether we really value our current goals), so that if our values were unpinned from their connection to an external, immutable framework, they might spin off to something very different.
So I end up agreeing with you: without values about values (meta-values?), someone who cared only about their object-level values would have no reason to modify those values, and their utility function might be very stable. I think the instability would come from the reasons for modifying the values. (Obviously, I haven’t read Suzanne Gilbert’s article. I probably should do so before making any other comments on this topic.)