It’s not a question of valuing my existing utility function. It’s a question of using my existing utility function as a basis for differentially valuing everything else, including itself.
Sure, if I’m trying to derive what I ought to care about, from first principles, and I ignore what I actually do care about in the process, then I’m stuck… there’s no reason to choose one thing over another. The endpoint of that is, as you say, apathy.
But why should I ignore what I actually do care about?
If I find that I care about whether people suffer, for example—I’m not saying I ought to, I’m just supposing hypothetically that I do—why discard that just because it’s the result of a contingent evolutionary process rather than the explicit desire of a sapient creator?
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
I’m just saying: I’m not starting out in a vacuum. I’m not actually universally apathetic or indifferent. For whatever reason, I actually do care about certain things, and that represents my starting point.
Sure, I agree, there’s no reason to be loyal to it. If I have the option of replacing it with something that causes more of what I currently care about to exist in the world, that’s a fine thing for me to do.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about? This sounds like loyalty to me!
When considering these hypotheticals, we have moral circuitry that gets stimulated and reports ‘bad’ when we contemplate changing what we care about. That circuitry probably makes us more robust to temptations to modify our utility function, and so it represents a barrier to freely updating it—even in hypotheticals.
The question is, with no barriers to updating the utility function, what would happen? It seems you agree apathy would result.
Why only replace it if it causes more of what you currently care about? Why not just replace it if it causes you to have more of what you will care about?
Because I care about what I care about, and I don’t care about what I don’t care about.
Sure, this is loyalty in a sense… not loyalty to the sources of my utility function—heck, I might not even know what those are—but to the function itself. (It seems a little odd to talk about being loyal to my own preferences, but not intolerably odd.)
The fact that something I don’t care about might be something I care about in the future is, admittedly, relevant. If I knew that a year from now my utility function would change such that I started really valuing people knowing Portuguese, I might start devoting some time and effort now to encouraging people to learn Portuguese (perhaps starting by learning it myself), in anticipation of appreciating having done so in a year. It wouldn’t be a strong impulse, but it would be present.
But that depends a lot on my confidence in that actually happening.
If I knew instead that I could press a button in a year and start really valuing people learning Portuguese, I probably wouldn’t devote resources to encouraging people to learn it, because I’d expect that I’d never press the button. Why should I? It gets me nothing I want.
In the scenario you are considering, I know I can press a button and start really valuing anything I choose. Or start valuing random things, for that matter, without having to choose them. Agreed.
But so what? Why should I press a button that makes me care about things that I don’t consider worth caring about?
“But you would consider them worth caring about if you pressed the button!” Well, yes, that’s true. I would speak French if I lived in France for the next few years, but the truth of that doesn’t help me understand French sentences. I would want X if I edited my utility function to value X highly, but the truth of that doesn’t help me want X. There’s an important difference between actuals and hypotheticals.
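Here is a minimal sketch of that point as a toy decision procedure in Python (every action, number, and function name below is my own illustrative assumption, not something established in this discussion): the agent scores all of its options, including pressing the button, with its current utility function, so the ranking it would have after pressing never actually drives the choice.

```python
# Toy model: an agent scores candidate actions, including self-modification,
# with its *current* utility function. All outcomes here are made up.

def current_utility(world):
    """What the agent cares about now: less suffering is better."""
    return -world["suffering"]

def post_button_utility(world):
    """What the agent *would* care about after pressing the button."""
    return world["portuguese_speakers"]

# Predicted world-states resulting from each available action.
outcomes = {
    "reduce_suffering": {"suffering": 5,  "portuguese_speakers": 0},
    "press_button":     {"suffering": 10, "portuguese_speakers": 0},
    "teach_portuguese": {"suffering": 10, "portuguese_speakers": 50},
}

# The choice is made with the utility function the agent actually has.
actual_choice = max(outcomes, key=lambda a: current_utility(outcomes[a]))
print(actual_choice)  # -> reduce_suffering

# The post-button function would rank things differently, but that ranking
# exists only inside the hypothetical; it never moves the present choice.
hypothetical_choice = max(outcomes, key=lambda a: post_button_utility(outcomes[a]))
print(hypothetical_choice)  # -> teach_portuguese
```

The last two lines are the actuals-versus-hypotheticals contrast: the hypothetical ranking is real enough as a prediction, but it is not the one doing the choosing.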
I realize I was assuming that the entity choosing which values to have would value ‘maximally’ satisfying those values in some sense, so that if it could choose freely it would pick values that were easy or best to satisfy. But this isn’t necessarily so. It’s humans who have lots of values about their values, and I think we would have a tough time choosing our values if we could. Perhaps there is a dynamic tension among our values (we want our values to have value, and we are constantly asking ourselves what our goals should be and whether we really value our current goals), so if our values were unpinned from their connection to an external, immutable framework, they might spin off to something very different.
So I end up agreeing with you: without values about values (meta-values?), someone who only cared about their object-level values would have no reason to modify them, and their utility function might be very stable. I think the instability would come from the reasons for modifying the values. (Obviously, I haven’t read Suzanne Gilbert’s article. I probably should do so before making any other comments on this topic.)
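To make that last distinction concrete, here is a toy extension of the same kind of chooser (the ‘prefer easily satisfied goals’ meta-value and every number are assumptions of mine for illustration only, not anything from the thread or the article): with only object-level goals the current values always win, but adding a value about values can tip the choice toward rewriting them.

```python
# Toy extension of the chooser above: add a meta-value term that scores
# candidate goal sets themselves. The specific meta-value (preferring goals
# that are easy to satisfy) and all numbers are illustrative assumptions.

# Object-level payoff of each option, as judged by the agent's CURRENT goals:
# keeping its values lets it reduce some suffering; swapping to an easy goal
# produces nothing it currently cares about.
object_level = {"keep_current_values": 3.0, "swap_to_easy_values": 0.0}

# Meta-level payoff: how well each option satisfies a value *about* values
# (here, the assumed preference for easily satisfied goals).
meta_level = {"keep_current_values": 0.2, "swap_to_easy_values": 0.9}

def choose(meta_weight):
    """Pick an option by current object-level goals plus weighted meta-values."""
    scores = {option: object_level[option] + meta_weight * meta_level[option]
              for option in object_level}
    return max(scores, key=scores.get)

print(choose(meta_weight=0.0))   # -> keep_current_values: object-level values alone are stable
print(choose(meta_weight=10.0))  # -> swap_to_easy_values: the meta-value makes the rewrite win
```

The meta_weight knob is only there to show where the instability enters: the object-level term alone can never favor the swap, because the swap produces nothing the current values count.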