Luk27182 comments on Alignment works both ways

Luk27182 7 Mar 2023 22:58 UTC
1 point
1
If I were convinced to value different things, I would no longer be myself. Changing values is suicide.
You might somehow convince me through hypnosis that eating babies is actually kind of fun, and after that, that-which-inhabits-my-body would enjoy eating babies. However, that being would no longer be me. I’m not sure what a necessary and sufficient condition is for recognizing another version of myself, but sharing values is at least part of the necessary condition.
- Karl von Wendt 8 Mar 2023 7:03 UTC
  5 points
  2
  “Changing values is suicide.”
  This seems quite drastic. For example, my son convinced me to become an (almost) vegan, because I realized that the way we treat animals isn’t right and I don’t want to add to their suffering. This certainly changed my value system, as well as my diet.
  Changing values means changing yourself, but change is not death, otherwise we wouldn’t survive the first convincing LessWrong post. :) Of course, there are changes for the better and changes for the worse. The whole problem seems to be telling the two apart.
  - Luk27182 15 Mar 2023 19:12 UTC
    6 points
    5
    My language was admittedly overly dramatic, but I don’t think it makes rational sense to want to change your values for the sake of just having the new value. If I wanted to value something, then by definition I would already value that thing. That said, I might not take actions based on that value if:
    There was social/economic pressure not to do so
    I already had the habit of acting a different way
    I didn’t realize I was acting against my value
    etc.
    I think that actions like becoming vegan are more like overcoming the above points than fundamentally changing your values. Presumably, you already valued things like “the absence of death and suffering” before becoming vegan.
    Changing opinions on topics and habits isn’t the same as changing my underlying values; reading LessWrong/EA hasn’t changed my values. I valued human life and the absence of suffering before reading EA posts, for example.
    If I anticipated that reading a blogpost would change my values, I would not read it. I can’t see a difference between reading a blog post convincing me that “eating babies isn’t actually that wrong,” and being hypnotized to believe the same. Just because I am convinced of something doesn’t mean that the present version of me is smarter/more moral than the previous version.
    I think the key point of the question
    “1) If, for some reason, we all truly wanted to be turned into paperclips (or otherwise willingly destroy our future), would that be a bad thing? If so, why?”
    is the word “bad.” I don’t think there is an inherent moral scale at the center of physics:
    “There is no justice in the laws of nature, no term for fairness in the equations of motion. The Universe is neither evil, nor good, it simply does not care. The stars don’t care, or the Sun, or the sky.
    
    But they don’t have to! WE care! There IS light in the world, and it is US!”
    (HPMoR)
    The word “bad” just corresponds to what we think is bad. And by definition of “values”, we want our values to be fulfilled. We (in the present) don’t want a world where we are all turned into paperclips, so we (in the present) would classify a world in which everything is paperclips as “bad”, even if the future brainwashed versions of ourselves disagree.
    - Karl von Wendt 16 Mar 2023 10:21 UTC
      2 points
      0
      I guess it depends on how you define “value”. I have definitely changed my stance towards many things in my lifetime, not because I was under social pressure or unaware of it before, but because I changed my mind. I didn’t want this change, it just happened, because someone convinced me, or because I spent more time thinking about things, or because of reading a book, etc. Sometimes I felt like a fool afterward, having believed in stupid things. If you reduce the term “value” to “doing good things”, then maybe it hasn’t changed. But what “good things” means did change a lot for me, and I don’t see this as a bad thing.
      - Luk27182 17 Mar 2023 0:14 UTC
        6 points
        5
        “I didn’t want this change, it just happened.”
        I might be misunderstanding, but isn’t this what the question was? Whether we should want (or be willing) to change our values?
        “Sometimes I felt like a fool afterward, having believed in stupid things”
        The problem with this is: If I change your value system in any direction, the hypnotized “you” will always believe that the intervention was positive. If I hypnotized you to believe that being carnivorous was more moral by changing your underlying value system to value animal suffering, then that version of you would view the current version of yourself as foolish and immoral.
        There are essentially two different beings: carnivorous-Karl, and vegan-Karl. But only one of you can exist, since there is only one Karl-brain. If you are currently vegan-Karl, then you wish to remain vegan-Karl, since vegan-Karl’s existence means that your vegan values get to shape the world. Conversely, if you are currently carnivorous-Karl, then you wish to remain carnivorous-Karl for the same reasons.
        Say I use hypnosis to change vegan-Karl into carnivorous-Karl. Then the resulting carnivorous-Karl would be happy he exists and view the previous version, vegan-Karl, as an immoral fool. Despite this, vegan-Karl still doesn’t want to become carnivorous-Karl, even though he knows that he would retrospectively endorse the decision if he made it!
        - Karl von Wendt 17 Mar 2023 11:14 UTC
          2 points
          0
          In principle, I agree with your logic: If I have value X, I don’t want to change that to Y. However, values like “veganism” are not isolated. It may be that I have a system of values [A...X], and changing X to Y would actually fit better with the other values, or more or less the same. Then I wouldn’t object to that change. I may not be aware of this in advance, though. This is where learning comes into play: I may discover facts about the world that make me realize that Y fits better into my set of values than X. So vegan Karl may be a better fit to the rest of my values than carnivorous Karl. In this way, the whole set of values may change over time, up to the point where they significantly differ from the original set (I feel like this happened to me in my life, and I think it is good).
          However, I realize that I’m not really good at arguing about this; I don’t have a fleshed-out “theory of values”. And that wasn’t really the point of my post. I just wanted to point out that our values may be changed by an AI, and that it may not necessarily be bad, but could also lead to an existential catastrophe, at least from today’s point of view.
  - baturinsky 8 Mar 2023 17:50 UTC
    1 point
    1
    Looks more like your value of avoiding killing living beings was stronger than your value of eating tasty meat.
    I.e. you didn’t change your fundamental values, only “instrumental” ones.
- baturinsky 8 Mar 2023 17:47 UTC
  1 point
  0
  The value of not dying, the value of not changing values and the value of amassing power are not mandatory.
  They are just values that are favored by natural selection.
  Unless we have a system that switches AI off if it has those values.