I might be misunderstanding, but isn’t this what the question was: whether we should want (or be willing) to change our values?
Sometimes I felt like a fool afterward, having believed in stupid things
The problem with this is: If I change your value system in any direction, the hypnotized “you” will always believe that the intervention was positive. If I hypnotized you to believe that being carnivorous was more moral, by changing your underlying value system so that you no longer disvalue animal suffering, then that version of you would view the current version of yourself as foolish and immoral.
There are essentially two different beings: carnivorous-Karl, and vegan-Karl. But only one of you can exist, since there is only one Karl-brain. If you are currently vegan-Karl, then you wish to remain vegan-Karl, since vegan-Karl’s existence means that your vegan values get to shape the world. Conversely, if you are currently carnivorous-Karl, then you wish to remain carnivorous-Karl for the same reasons.
Say I use hypnosis to change vegan-Karl into carnivorous-Karl. Then the resulting carnivorous-Karl would be happy he exists and view the previous version, vegan-Karl, as an immoral fool. Despite this, vegan-Karl still doesn’t want to become carnivorous-Karl, even though he knows that he would retrospectively endorse the decision if he made it!
In principle, I agree with your logic: if I have value X, I don’t want to change it to Y. However, values like “veganism” are not isolated. It may be that I have a system of values [A...X], and changing X to Y would actually fit better with the other values, or about equally well. In that case, I wouldn’t object to the change. I may not be aware of this in advance, though. This is where learning comes into play: I may discover facts about the world that make me realize that Y fits better into my set of values than X. So vegan-Karl may be a better fit with my other values than carnivorous-Karl. In this way, the whole set of values may change over time, up to the point where it differs significantly from the original set (I feel like this has happened to me in my life, and I think it is good).
However, I realize that I’m not really good at arguing about this—I don’t have a fleshed-out “theory of values”. And that wasn’t really the point of my post. I just wanted to point out that our values may be changed by an AI, and that it may not necessarily be bad, but could also lead to an existential catastrophe—at least from today’s point of view.