Hmm… it’s hard for me to get what you mean from a comment this short, but just the fact that I seem to have a lot of difficulty connecting your comment with my own model suggests that I didn’t communicate mine very well. Could you say more about how you understood it?
The player seems to value emotional states, while the character values specific situations it can describe? Does that seem right?
My take is that we (the characters) have some wireheadable goals (e.g. curing a headache), but we also have plenty of goals best understood externally.
But the “player” is a less clearly goal-oriented process, and we can project different sorts of goals onto it, ranging from “it wants to make the feedback signal from the cortical neurons predict the output of some simple pattern detector” to “it wants us to avoid spiders” to “it wants us to be reproductively fit.”
Hmm… several thoughts about that.
One is that I don’t think we really know what the player does value. I had some guesses and hand-waving in the post, but nothing that I would feel confident enough about to use as the basis for preference synthesis or anything like that. I’m not even certain that our values can be very cleanly split into a separate character and player, though I do think that the two-layer model is less wrong than the naive alternative.
In Sarah’s original analogy, the player first creates the character; then the character acts based on the choices that the player has made beforehand. But one aspect in which I think the analogy is wrong, and which I should have mentioned in the post, is that the player keeps changing the character. (Maybe you could think of this as one of those games that give you the option to take back the experience points that you’ve used on your character and then let you re-assign them...)
Part of normal learning and change is that when you have new experiences, the learning process which I’ve been calling the player is involved in determining how those experiences affect your desires and personality. E.g. the changes in values and preferences that many people experience after having their first child—that might be described as the work of the player writing the “parental values” attribute into the character sheet. Or someone who goes to college, uncertain of what they want to study, tries out a few different subjects, and then switches their major to something which they found surprisingly interesting and motivating—the player giving them a preference to study that thing.
Those examples seem complicated enough that it feels a little too simplified to say that the player values emotional states; to some extent it seems to, but it also seems to create emotional states itself, as suits its purposes. Probably what it “values” can’t be compressed into any brief verbal description; it’s more like godshatter: a thousand different optimization criteria, all being juggled together to create something like the character.
I read your original comment as suggesting that we give the player enough pleasure that it is content, and then we also satisfy the character’s preferences. But:
1. Assuming for the sake of argument that this were possible, it’s not clear what “the player being content” would do to a person’s development. One possibility is that they would stop growing and responding to changed circumstances at all, because the mechanisms that update their behavior and thinking are all in the player. (Maybe even to the point of e.g. not developing new habits in response to having moved to a new home with different arrangements, or something similar.)
2. There’s anecdotal evidence suggesting that the pursuit of pleasure is actually also one of those character-level things. In “Happiness is a chore”, the author claims that even if you give people a technique which would consistently make them happy, and they try it out and become convinced that it works, they might still end up not using it—because although “the pursuit of happiness” is what the character thinks they are doing, it is not what the player is optimizing for. If it were, it might be in the player’s power to just create the happiness directly. Compare e.g. pjeby’s suggestion that happiness and similar states are things that we feel by default, but the brain learns to activate systems which block them, because the player considers that necessary for some purpose:

So if, for example, we don’t see ourselves as worthless, then experiencing ourselves as “being” or love or okayness is a natural, automatic consequence. Thus I ended up pursuing methods that let us switch off the negatives and deal directly with what CT and IFS represent as objecting parts, since these objections are the constraint on us accessing CT’s “core states” or IFS’s self-leadership and self-compassion.
These claims also match my personal experience; I have, at various times, found techniques that I knew would make me happy, but then found myself just not using them. At one point I wrote “I have available to me some mental motions for reaching inside myself and accessing a source of happiness, but it would require a bit of an active effort, and I find that just being neutral is already good enough, so I can’t be bothered.” Ironically, I think I’ve forgotten what exactly that particular mental move was, because I ended up not using it very much...
There’s also a thing in meditative traditions where people develop the ability to access some really pleasant states of mind (“jhanas”). But although some people do become “jhana junkies” and mostly just want to hang out in them, a lot of folks don’t. One friend of mine who knows how to access the jhanas was once asked something along the lines of “well, if you can access pure pleasure, why aren’t you doing it all the time?” That made him thoughtful, and afterwards he mentioned something about pure bliss just getting kind of stale and boring after a while. Also, getting into a jhana requires some amount of effort and energy, and he figured that he might as well spend that effort and energy on something more meaningful than pure pleasure.
3. “Not getting satisfied” seems like a characteristic feature of the player. The character thinks that they might get satisfied: “once I have that job that I’ve always wanted, then I’ll be truly happy”… and then, after a while, they aren’t satisfied anymore. If we model people’s goals as setpoints, it seems like frequently, when one setpoint has been reached (one which the previous character would have been satisfied with), the player looks around and changes the character to give it a new target setpoint. (I saw speculation somewhere that this is an evolutionary hack for getting around the fact that the brain can only represent a limited range of utility—by redefining the utility scale whenever you reach a certain threshold, you can effectively have an unbounded utility function even though your brain can only represent bounded utility. Of course, it comes with costs such as temporally inconsistent preferences.)
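To make that parenthetical slightly more concrete, here is a toy sketch of the “redefine the utility scale at each threshold” idea. It is purely illustrative rather than anything from the original post; the function names and numbers are made up. The agent’s internal representation is clamped to [0, 1], but every time the current setpoint is reached, the old goal becomes the new baseline and a larger setpoint is installed, so the sequence of goals is effectively unbounded even though any single representation stays bounded.

```python
# Toy illustration (not from the original post): a bounded internal utility scale
# that gets redefined whenever the current setpoint is reached. Each rescale is
# like the player handing the character a new goal, so the sequence of goals is
# effectively unbounded even though any single representation stays in [0, 1].

def internal_utility(achievement: float, baseline: float, setpoint: float) -> float:
    """Map raw achievement onto a bounded [0, 1] scale relative to the current goal."""
    progress = (achievement - baseline) / (setpoint - baseline)
    return max(0.0, min(1.0, progress))

baseline, setpoint = 0.0, 10.0   # hypothetical units of "life achievement"
achievement = 0.0

for step in range(60):
    achievement += 1.0                                    # steady external progress
    utility = internal_utility(achievement, baseline, setpoint)
    if utility >= 1.0:
        # The player redefines the scale: yesterday's goal becomes today's baseline,
        # and a more ambitious setpoint is installed. Note that this also re-values
        # past achievements, i.e. temporally inconsistent preferences.
        baseline, setpoint = setpoint, setpoint * 2
        print(f"step {step}: setpoint reached, new target = {setpoint}")
```

On this sketch, the character experiences something close to the maximum representable satisfaction just before each rescale, and then finds itself back near zero on the new scale, which is roughly the “once I have that job… and then after a while they aren’t satisfied anymore” pattern.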
Interesting. I will think more...