Sure. Although before I do, I want to qualify the quoted claim a bit.
When I say “our goals change over time,” I don’t mean “we behave something like EU maximizers with time-dependent utility functions.” I think we don’t behave like EU maximizers, in the sense of having some high-level preference function that all our behavior flows from in a top-down manner.
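(For reference, and in notation of my own rather than anything from the thread: a fixed-utility EU maximizer chooses, at each time $t$,

$$a_t \in \arg\max_{a}\; \mathbb{E}\big[\, U(\text{outcome}) \mid a, \text{history}_t \,\big],$$

for a single utility function $U$ that never changes; the “time-dependent utility” variant simply swaps in $U_t$. The claim here is that neither of these is the logic of our outermost loop.)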
Insofar as we make choices that are rational in a decision-theoretic sense (given some assumption about the preferences we are trying to satisfy), we are doing so via a “subgoal capacity.” This kind of decision-making is available to our outermost loop, and our outermost loop sometimes uses it.
But I don’t think the logic of our outermost loop actually is approximate EU maximization—as evidenced by all the differences between how we deploy our capabilities, and how a “smart EU maximizer” would deploy its own. For instance, IMO we are less power-seeking than an EU maximizer would be (and when humans do seek power, it’s often as an end in itself, whereas power-seeking is convergent across goals for EU maximizers).
Saying that “our goals change over time, and we permit/welcome this” is meant to gesture at how different we are from a hypothetical EU maximizer with our capabilities. But maybe this concedes too much, because it implies we have some things called “goals” that play a similar role to the EU maximizer’s utility function. I am pretty sure that’s not in fact true. We have “goals” in the informal sense, but I don’t really know what it is we do with them, and I don’t think mapping them onto the “preferences” in a decision-theoretic story about human behavior is likely to yield accurate predictions.
Anyway:
1. Some people grow up in cultures where meat eating is normal, and eat meat themselves until some point, then become convinced to stop out of moral concern for animals. In rare cases, the person might have “always felt on some level” the same moral concern that later drives them to veganism, but in the median case (it seems to me) this moral concern is actually acquired over time—the person really does not care as much about animals earlier as they do later. (This is a particular case of a more general thing, where one’s “moral circle” widens over time; various kinds of individual change toward more “progressive” attitudes fall in this category.)
2. When I was younger, I felt very strongly that I ought to “be moral” while also feeling very uncertain about what this entailed; I felt sure there was some “true ethics” (something like a single, objectively true ethical theory) which was important to follow, without knowing what it was, and when I tried to “be moral” this often involved trying to figure out what the true ethics was. Over time, I have lost this emotion towards “the true ethics” (I am not sure it exists and not sure the question matters), while gaining a tendency to have more moral emotions/motivations about concrete instances of harm. I am not sure how glad I am that this has occurred. I find my former self strange and even “silly,” but I have difficulty coming up with a fully satisfying argument that I am “right” while he is “wrong.”
3. When a person loses or gains religious faith, this changes their world model so drastically that their previous goals/values (or at least the things they consciously articulated as goals/values) are not even on the map anymore. If you think of yourself as fundamentally striving to “serve God,” and then you decide God doesn’t exist, you need a new thing to do.
4. Some people have a single, consuming passion that drives them for much/most/all of their adult life. For each such person, there was a time before they became passionate about this subject, and indeed a time before they had even heard of it. So it was not always their “goal.” Mathematicians seem like an especially clear case of this, since higher math is so remote from everyday life.
5. New parents sometimes report that their children provide them with a kind of value which they did not anticipate, or perhaps could not even have understood, before parenthood. And, the belief that this occurs is widespread enough that people sometimes have children, not because they want to on their current values, but because they expect to become different people with different values as a result, and wish this change to occur.
Sorry, you asked for three, that was five. I wanted to cover a range of areas, since one can look at any one of these on its own and imagine a way it might fit inside a decision-theoretic story.
EU maximization with a non-constant world model (“map”) might look like examples 3 and 4, while example 5 involves a basic biological function of great interest to natural selection, so we might imagine it as a hardwired “special case” not representative of how we usually work. But the ubiquity and variety of this kind of thing, together with our lack of power-seeking etc., does strike me as a problem for decision-theoretic interpretations of human behavior at the lifetime level.
Aha, this seems somewhat cruxy, because the things you list as examples of human goals are mostly about values, which I agree act in a sort of weird way, whereas I would see maintenance of homeostasis as a more central example of human goals. And while maintaining homeostasis isn’t 100% aligned with the concept of utility maximization, it does seem to be a lot more aligned than values.
With respect to maintaining homeostasis, it can be a bit unclear what exactly the utility function is. The obvious possibility would be “homeostasis” or “survival” or something like that, but this is slightly iffy in two directions.
First, because strictly speaking we maintain homeostasis based on certain proxies, so in a sense the proxies are what we more strictly optimize. But this can also be fit into the EU framework in another way, with the proxies representing part of the mechanism for how the expectation of utility is operationalised.
And second, because maintaining homeostasis is again just a proxy for other goals that evolution has, since it grants power to engage in reproduction and kin altruism. And this doesn’t fit super neatly into classical EU frameworks, but it does fit neatly into later rationalist developments like outer/mesa-optimization.
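One way to write down the two-level picture being gestured at here, with functional forms that are illustrative assumptions rather than anything claimed in the thread: evolution’s “outer” criterion is roughly inclusive fitness, while the organism behaves as if it were keeping a “mesa” objective high, defined over the proxies it actually monitors,

$$U_{\text{mesa}}(x) = -\sum_i w_i\,(x_i - x_i^{*})^{2},$$

where the $x_i$ are proxy variables (blood glucose, core temperature, hydration, …), the $x_i^{*}$ their setpoints, and the $w_i$ weights. Doing well on $U_{\text{mesa}}$ is not the outer criterion; it was selected because, in the ancestral environment, it correlated with it.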
So basically, “homeostasis” is kind of fuzzy in how it relates to EU-style maximization, but it does also sort of fit, and I think it fits much better than values do:
1. A human has a goal of maintaining homeostasis.
2. The goal is a fixed part of the human’s structure. The internal dynamics of the human, if left to their own devices, will never modify the goal.
3. The “outermost loop” of the human’s internal dynamics is an optimization process aimed at maintaining homeostasis, or at least the human behaves just as though this were true.
4. This “outermost loop” or “fixed-homeostasis-directed wrapper” chooses which of the human’s specific capabilities to deploy at any given time, and how to deploy it (see the sketch after this list).
5. The human’s capabilities will themselves involve optimization for sub-goals that are not the same as maintaining homeostasis, and they will optimize for them very powerfully (hence “capabilities”). But it is “not enough” that the human merely be good at optimization-for-subgoals: they will also have a fixed-homeostasis-directed wrapper.
6. So, the human may be very good at maintaining homeostasis, and when they are maintaining homeostasis, they may be running an internal routine that optimizes for maintaining homeostasis. This routine, and not the terminal-goal-directed wrapper around it, explains the human’s strong homeostasis. (“Maximize paperclips” does not tell you how to maintain homeostasis.)
7. The human may also be good at things that are much more general than maintaining homeostasis, such as “planning,” “devising proofs in arbitrary formal systems,” “inferring human mental states,” or “coming up with parsimonious hypotheses to explain observations.” All of these are capacities to optimize for a particular subgoal that is not the human’s terminal goal.
8. Although these subgoal-directed capabilities, and not the fixed-homeostasis-directed wrapper, will constitute the reason the human does well at anything they do well at, the human must still have a fixed-homeostasis-directed wrapper around them and apart from them. (Added, because the explanation was missing from the original version of this comment: otherwise there is no principle for deciding which subgoal to pursue at any given moment, or how to trade it off against other subgoals. Furthermore, you’d run into salt-starvation problems.)
9. There is no way for the pursuit of homeostasis to change through bottom-up feedback from anything inside the wrapper. The hierarchy of control is strict and only goes one way.
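To make the shape of this claim concrete, here is a minimal toy sketch of a “fixed-homeostasis-directed wrapper” in code (referenced from item 4 above). Everything in it is invented for illustration: the variables, the setpoints, the squared-error objective, and the two capabilities. It is a sketch of the claimed control structure, not a model of actual human physiology or psychology.

```python
# Toy sketch of the wrapper structure in the list above. Purely illustrative:
# the variables, setpoints, and squared-error objective are invented for the
# example, not claims about physiology.

SETPOINTS = {"glucose": 90.0, "core_temp": 37.0}

def homeostatic_error(state):
    """Squared deviation of each monitored variable from its setpoint."""
    return sum((state[k] - target) ** 2 for k, target in SETPOINTS.items())

class Capability:
    """A subgoal-directed optimizer (eat, seek shelter, plan, ...).

    It is good at its own subgoal; its subgoal is not "maintain homeostasis".
    """
    def __init__(self, name, effect):
        self.name = name
        self.effect = effect  # how running this capability shifts internal state

    def run(self, state):
        return {k: v + self.effect.get(k, 0.0) for k, v in state.items()}

def outermost_loop(state, capabilities, steps):
    """The fixed wrapper: nothing below it ever modifies it.

    At each step it deploys whichever capability is predicted to reduce
    homeostatic error the most; this is the "which subgoal, when" decision
    that item 8 above says the wrapper exists to make.
    """
    for _ in range(steps):
        best = min(capabilities, key=lambda c: homeostatic_error(c.run(state)))
        state = best.run(state)
    return state

# Example: hungry and a bit cold; the wrapper deploys "eat" first, then "warm_up".
capabilities = [
    Capability("eat", {"glucose": +30.0}),
    Capability("warm_up", {"core_temp": +1.5}),
]
print(outermost_loop({"glucose": 60.0, "core_temp": 35.5}, capabilities, steps=2))
```

The point of the sketch is only the shape: the loop at the top is fixed and is never modified by anything beneath it, while the things that are actually good at anything (the capabilities) sit below it and get deployed or shelved according to homeostatic need.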
Homeostasis also fits the whole power-seeking/misalignment/xrisk dynamic much more closely than values do. Would humanity commit planet-scale killings and massively transform huge parts of the world for marginal values gain? Eh. Sort of. Sometimes. It’s complicated. Would humans commit planet-scale killings and massively transform huge parts of the world for marginal homeostasis gain? Obviously yes.
> There is no way for the pursuit of homeostasis to change through bottom-up feedback from anything inside the wrapper. The hierarchy of control is strict and only goes one way.
Note that people do sometimes do things like starve themselves to death or choose to become martyrs in various ways, for reasons that are very compelling to them. I take this as a demonstration that homeostatic maintenance of the body is in some sense “on the same level” as other reasons / intentions / values, rather than strictly above everything else.
“No way” is indeed an excessively strong phrasing, but it seems clear to me that pursuit of homeostasis is much more robust to perturbations than most other pursuits.
I agree with that.
I speculate that a significant chunk of Heidegger’s philosophy can be summarized as “people are homeostats for roles”. Everything we do (I’m speculating Heidegger claims) grounds out in “in order to be an X” where X is like “good person”, “father”, “professor”, “artist”, etc.
This could really do with some concrete examples of homeostasis, and some discussion of how homeostasis is compatible with major life changes.
Staying alive grants power to engage in reproduction and kin altruism … But homeostasis means “staying the same”, according to the dictionary. The two come apart. A single homeostat would want to stay single, because that is their current state. But singletons often don’t want to stay single, and don’t serve evolutionary purposes by doing so.
> This could really do with some concrete examples of homeostasis,

There’s a whole bunch of pure bio stuff that I’m not properly familiar with the details of, e.g. the immune system. But the more interesting stuff is probably the behavioral stuff.
- Food: When you are low on nutrition or calories, you get hungry and try to eat. You generally feel motivated to ensure that you have food to eat for when you do get hungry, which in modern society involves doing stuff like working for pay, but in other societies has involved farming or foraging. If there are specific nutrients like salt that you are missing, then you feel strong cravings for food containing those nutrients.
- Shelter: If the weather is sufficiently cold that it would hurt you (or waste energy), you find it aversive and seek protection. Longer-term, you ensure you have a house, etc.
- Safety: If you touch a hot stove, you immediately move your hand away from it. In order to avoid getting hurt, people create safety regulations. Etc.
There’s just tons of stuff, and nobody is likely to talk you out of ensuring that you get your nutritional needs covered, or to talk you out of being protected from the elements. They are very strong drives.
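As a purely illustrative aside on the “proxies” point from earlier: a drive like this can be sketched as a handful of threshold checks on monitored variables. The variable names and thresholds below are made up; the only point is that the body reacts to proxy readings, not to “survival” directly.

```python
# Toy sketch of proxy-driven drives; thresholds and variable names are invented.
THRESHOLDS = {"glucose": 70.0, "sodium": 135.0, "skin_temp": 33.0}

def active_drives(proxies):
    """Return the drives triggered by proxy readings that fall out of range."""
    drives = []
    if proxies["glucose"] < THRESHOLDS["glucose"]:
        drives.append("hunger: seek food")
    if proxies["sodium"] < THRESHOLDS["sodium"]:
        drives.append("craving: seek salty food specifically")
    if proxies["skin_temp"] < THRESHOLDS["skin_temp"]:
        drives.append("cold aversion: seek warmth or shelter")
    return drives

# Low blood sugar and cold skin, sodium fine:
print(active_drives({"glucose": 65.0, "sodium": 140.0, "skin_temp": 31.0}))
# -> ['hunger: seek food', 'cold aversion: seek warmth or shelter']
```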
> But homeostasis means “staying the same”, according to the dictionary.

I mean homeostasis in the sense of “keeping conditions within the range where one is healthily alive”, not in the sense of “keeping everything the same”.
But that still isn’t the sole explanation of human behaviour, because humans do flourishing type stuff like going on dates, going to college, going to the gym.
Also: global warming.
I’m not saying that it is the sole explanation of all human behavior, I’m saying that it is a major class of behaviors that is difficult to stop people from engaging in and which is necessary for the effectiveness of humans in influencing the world.
Not sure what the relevance of global warming is to this discussion.
Global warming is humans moving conditions out of the range where one is healthily alive.
Insofar as that is true, it seems like a scale issue. (It doesn’t seem entirely true—global warming is a major problem, but not exactly an x-risk. Many of the biggest contributors to global warming are not the ones who will be hit the hardest. And there’s a tragedy-of-the-commons issue to it.)
I’m familiar with the ordinary biological meaning of homeostasis, but I don’t see how it relates to:

> Would humanity commit planet-scale killings and massively transform huge parts of the world for marginal values gain? Eh. Sort of. Sometimes. It’s complicated. Would humans commit planet-scale killings and massively transform huge parts of the world for marginal homeostasis gain? Obviously yes.
Why would anyone commit massive crimes for marginal gains?
Factory farming
Why’s no one in jail for it?
It’s not illegal. I said “killings”, not “homicides” or “murders”.
As a slightly tangential point, I think if you start thinking about how to cast survival / homeostasis in terms of expected-utility maximization, you start having to confront a lot of funny issues, like, “what happens if my proxies for survival change because I self-modified?”, and then more fundamentally, “how do I define / locate the ‘me’ whose survival I am valuing? what if I overlap with other beings? what if there are multiple ‘copies’ of me?”. Which are real issues for selfhood IMO.
In the evolutionary case, the answer is that this is out of distribution, so it’s not evolved to be robust to such changes.