I think the problem here is distinguishing between terminal and instrumental goals? Most people probably don’t run an apple pie business because they have terminal goals about the apple pie business. They probably want money and status, want to be useful and to provide for their families, and I expect these goals to be very persistent and self-preserving.
Not all such goals have to be instrumental to terminal goals, and in humans the line between instrumental and noninstrumental is not clear. Like, at one extreme the instrumental goal is explicitly created by thinking about what would increase money/status, but at the other extreme the “instrumental” goal is a shard reinforced by a money/status drive, which would not change as the money/status drive changes.
Also, even if the goal of selling apple pies is entirely instrumental, it’s still interesting that the goal can be dissolved once it’s no longer compatible with the terminal goal of, e.g., gaining money. This means that not all goals are dangerously self-preserving.
Yes, exactly. Like, we humans mostly have something that kinda feels intrinsic but that also pays rent and updates with experience, like a Go player’s sense of “elegant” Go moves. My current (not confident) guess is that these thingies (that humans mostly have) might be a more basic and likely-to-pop-up-in-AI mathematical structure than fixed utility functions + updatey beliefs, a la Bayes and VNM. I wish I knew a simple math for them.
The simple math is active inference, and the type is almost entirely the same as ‘beliefs’.
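To unpack that claim a bit (hedging that formulations differ across the active-inference literature): “goals” enter as a prior preference distribution over observations, i.e. the same mathematical type as a belief, and policies are scored by expected free energy, which in one common decomposition is

$$
G(\pi) \;=\; -\,\underbrace{\mathbb{E}_{Q(o,s \mid \pi)}\big[\ln Q(s \mid o,\pi) - \ln Q(s \mid \pi)\big]}_{\text{epistemic value (expected information gain)}} \;-\; \underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\ln P(o \mid C)\big]}_{\text{pragmatic value (prior preferences over observations)}}
$$

with the agent choosing policies that minimize $G(\pi)$. Since the preference term $P(o \mid C)$ is just another distribution, “updating a goal” and “updating a belief” are operations on the same kind of object, which is roughly the sense in which the type is “almost entirely the same as beliefs.”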
I feel like… no, it is not very interesting, it seems pretty trivial? We (agents) have goals, we have relationships between them, like “priorities”, and we sometimes abandon goals with low priority in favor of goals with higher priority. We can also have meta-goals like “what should my system of goals look like”, “how to abandon and adopt intermediate goals in a reasonable way”, and “how to do reflection on goals”, and future superintelligent systems will probably have something like that. All of this seems to me to come as a package with the concept of a “goal”.
My goals for money, social status, and even how much I care about my family don’t seem all that stable and have changed a bunch over time. They seem to arise from some deeper combination of desires to be accepted, to have security, to feel good about myself, to avoid effortful work, etc., interacting with my environment. Yet I wouldn’t think of myself as primarily pursuing those deeper desires, and during various periods I would have self-modified, if given the option, to more aggressively pursue the goals that I (the “I” that was steering things) thought I cared about (like doing really well at a specific skill, which turned out to be a fleeting goal with time).
What about things like fun, happiness, eudaimonia, meaning? I certainly think that, excluding brain damage or very advanced brainwashing, you are not going to eat babies or turn planets into paperclips.
Thanks for replying. The thing I’m wondering about is: maybe it’s sort of like this “all the way down.” Like, maybe the things that are showing up as “terminal” goals in your analysis (money, status, being useful) are themselves composed sort of like the apple pie business, in that they congeal while they’re “profitable” from the perspective of some smaller thingies located in some large “bath” (such as an economy, or a (non-conscious) attempt to minimize predictive error so as to secure neural resources, or a thermodynamic flow of sunlight, or something). Like, maybe it is this way in humans, and maybe it is or will be this way in an AI. Maybe there won’t be anything that is usefully regarded as a “terminal goal.”
I said something like this to a friend, who was like “well, sure, the things that are ‘terminal’ goals for me are often ‘instrumental’ goals for evolution, who cares?” The thing I care about here is: how “fixed” are the goals? Do they resist updating/dissolving when they cease being “profitable” from the perspective of thingies in an underlying substrate, or are they constantly changing as what is profitable changes? Like, imagine a kid who cares about playing “good, fun” videogames, but whose notion of which games count as that updates pretty continually as he gets better at gaming. I’m not sure it makes that much sense to think of this as a “terminal goal” in the same sense that “make a bunch of diamond paperclips according to this fixed specification” is a terminal goal. It might be differently satiable, differently in touch with what’s below it. I’m not really sure why I care, but I think it might matter for what kind of thing organisms/~agent-like-things are.
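To make that contrast concrete, here is a toy sketch (purely illustrative; the class names and the “profitability” signal are made up, not anyone’s actual model): one object whose target is frozen at construction, and one whose criterion keeps being re-derived from an underlying signal, the way the kid’s taste in games keeps updating.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FixedSpecGoal:
    """A 'diamond paperclip'-style goal: the spec is frozen at construction."""
    spec: Callable[[object], bool]

    def wants(self, option: object) -> bool:
        return self.spec(option)

    def update(self, substrate_feedback: float) -> None:
        # By design, nothing from the underlying substrate can reshape or dissolve it.
        pass


@dataclass
class ProfitabilityTrackingGoal:
    """A 'good, fun videogames'-style goal: the criterion is refit to whatever the
    underlying 'bath' currently rewards, so its content drifts over time."""
    history: List[float] = field(default_factory=list)
    threshold: float = 0.0

    def wants(self, option_payoff: float) -> bool:
        return option_payoff > self.threshold

    def update(self, substrate_feedback: float) -> None:
        # The goal's own standard moves as the substrate's notion of "profitable" moves.
        self.history.append(substrate_feedback)
        self.threshold = sum(self.history) / len(self.history)
```

Both objects expose the same “goal” interface from the outside, but only the second can be reshaped or dissolved from below; calling both of them “terminal goals” hides exactly the difference I’m trying to point at.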