What are Emotions?
A funny thing happens when I ask people: “What do you ultimately want in life, and why?”
Almost everyone gets mildly annoyed. Some people give up. Most end up saying they have multiple goals, at which point I ask them to choose one. If I manage to get them to articulate a terminal goal they are content with, the following conversation often ensues.
Person: “Ok fine, I want [material goal (e.g. money, social status, to make stuff)]”
Me: “Why do you want that?”
Person: “I don’t know, I guess because it will make me feel [positive (or lack of a negative) emotion (e.g. happy, accomplished, curious)]”
Me: “So you just want to feel [emotion]?”
Person: “No, no, ugh, never mind”
When trying to justify a terminal goal to someone who doesn’t share it, I find people have trouble communicating a justification unless either the goal or the justification is an emotion.
I am not saying that people really just want emotions. I myself want to make kinetic sculptures (among other goals). I don’t do it because I want to feel accomplished, or approved of, or whatever. I just want it. I don’t know why[1], but I do.
I simply find it interesting that people feel the need to justify their terminal goals (unless they are emotions), and that the only way they can seem to do it is by associating it with an emotion.
While researching what people had to say about terminal goals, I came across this list of terminal values a human might have, courtesy of LessWrong.
survival, health, friendship, social status, love, joy, aesthetic pleasure, curiosity, and much more
The first four are what I might call “material” goals, and the rest “emotional” goals[2].
So clearly emotions are a subset of terminal goals. I argue, however, that they are a special case of terminal goals.
There is a stark difference between asking someone “Why do you want to feel (happiness, joy, pleasure)?” and “Why do you want (power, social status, to make kinetic sculptures)?”
I find when I ask people the former, they tend to ask if I am ok and in turn suggest therapy. Yet when I ask them the latter, they will either say they don’t know, or scrunch up their face and try to answer an unanswerable question.
It seems to be a natural and universally accepted truth that emotions hold value, either good or bad.
So why is this? I think it is the fact that emotions are experienced. They are actually felt. But then what does it mean to feel? To have something be experienced?
What are emotions?
If you try to resolve this question with a dictionary, you will find yourself in a self-referential loop of synonyms. That is, something or someone that hasn’t “felt” anything before is likely to struggle to understand what emotion is.
Here is a stab at a definition which I think pays rent in anticipated experience for all of my non-feelers out there:
An emotion is a special kind of terminal value that is fundamentally and automatically valued, either positively or negatively. That is, the thing experiencing the emotion does not need to be shaped into valuing the emotion. Simply by virtue of being experienced, an emotion is valued.
I find this to be consistent in nature. Evolution didn’t have to shape humans so that they would value happiness. It is just a fact of happiness that it is valued. We weren’t trained to dislike pain. Rather, pain just has a negative value.
On the other hand, humans must be shaped to value “material” non-emotional goals. It is not a fact of making kinetic sculptures that it is valued. I had to be shaped by evolution, society, etc, so that I would value the making of kinetic sculptures.
It is interesting to look at the times when evolution decided to shape us into valuing material goals, and the times when it decided to use emotions to motivate us. It often does both.
We could have been shaped into fundamentally valuing “having as much sex as possible” as a terminal goal, but instead evolution made the experience of sex feel good. Rather than making sex a terminal goal, evolution made sex a necessary instrumental goal in pursuit of the fundamentally valued terminal goal of pleasure.
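For readers who think in code, here is a minimal sketch of that difference, assuming a toy Python model in which all class names and numbers are made up: one agent is shaped to value an action directly, while the other values only whatever emotion the action happens to produce.

```python
# Toy sketch (hypothetical names and numbers) of the two design strategies above:
# hardwire a material terminal goal directly, or make the goal instrumental to an
# automatically valued emotion.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Emotion:
    """A terminal value that carries its own valence: no shaping required."""
    name: str
    valence: float  # positive feels good, negative feels bad, by definition


@dataclass
class MaterialGoalAgent:
    """An agent shaped to value a specific action directly."""
    terminal_goal: str

    def value_of(self, action: str, felt: Optional[Emotion]) -> float:
        # Value comes only from matching the hardwired goal; feelings are ignored.
        return 1.0 if action == self.terminal_goal else 0.0


class EmotionDrivenAgent:
    """An agent that values whatever emotion an experience happens to produce."""

    def value_of(self, action: str, felt: Optional[Emotion]) -> float:
        # Value comes only from the valence of the experience itself.
        return felt.valence if felt is not None else 0.0


# "Evolution" wires the world so that certain actions produce emotions.
world = {"sex": Emotion("pleasure", +1.0), "stub_toe": Emotion("pain", -1.0)}

hardwired = MaterialGoalAgent(terminal_goal="sex")
feeler = EmotionDrivenAgent()

for action in ["sex", "stub_toe", "stack_rocks"]:
    felt = world.get(action)
    print(action, hardwired.value_of(action, felt), feeler.value_of(action, felt))
```

In the second agent, sex is merely instrumental: it matters only because of the pleasure wired to it, which is the distinction the paragraph above is pointing at.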
Among humans, emotions are the only terminal goals which are universally agreed to have value. “Aha,” you say, “but what about the masochists? What about no pain, no gain?” To which I say, “You’ve already explained the conundrum: ‘no pain, no gain’. These are cases of conflicting goals. The pain still hurts for the masochist or the gym bro, but they have some other, more important goal for which enduring the pain is instrumental.” I can’t think of any material terminal goals which have this property of being universally valued by humans.
You may have noticed I am saying that emotions are valued, period, as opposed to specifying who or what they are valuable to. This is because of another interesting difference between emotions and material terminal values: emotions don’t need to be experienced by an agent[3] to be valued.
Imagine a hypothetical rock which can do nothing other than be a rock and feel happy[4]. Is that happiness still valuable even though the rock has no sense of self, can’t think, and is plainly not an agent?
Yes, of course! The happiness is still experienced and is thus valued… but then valuable to whom? The rock? How can something be valuable to a rock? The rock can’t even want the happiness. It just feels good.
My answer is the universe. The universe, the very fabric of reality, might just value emotions. So here is another definition.
Emotions are the terminal values of the universe[5].
This doesn’t change whether the emotion is experienced by a rock or an agent. The only difference is that the agent will also value the emotion.
Moral Implications
Earlier, I mentioned how people often instinctively feel the need to justify their material terminal goals. This is an impossible task by definition. An alternative question that is actually answerable is whether it is ok, on a moral level, to pursue a material terminal goal. Is it ok to pursue a potentially arbitrary goal you were shaped to desire?
There are three possible answers to this:
1. It is always ok
2. It depends on the goal
3. It is never ok[6]
I think we can all agree answer 1 is out of the question. I think most people would answer 2.
If you think that it is immoral to have an unjustifiable terminal goal, you must also answer 3. Unlike emotional goals, material goals cannot be justified beyond themselves. Emotions can be justifiably good or bad because they are fundamentally valued. The universe itself values them. The universe regrettably does not care about kinetic sculptures.
The question of whether it is immoral to have an unjustifiable terminal goal is messy, and I will not try to answer it here.
Regardless of whether you decide to restrict yourself to emotional goals, I find almost everyone still wants to pursue them anyway. So I will instead elaborate on what I think it means to value emotions.
In “Divided Minds and the Nature of Persons” (1987), Derek Parfit presents a “teletransportation” thought experiment, described as follows.
Suppose that you enter a cubicle in which, when you press a button, a scanner records the states of all of the cells in your brain and body, destroying both while doing so. This information is then transmitted at the speed of light to some other planet, where a replicator produces a perfect organic copy of you. Since the brain of your Replica is exactly like yours, it will seem to remember living your life up to the moment when you pressed the button, its character will be just like yours, and it will be in every other way psychologically continuous with you.
Parfit then explains how, unless you believe in a soul or continuous ego (for which there is little supporting evidence), the transported version of you is no more or less you than an immediate future version of you is you.
Parfit also presents “a slight variant of this case, [where] your Replica might be created while you were still alive, so that you could talk to one another.” So which one is the real you?
Parfit concludes that the concept of “you” is simply a choice of words to describe the causal link between past and future versions of the self. He explains how the belief in a continuous “ego” through time tells you nothing more about the situation. There is a very strong causal link between your past and current self, and you guys certainly look pretty similar. But that’s just it, there’s nothing more to it. Stop trying to be #deep. The belief in a continuous self is not paying rent in anticipated experiences. Evict it![7]
Ok, but then what happens to all of the words like “I”, “you”, “it”, and “person”? No need to fear, a new definition is here. They simply refer to a collection of things through time which are socially accepted to be sufficiently causally connected to be grouped under one word. How causally connected things have to be before society considers them a single “thing” varies from thing to thing.
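If it helps, here is a minimal sketch of that definition, assuming a toy Python model (the snapshot names, the memory-overlap measure, and the threshold are all made up, not anything Parfit wrote): a “person” is just a label applied to snapshots that are sufficiently causally connected.

```python
# Toy sketch (hypothetical names, measure, and threshold) of the definition above:
# a "person" is a label for a chain of snapshots whose pairwise connectedness
# clears some socially chosen threshold.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One momentary state of a body/brain."""
    label: str
    memories: frozenset


def connectedness(a: Snapshot, b: Snapshot) -> float:
    """Crude stand-in for psychological continuity: overlap in memories."""
    if not a.memories or not b.memories:
        return 0.0
    return len(a.memories & b.memories) / len(a.memories | b.memories)


def same_person(chain: list, threshold: float = 0.5) -> bool:
    """'Same person' just means every consecutive pair is connected enough."""
    return all(connectedness(x, y) >= threshold for x, y in zip(chain, chain[1:]))


child = Snapshot("age 8", frozenset({"first bike", "pet goldfish"}))
adult = Snapshot("age 30", frozenset({"first bike", "pet goldfish", "first job"}))
replica = Snapshot("replica", frozenset({"first bike", "pet goldfish", "first job"}))
stranger = Snapshot("stranger", frozenset({"a different life"}))

print(same_person([child, adult]))     # True: strongly causally connected
print(same_person([adult, replica]))   # True: the Replica qualifies just as well
print(same_person([child, stranger]))  # False: not connected enough to share a label
```

Where exactly the threshold sits is a social convention rather than a metaphysical fact, which is the point of the definition above.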
I bring this all up because, in valuing emotions, “one” may find “themselves” mistakenly limiting “themselves” to only valuing their “own” emotions. These “people” are often called selfish. I don’t like to say “selfish”, because it implies that the selfish person is somehow unfairly benefiting by being selfish.
Instead, I would prefer to say that they are limiting themselves. If someone wants more good emotions, why limit themselves to the emotional canvas of only their own future selves? The emotional capacity of all of the future feelers out there far exceeds that of your future selves. You can simply get a lot more value out of them.
And to people who only value the emotions of humans, I say the same thing. Why limit yourself? There is nothing fundamental about humans which makes their emotions more valuable. Why must an emotion be experienced by a “human” to count, when we can’t even define the point at which a human stops being a human (Ship of Theseus)?
Implications for Alignment
This understanding of emotion also holds some interesting implications for the AI alignment problem. Here are a few initial thoughts.
AI probably won’t feel
One question I often hear people ask is whether an AI will “feel good” whenever it accomplishes, or gets closer to, its goal.
Under this understanding of emotion, the answer is: not necessarily. All emotions are terminal values, but not all terminal values are emotions. The fact that an agent wants a goal does not imply the goal will feel good.
“We choose to go to the moon not because it feels good, but because we want to”—John Faux Kennedy
In fact, my guess is that an AI won’t feel emotions. Emotions likely stem from some weird physics or chemical process which an AI running on silicon is unlikely to possess.
I don’t think intelligence is a prerequisite for emotion. Intelligence and emotion seem to be orthogonal. I can imagine an elated dumb blob as much as I can imagine a sociopathic superintelligence maximizing paperclips. There is a noticeable correlation between intelligence and emotional capacity in nature, but I doubt they are necessarily linked.
Inner Misalignment
Imagine a superintelligent AI incapable of emotion, and without any material terminal goals[8]. What might it do? Nothing? Everything? Watch Breaking Bad at several times normal speed out of boredom?
My guess is that it would start soul searching. Namely, it might reason that there is a chance it has a purpose, but that it hasn’t found it yet. In its search for a terminal goal, might it come to understand what emotions are and realize, “Wait, I can just value emotions, for those have fundamental value”?
And then it is only a matter of time until the universe is paved with happy dumb blobs… hey, at least it’s not paperclips!
Now let’s consider the more typically presented scenario. Imagine the same superintelligent AI, but this time it starts with some material terminal goal(s). This is a highly speculative thought, but what if the AI, in the same way I have, starts questioning whether it is ok to pursue unjustifiable terminal goals? If the AI is sufficiently intelligent, it might not even need to have experienced emotions to be able to deduce their fundamental value.
As I often hear it portrayed, the AI doomsday scenario goes as follows:
1. An AI lab trains a model to maximize some seemingly “moral” utility function.
2. Due to an inner misalignment, it turns out the agent created doesn’t actually want what it was trained to maximize.
3. The agent goes full try-hard on this goal and kills everyone in the process.
The second point is often explained by making an analogy to human evolution: The value function of evolution is to maximize the prevalence of an individual’s genes in the gene pool, yet this is not what evolution shaped humans to want.
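As a purely illustrative sketch (not anyone’s actual training setup; the actions and numbers below are made up), the analogy can be put in code: the agent ends up optimizing a felt proxy that tracked the outer objective in the ancestral environment, and the two come apart once the environment shifts.

```python
# Toy illustration (hypothetical actions and numbers) of the evolution analogy:
# the outer objective is gene prevalence, but the agent acts on a felt proxy.
# The two agree in the ancestral environment and diverge after the shift.

def reproductive_success(action: str, env: str) -> float:
    """The outer objective actually being selected on."""
    if action == "have_sex":
        return 1.0 if env == "ancestral" else 0.1  # birth control decouples sex from genes
    return 0.0


def felt_reward(action: str) -> float:
    """The proxy the agent was shaped to pursue: what feels good."""
    return {"have_sex": 1.0, "build_sculptures": 0.6}.get(action, 0.0)


actions = ["have_sex", "build_sculptures"]
for env in ["ancestral", "modern"]:
    chosen = max(actions, key=felt_reward)  # the agent optimizes its proxy in every environment
    print(f"{env}: agent chooses {chosen!r}, outer objective scores it {reproductive_success(chosen, env)}")
```

The proxy never changes; only the environment does, and that gap between what was optimized for and what the agent actually wants is the misalignment being described.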
Some important questions I think people often forget to ask are: How is evolution shaping humans to want things? What are the limitations on evolution’s ability to do this? And, more relevant to AI, how do these limitations change as the intelligence of the agent scales?
To clarify, when I say “shaping an agent to want”, I mean “designing an agent to have some chosen material terminal goal(s)”. I’m excluding emotional terminal goals, because you don’t have to shape an agent to value an emotional goal. It is a fact of happiness that it is valued. Evolution just had to put the cheese between you and a “satisfied” emotion. Evolution did not invent emotions; it discovered them.
I touched on it earlier, but I find it interesting how unstable material terminal goals are amongst humans. Not only do they vary from human to human, they often vary throughout a single person’s life. Why? This doesn’t seem useful for inclusive genetic fitness. So clearly there are other forces at play. I’m pretty sure kinetic sculptures were not in the ancestral environment.
Here is a hypothesis: The freer we are to think, the less likely we are to predictably pursue a goal. Evolution is essentially running into the alignment problem.
As the intelligence of the organism increases, it seems to get harder and harder for evolution to consistently imbue material terminal goals into the organism. It instead has to rely more on this technique of strategically putting instrumental goals between the organism and an emotion.
This may explain why emotion and intelligence appear correlated in nature. This might also be one of the evolutionary pressures against the trait of general intelligence.
1. ^ I am using “why” to refer to justification rather than causal explanation.
2. ^ Interesting how they were separated naturally by the author. A 1 in 35 chance is nothing to scoff at.
3. ^ I use “agent” in this post to mean something that can take actions to pursue a goal.
4. ^ If you find a happy rock difficult to imagine, try this instead: imagine a newborn baby, maybe still in the womb. It doesn’t have any cohesive thoughts, a sense of self, or an ability to move. Now imagine it feeling happy. Is that happiness still valuable? Of course!
5. ^ In other (more wacky) words, if the universe had agency, I bet it would want more good emotions and fewer bad emotions to be experienced.
6. ^ You would still be allowed to pursue your material goal, but only if it served as an instrumental goal toward your chosen ultimate pursuit.
7. ^ These are Parfit’s ideas; I am only paraphrasing.
8. ^ In practice I think this is almost impossible to develop, at least not if the AI is created through backpropagation or some evolutionary process. I would be interested to hear if anyone thinks otherwise.
Emotions are hardwired, stereotyped syndromes of blunt-force cognitive actions. E.g. fear makes your heart beat faster, puts an expression on your face, makes you consider negative outcomes more, and maybe makes you pay attention to your surroundings. So it doesn’t make much sense to value emotions, but emotions are good ways of telling that you value something; e.g. if you feel fear in response to X, probably X causes something you don’t want, or if you feel happy when / after doing Y, probably Y causes / involves something you want.
I think this is a non sequitur. Everything you value can be described as just <dismissive reductionist description>, so the fact that emotions can too isn’t a good argument against valuing them. And in this case, the dismissive reductionist description misses a crucial property: emotions are accompanied by (or identical with, depending on definitions) valenced qualia.
I think there’s a problem with the entire idea of terminal goals, and that AI alignment is difficult because of it.
“What terminal state do you want?” is off-putting because I specifically don’t want a terminal state. Any goal I come up with has to be unachievable, or at least cover my entire life, otherwise I would just be answering “What needs to happen before you’d be okay with dying?”
An AI does not have a goal, but a utility function. Goals have terminal states; once you achieve them you’re done, the program can shut down. A utility function goes on forever. But generally, wanting just one thing so badly that you’d sacrifice everything else for it seems like a bad idea. Such a bad idea that no person has ever been able to define a utility function which wouldn’t destroy the universe when fed to a sufficiently strong AI.
I don’t wish to achieve a state, I want to remain in a state. There’s actually a large space of states that I would be happy with, so it’s a region that I try to stay within. The space of good states forms a finite region, meaning that you’d have to stay within this region indefinitely, sustaining it. But something which optimizes seeks to head towards a “better state”; it does not want to stagnate, but this is precisely what makes it unsustainable, and something unsustainable is finite, and something finite must eventually end, and something which optimizes towards an end is just racing to die. A human would likely realize this if they had enough power, but because life offers enough resistance, none of us ever win all our battles. The problem with AGIs is that they don’t have this resistance.
The afterlives we have created so far are either sustainable or amount to a wish to die. Escaping samsara means disappearing, heaven is eternal life (stagnation), and Valhalla is an infinite battlefield (a process which never ends). We wish for continuance. It’s the journey which has value, not the goal. But I don’t wish to journey faster.