[Valence series] 1. Introduction
1.1 Summary & Table of Contents
This is the first of a series of five blog posts on valence. Here’s an overview of the whole series, and then we’ll jump right into the first post!
1.1.1 Summary & Table of Contents—for the whole Valence series
Let’s say a thought pops into your mind: “I could open the window right now”. Maybe you then immediately stand up and go open the window. Or maybe you don’t. (“Nah, I’ll keep it closed,” you might say to yourself.) I claim that there’s a final-common-pathway[1] signal in your brain that cleaves those two possibilities: when this special signal is positive, then the current “thought” will stick around, and potentially lead to actions and/or direct-follow-up thoughts; and when this signal is negative, then the current “thought” will get thrown out, and your brain will go fishing (partly randomly) for a new thought to replace it. I call this final-common-pathway signal by the name “valence”. Thus, the “valence” of a “thought” is roughly the extent to which the thought feels demotivating / aversive (negative valence) versus motivating / appealing (positive valence).
I claim that valence plays an absolutely central role in the brain—I think it’s one of the most important ingredients in the brain’s Model-Based Reinforcement Learning system, which in turn is one of the most important algorithms in your brain.
Thus, unsurprisingly, I see valence as a shining light that illuminates many aspects of psychology and everyday mental life. This series explores that idea. Here’s the outline:
Post 1 (Introduction) will give some background on how I’m thinking about valence from the perspective of brain algorithms, including exactly what I’m talking about, and how it relates to the “wanting versus liking” dichotomy. (The thing I’m talking about is closer to “motivational valence” than “hedonic valence”, although neither term is great.)
Post 2 (Valence & Normativity) will talk about the intimate relationship between valence and the universe of desires, preferences, values, goals, etc.—i.e. the “normative” side of the “positive-versus-normative” dichotomy, or equivalently the “ought” side of Hume’s “is-versus-ought”. I’ll start with simple cases: for example, if the idea of doing a certain thing right now feels unappealing (negative valence), then we’re less likely to do it. Then I’ll move on to more interesting cases, including what it means to like or dislike a broad concept like “religion”, and ego-syntonic versus ego-dystonic desires, and a descriptive account of moral reasoning and value formation.
Post 3 (Valence & Beliefs) is the complement of Post 2, in that it covers the relationship between valence and the universe of beliefs, expectations, concepts, etc.—i.e. the “positive” side of the “positive-versus-normative” dichotomy, or equivalently the “is” side of “is-versus-ought”. The role of valence here is less foundational than it is on the normative side, but it’s still quite important. I’ll talk specifically about motivated reasoning, the halo effect (a.k.a. affect heuristic), and some related phenomena.
Post 4 (Valence & Liking / Admiring) argues that when my brain assigns a positive valence to a person I know, that corresponds to a familiar everyday phenomenon that I call “liking / admiring”. I argue that this has close ties to social status, mirroring, deference, self-esteem, self-concepts, and more. I also argue that there’s an “innate drive to be liked / admired” which is critically important in human affairs, and I speculate a bit on how it works in the brain.
Post 5 (‘Valence Disorders’ in Mental Health & Personality) notes that, given the central role of valence in brain algorithms, it follows that if something creates systematic impacts on valence, it should lead to a characteristic suite of major downstream effects on mental life. I’ll propose three specific hypotheses along these lines:
(A) If the valence of every thought is shifted negative, that leads to a suite of symptoms strongly overlapping with depression;
(B) If the valence of every thought is shifted positive, that leads to a suite of symptoms strongly overlapping with mania;
(C) If the valence of every thought is “extremized”—very positive or very negative, but rarely in between—that leads to a suite of symptoms similar to narcissistic personality disorder.
Appendix A (Hedonic tone / (dis)pleasure / (dis)liking) has some more details about the hedonic tone (i.e., the “liking” side of the “wanting-versus-liking” dichotomy, in contrast to valence which is more about “wanting”). This appendix thus elaborates on the very brief discussion in §1.5.2 below. I suggest that “hedonic tone” is a different brain signal from “valence”, but centrally involved in the valence-calculation algorithm.
Valence and AI alignment deserves a post too, but actually I already wrote that one a while ago: see Plan for mediocre alignment of brain-like [model-based RL] AGI. Check it out if you’re interested. I won’t discuss AI further in this series, with some minor exceptions, including a section at the very end of the last post.
1.1.2 Summary & Table of Contents—for this first post in particular
This article introduces how I define and think about valence, in the context of high-level brain algorithms.
Section 1.2 argues that the brain does model-based reinforcement learning, and explains what I mean by that.
Section 1.3 talks more specifically about actor-critic reinforcement learning, and how “valence” is a control signal within that framework.
Section 1.4 relates valence to other terms like “aversion”, “incentive salience”, and “based versus cringe”.
Section 1.5 offers seven clarifications on common confusions, including the relation between valence and thoughts, and pleasure, and motivation, and other feelings, and imagination, and reinforcement learning in the brain more generally.
Section 1.6 is a brief conclusion.
1.2 Model-based reinforcement learning (RL)
The human brain has a model-based RL system that it uses for within-lifetime learning. I guess that previous sentence is somewhat controversial, but it really shouldn’t be:
The brain has a model—If I go to the toy store, I expect to be able to buy a ball.
The model is updated by self-supervised learning (i.e., predicting imminent sensory inputs and editing the model in response to prediction errors)—if I expect the ball to bounce, and then I see the ball hit the ground without bouncing, then next time I see that ball heading towards the ground, I won’t expect it to bounce.
The model informs decision-making—If I want a bouncy ball, I won’t buy that ball, instead I’ll buy a different ball.
There’s reinforcement learning—If I drop the ball on my foot just to see what will happen, and it really hurts, then I probably won’t do it again, and relatedly I will think of doing so as a bad idea.
…And that’s all I mean by “the brain has a model-based RL system”.
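To make the four ingredients above concrete, here is a minimal toy sketch. It is not a claim about how the brain implements any of this: the `ToyAgent` class, its method names, the 0.9 prior, and the learning rate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    # World model: predicted probability that each ball bounces (hypothetical representation).
    bounce_belief: dict = field(default_factory=dict)
    # Learned assessment of how good each action is, built from past rewards.
    action_value: dict = field(default_factory=dict)
    lr: float = 0.5  # assumed learning rate

    def predict_bounce(self, ball):
        # Assumed prior: an unfamiliar ball is expected to bounce.
        return self.bounce_belief.get(ball, 0.9)

    def observe_bounce(self, ball, bounced):
        # Self-supervised learning: nudge the prediction toward what was observed.
        p = self.predict_bounce(ball)
        error = (1.0 if bounced else 0.0) - p
        self.bounce_belief[ball] = p + self.lr * error

    def choose_ball(self, balls):
        # The model informs decision-making: pick the ball predicted to be bounciest.
        return max(balls, key=self.predict_bounce)

    def receive_reward(self, action, reward):
        # Reinforcement learning: shift the action's value toward the reward received.
        v = self.action_value.get(action, 0.0)
        self.action_value[action] = v + self.lr * (reward - v)


agent = ToyAgent()
agent.observe_bounce("red ball", bounced=False)
agent.observe_bounce("red ball", bounced=False)
best = agent.choose_ball(["red ball", "blue ball"])
agent.receive_reward("drop ball on foot", reward=-1.0)
```

After two non-bounces, the agent's model expects the red ball to bounce far less than an unfamiliar ball, so it picks the blue one, and the painful action ends up with a negative learned value.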
I emphatically do not mean that, if you just read a “model-based RL” paper on arxiv last week, then I think the brain works exactly like that paper you just read. On the contrary, “model-based RL” is a big tent comprising many different algorithms, once you get into details. And indeed, I don’t think “model-based RL as implemented in the brain” is exactly the same as any model-based RL algorithm on arxiv.
1.3 Actor-critic RL, and “valence”
Within the space of RL algorithms, a major subcategory is called “actor-critic RL”. I claim that the brain is of this type. A “critic” is basically any learning algorithm trained to assess whether something is a good idea or bad idea, based on the past history of RL rewards. In the context of the brain, for present purposes, I propose that we should think about it like this:
Valence has implications for both the “inference algorithm” (what the brain should do right now) and the “learning algorithm” (how the brain should self-modify so as to be more effective in the future). In this series, I’m mainly interested in the inference algorithm. There, the main thing that valence does is:
If valence is very negative, the current “thought” tends to get thrown out, and the “Thought Generator” part of the brain goes rummaging around and (partly-randomly) picks a new different thought to replace it.
If the valence is very positive, the current “thought” tends to stay active and get stronger. Relatedly, if the thought involves an immediate plan to issue motor outputs, then those motor outputs are likely to actually get issued. And if the thought is one piece of a temporal sequence (e.g. you’re in the middle of singing a song), then that temporal sequence will tend to continue. And so on.
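The two rules above can be sketched as a simple loop. This is a deliberately crude illustration under stated assumptions: thoughts are plain strings, valence is a fixed number per thought, only the sign of the valence matters, and the “Thought Generator” is replaced by uniform random sampling (the text says only “partly randomly”). The names `think` and `toy_valence` are hypothetical.

```python
import random


def think(candidate_thoughts, valence_of, steps=20, seed=0):
    """Keep the current thought while its valence is non-negative; otherwise resample."""
    rng = random.Random(seed)
    current = rng.choice(candidate_thoughts)
    history = [current]
    for _ in range(steps):
        if valence_of(current) < 0:
            # Negative valence: throw the thought out and fish for a new one.
            current = rng.choice(candidate_thoughts)
        # Otherwise the thought simply stays active; nothing needs to change.
        history.append(current)
    return history


# Made-up valences for three candidate thoughts.
toy_valence = {
    "open the window": 0.8,
    "subtract 3 from both sides": -0.5,
    "skip lunch": -0.9,
}
history = think(list(toy_valence), toy_valence.get, steps=50)
```

In this toy version, negative-valence thoughts keep getting replaced until a positive-valence thought comes up, and that thought then persists for the rest of the run.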
As I have discussed here, you can draw an analogy between valence in the human brain, and the final-common-pathway control signal for a run-and-tumble mechanism in a simple mobile organism like a bacterium. Specifically:
When valence is positive, it roughly means “whatever (metaphorical) path I’m on, including not only what I’m doing right now but also the plans currently in my head for what to do later, is a good path! I should carry on with that!”. This is analogous to the “run” of run-and-tumble: the bacterium keeps going in whatever direction it is currently going.
When valence is negative, it roughly means “I should randomly generate a new activity / plan right now”. This is analogous to the “tumble” of run-and-tumble: the bacterium randomly picks a new direction to go.
In fact, I don’t think it’s just an analogy—my guess is that there’s literally an unbroken chain of descent from the valence signal in my brain all the way back to a run-and-tumble-like control signal in the proto-brain of my tiny worm-like ancestors 600 million years ago.
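For readers unfamiliar with run-and-tumble, here is a minimal two-dimensional sketch of the mechanism. The details are assumptions made for illustration: the organism compares the current signal with the signal one step earlier, keeps its heading if things have not got worse, and picks a uniformly random heading otherwise. The `nutrient` field and all parameter values are hypothetical.

```python
import math
import random


def run_and_tumble(signal, start=(0.0, 0.0), steps=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    x, y = start
    heading = rng.uniform(0, 2 * math.pi)
    previous = signal(x, y)
    for _ in range(steps):
        x += step_size * math.cos(heading)
        y += step_size * math.sin(heading)
        current = signal(x, y)
        if current < previous:
            # Things got worse: "tumble" to a random new heading.
            heading = rng.uniform(0, 2 * math.pi)
        # Otherwise "run": keep going in the same direction.
        previous = current
    return x, y


def nutrient(x, y):
    # Hypothetical concentration field that peaks at (5, 5).
    return -math.hypot(x - 5.0, y - 5.0)


final = run_and_tumble(nutrient)
```

Despite having no map and no sense of direction, the simulated organism drifts toward the peak, because good headings are kept for long stretches while bad headings are abandoned after a single step.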
Alternatively, we can take an AI-centric perspective, in which case I think the exact role of “valence” in the brain’s RL algorithm is a kind of funny mixture of value function (a.k.a. “critic”) and reward function:
It’s a bit like a conventional RL reward function in the sense that it can be “ground-truth-y”—for example, the brain has innate circuitry that issues negative valence in response to pain-related signals, and positive valence in response to eating-when-hungry, and so on for various other “innate drives”.
It’s a bit like a conventional RL value function in the sense that it can be “forward-looking”—for example, “the idea of walking to go get candy” can be positive valence (a.k.a. motivating), not because the walk itself is immediately pleasant, but because I’m hungry and want to eventually eat the candy.
(I won’t go into details about how valence relates to specific reinforcement learning algorithms—see the appendix for more on that.)
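Without committing to any particular algorithm, the “funny mixture” can be illustrated with a toy sketch in which valence is the sum of an innate, reward-like term and a learned, forward-looking term. The temporal-difference-style update, the constants `GAMMA` and `ALPHA`, and the example thoughts are all assumptions for illustration, not a description of the brain's actual algorithm.

```python
GAMMA = 0.9  # assumed discount on what comes next
ALPHA = 0.5  # assumed learning rate

# "Ground-truth-y" part: innate reactions (made-up numbers).
innate = {"eating candy while hungry": 1.0, "stubbing my toe": -1.0}
# "Forward-looking" part: learned from experience, initially empty.
learned = {}


def valence(thought):
    return innate.get(thought, 0.0) + learned.get(thought, 0.0)


def td_update(thought, next_thought):
    # Move the learned part toward the discounted valence of what followed.
    old = learned.get(thought, 0.0)
    target = GAMMA * valence(next_thought)
    learned[thought] = old + ALPHA * (target - old)


# Repeatedly experience "walk to the store, then eat candy".
for _ in range(10):
    td_update("walking to the candy store", "eating candy while hungry")

walk_valence = valence("walking to the candy store")
```

After a few repetitions, the thought of walking to the store carries positive valence even though no innate reaction is attached to walking itself.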
Either way, hopefully it’s clear that “valence” is one of the most important ingredients in one of the most important algorithms in the brain.
1.4 Terms closely related to “valence”
If you say “the thought of X is appealing” or “motivating”, that’s more-or-less synonymous with saying that such a thought has positive valence. Such thoughts have a kind of magnetic pull on our brains. The psychology jargon term “appetitive” means something similar.
On the other side, if you say “the thought of X is aversive”, that’s mostly synonymous with saying that such a thought has negative valence, but has a further connotation that this thought also strongly induces physiological arousal (increased heart rate etc.). For example, suppose you’re doing a math problem, and you think to yourself “What if I subtract 3 from both sides? Nope, that won’t work.” There was a negative-valence thought in here—I can tell because you did not in fact follow through with the plan to subtract 3 from both sides, and instead went back to brainstorming other ways to simplify the equation. But you probably wouldn’t call that thought “aversive”. For example, it wasn’t particularly unpleasant (more about pleasantness in §1.5.2 below). It’s just a bad plan that you’re unmotivated to execute.
As mentioned above, when a negative-valence thought pops into our head, our brain tosses it out and goes fishing for a new thought to replace it—and in the longer term, our brain stops thinking those negative-valence thoughts in the first place. However, highly aversive thoughts—involving both negative valence and arousal—can seem to defy that rule, in that our brain sometimes seems to be trying to get rid of these thoughts, but failing—for example, consider anxious rumination, or pain. I’ll explain what’s going on there in a later post—hang on until §3.3.5. (The short version is: I claim that such thoughts persist because of “involuntary attention”, which is a separate brain mechanism unrelated to valence.)
“Incentive salience” is another psychology jargon term. As far as I can tell, “the cheese has incentive salience” is basically synonymous with “if you’re paying attention to the cheese, then that thought will evoke positive valence”.
There are loads of heavily “valenced” words that we use on a day-to-day basis—“good” & “bad”, “yay” & “boo”, “pro-” & “anti-”, “based” & “cringe”, etc. These words are used mainly to communicate about valence. Much more on this topic in the next two posts, where I will also argue that pretty much every word we use is at least slightly valenced (although some words, like “religion”, have different valences for different people).
1.5 Clarifications
1.5.1 “Valence” (as I’m using the term) is a property of a thought—not a situation, nor activity, nor course-of-action, etc.
For example, suppose I have “mixed feelings” about going to the gym. What does that mean? In all likelihood, it means:
If I think about going to the gym in a certain way—if I pay attention to certain aspects of it, if I think about it using certain analogies / frames-of-reference, etc.—then that thought is appealing (positive valence);
If I think about going to the gym in a different way, then that thought is unappealing (negative valence).
For example, maybe the first thought is something like “I will go to the gym, thus following through on my New Year’s Resolution” and the second thought is something like “I will go to the loud and cold and smelly gym”.
So the action of “going to the gym” does not have a well-defined valence. But each individual thought does have a well-defined valence.
1.5.2 Valence (as I’m using the term) is different from “hedonic valence” / pleasantness
For example, suppose you consider doing something unpleasant (e.g. walking upstairs and getting a sweater) to avoid something even worse (e.g. continuing to feel cold), and you find that thought motivating, and so you do it. By my definitions, that was clearly a positive-valence thought. But it’s associated with unpleasantness—walking upstairs to the drawer is not particularly pleasurable in and of itself. The thought is positive valence mainly because the expected consequences are part of the same thought too—as you walk up the stairs, the thought in your head is more like “I’m getting a sweater”, not just “I’m walking up the stairs”.
More generally: If I find the thought Θ appealing / motivating (positive valence), I claim there are basically two possibilities. First, as in the sweater example above, maybe Θ entails paying attention to an expectation of something desirable that will happen later. Second, maybe the thought Θ entails paying attention to something good that is happening right now—e.g., if I’m getting a back massage. Both of those are common reasons for a thought Θ to have positive valence. But only the latter would be described using words like “pleasurable”, or “positive hedonic tone”, etc.
I’ve already said much about how I think about valence. So you might be wondering: What about pleasure?
By and large, when people say “X is pleasurable”, we can safely infer:
X is associated with a stable (as opposed to transient) mental state;
When in state X, the thought of staying in state X is positive-valence;
When not in state X, the thought of getting back into state X is positive-valence.
So valence and pleasure are not totally unrelated, but they’re not the same either. In terms of neuroscience, they’re associated with different signals in different parts of the brain, and in terms of conscious experience, I think they’re associated with different interoceptive sensory inputs (and hence they “feel different”, subjectively). There’s more to say here, about both the algorithms and how they relate to neuroanatomy, but that’s out of scope for this series.
For a mainstream take, see Berridge & Kringelbach 2015; for my own more detailed opinions see Appendix A.
1.5.3 “We do things exactly when they’re positive-valence” should feel almost tautological
Try to think of a time when the thought of doing a certain (motor) activity felt purely negative-valence, but then you went ahead and did it anyway. Did you think of an example? If so, we are not on the same page—you are not using the term “valence” in the same way that I am.
Back when you decided to do that activity, there must have been something motivating you to do that—some aspect of it must have felt motivating, even if it was “the idea of getting it over with”, or “morbid curiosity”, or “something about the kind of person I want to be and the stories I want to be able to tell myself”, or “just to prove to myself that I can”, or “to get rid of such-and-such annoying feeling”, or “it’s hard to articulate”. If you actually did the thing, there must have been some source-of-motivation within you making it happen, or you wouldn’t have done it! That source-of-motivation, whatever it is, was enough to make the overall thought “positive valence”, at least the way I’m using the term.
(If you think the “valence” signal with the properties that I ascribe to it in §1.3 simply doesn’t exist, then OK sure, we can talk about that. Instead, here I am trying to avoid the situation where you latch onto some brain signal that doesn’t match the §1.3 spec, and say to yourself, “Steve meant to be talking about this signal, but he is confused about its properties!”.)
1.5.4 Valence is also part of the world model, and hence (confusingly) a valence can be either real or imagined
Imagine seeing a purple tree out the window. What would it look like? In order to answer that question, your brain temporarily represents imagined visual stimuli—stimuli matching a purple tree, which are quite different from the real visual stimuli hitting your retinas right now.
Next: Again, imagine seeing a purple tree out the window. What would you feel upon seeing it? In order to answer that question, your brain temporarily represents imagined feelings, one of which is an imagined valence. (Your brain can also represent imagined arousal, imagined pleasure, and so on.) Just as in the visual case, these imagined feelings can be quite different from the real feelings active at that same moment.
But I find that most people have no issue with the former thing (distinguishing real versus imagined visual stimuli), yet get very confused by the latter thing (distinguishing real versus imagined valence and other “feelings”).
Maybe it helps that real visual stimuli seem to be “out there in the world” whereas imagined visual stimuli are “in our head”, so it feels very intuitive to treat them differently. By contrast, real valence and imagined valence are both “in our head”, so it’s not so obvious how they’re two different things. Moreover, if you’re imagining a valence, then that valence is part of the “thought”, and thus it can impact the thought’s (real) valence!! Likewise, imagined arousal can contribute to real arousal, and so on. Or the same root cause can lead to both real arousal and imagined arousal. That dynamic makes it even more tempting to lump them together into a confused mush. But really, they’re different!
For example, if I imagine something and I say “Boy, in that situation, I would be so incredibly angry!”, then I would have bodily reactions consistent with being slightly angry, but not consistent with being “so incredibly angry”. My heart rate would probably be slightly elevated compared to normal, but not extremely elevated. My face would be slightly flushed, but not extremely flushed. My hands might be slightly clenched into fists, but not strongly clenched. Etc. Thus, whereas I am imagining being “so incredibly angry”, my actual state right now is one of merely mild anger. So I can imagine “feelings” different from my internal state. And again, I think the same idea applies to valence and lots of other things.
To clarify, let’s add some extra detail to the diagram from above:
The “Thought Generator” has a bunch of sensory inputs that are used for training a generative model by self-supervised learning (i.e., predicting imminent sensory inputs, then updating on prediction errors). Some of those sensory inputs are visual, so the model winds up “knowing” what visual inputs are expected in what circumstances. Some of those sensory inputs are auditory, so the model winds up “knowing” what auditory inputs are expected in what circumstances. And in exactly the same way, one of those sensory inputs is a copy of the valence signal, so the model winds up “knowing” what valence signal is expected in what circumstances.
And when we imagine things, that involves “querying” this learned model, and thus we can imagine feeling a valence in just the same way that we can imagine seeing sights and hearing sounds.
Thus, valence-as-a-sensory-input (green curvy arrow) is nothing special—just one of many things that gets incorporated into the learned model. By contrast, valence-the-control-signal (black arrow) has a special superpower—the power to throw out bad thoughts, and to keep and strengthen good thoughts (§1.3). In other words:
If you’re imagining X, and it involves a negative imagined valence, then maybe you imagine yourself not being motivated to pursue X.
If you’re imagining X, and the real valence of that thought is negative, then you stop imagining X in the first place and start thinking about something else instead.
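The distinction between the two roles can also be sketched in code. In this toy version, the learned model treats valence as one more input to predict, and imagining is a query to that model, whereas the real valence of the current thought is what decides whether the thought is kept. The `WorldModel` class, `keep_thinking`, and the numbers are hypothetical.

```python
class WorldModel:
    """Toy learned model that predicts valence like any other sensory input."""

    def __init__(self, lr=0.5):
        self.expected_valence = {}
        self.lr = lr  # assumed learning rate

    def observe(self, situation, felt_valence):
        # Valence-as-a-sensory-input: update the prediction toward what was felt.
        old = self.expected_valence.get(situation, 0.0)
        self.expected_valence[situation] = old + self.lr * (felt_valence - old)

    def imagine_valence(self, situation):
        # Imagining = querying the model: "what would I feel in that situation?"
        return self.expected_valence.get(situation, 0.0)


def keep_thinking(real_valence_of_thought):
    # Valence-the-control-signal: acts on the current thought itself.
    return real_valence_of_thought >= 0


model = WorldModel()
for _ in range(3):
    model.observe("stuck in traffic", -1.0)

imagined = model.imagine_valence("stuck in traffic")
# Assumed real valence of the thought "picture the traffic jam so I can plan around it".
still_thinking = keep_thinking(0.3)
```

Here the imagined valence is negative while the thought doing the imagining is kept, which is the sense in which the two signals can come apart.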
See the difference? Here’s a diagram illustrating both:
1.5.5 Valence is just one of many dimensions of conscious interoceptive experience
I already mentioned this in the previous section, but it’s worth emphasizing. At any given time, your brain (especially hypothalamus and brainstem) is keeping track of probably hundreds of innate parameters: What’s my blood pressure? Salt level? How dilated are my pupils? How fertile am I? How lonely[2]? Am I geared up for a fight? Am I suppressing a laugh? How’s my immune system doing? On and on. And valence—an assessment of whether the current “thought” is better off kept versus discarded, all things considered (§1.3)—is one of those hundreds of innate parameters.
And I think that, as in the previous subsection, many or most of those innate parameters serve two roles:
First, they create various innate effects via lots of connections mainly within the brainstem and hypothalamus—e.g., the “I’m fertile” signal tends to increase sex drive, the “I’m hot” signal tends to initiate sweating, and the “valence” signal either throws out or strengthens thoughts (§1.3).
Second, they go up to our “Thought Generator” (a.k.a. world-model) and serve as interoceptive sensory inputs (as in the previous subsection), allowing us to “feel” what their current values are.
(Regrettably, the full catalog of interoceptive sensations is currently unknown to Science, and seems to have only a loose relationship to English-language emotion words, as I discuss here.)
So if you think a thought, it can bring forth positive or negative valence, and it can simultaneously bring forth a wide variety of other “feelings” that we might identify as anger, sorrow, regret, pleasure, pain, curiosity, and so on. There are systematic relationships between these things, just as there are systematic relationships between sights and sounds, but they are different axes of conscious experience.
1.5.6 Fine print: Throughout this series, I’m only talking about the brain’s “main” reinforcement learning system
I actually think there is more than one reinforcement learning (RL) system in the brain. But only one of them is the “main” RL system. That’s the one I primarily care about, and pretty much the only one that I ever write about, in both this series and my previous series; it is also the RL system that is related to “valence”. I sometimes refer to this RL system as the “success-in-life” RL system, because it’s in charge of estimating whether something is a good or bad idea for the organism to do right now all things considered, including considerations of homeostasis, and sociality, and childrearing, and everything else that might be relevant to inclusive genetic fitness. (See discussion here.)
What are the other brain RL systems besides the “main” a.k.a. “success-in-life” one? My general answer is: these are narrowly-scoped RL systems that learn to perform very specific tasks with the help of a very specific reward signal.
One example, I believe, is related to motor control. I think the “main” RL system specifies what the motor system is supposed to be accomplishing at any particular time, but the gory details of how exactly to execute those movements, by precisely moving specific muscles at specific times, are delegated to one or more “narrow” RL systems. The “reward” for that narrow RL system would involve an assessment of whether it is accomplishing the current motion goals set out for it by the “main” RL system, along with (presumably) assessments of various metrics like motion smoothness, energy-efficiency, and so on.
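As a rough illustration of what such a narrow reward might look like, here is a toy reward function for a one-dimensional movement. Everything specific here is an assumption: the function name, the weights, and the use of squared velocity and squared acceleration as stand-ins for energy use and lack of smoothness.

```python
def narrow_motor_reward(positions, target, w_smooth=0.1, w_energy=0.01):
    """Toy reward for a 1-D trajectory sampled at equal time steps."""
    velocities = [b - a for a, b in zip(positions, positions[1:])]
    accelerations = [b - a for a, b in zip(velocities, velocities[1:])]
    goal_error = abs(positions[-1] - target)       # did we reach the goal set by the "main" system?
    roughness = sum(a * a for a in accelerations)  # penalty for abrupt changes in velocity
    energy = sum(v * v for v in velocities)        # crude proxy for effort
    return -goal_error - w_smooth * roughness - w_energy * energy


smooth = [0.0, 0.25, 0.5, 0.75, 1.0]
jerky = [0.0, 0.9, 0.1, 0.8, 1.0]
smooth_reward = narrow_motor_reward(smooth, target=1.0)
jerky_reward = narrow_motor_reward(jerky, target=1.0)
```

Both trajectories reach the target, but the smooth one scores higher, so a learner trained on this signal would be pushed toward smooth, economical movements without needing to know why the target was chosen.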
So motor control is one example, and I think there are at least a couple more examples of “narrow” RL systems in the brain as well. I have a (brief and outdated) discussion in an old post here, and see also my more intuitive old discussion in Reward Is Not Enough.
1.5.7 I’m sweeping some complexity under the rug
As I keep mentioning, I think that the mechanism of §1.3, where negative-valence “thoughts” get discarded and positive-valence “thoughts” get kept and strengthened, is one of the most important mechanisms in the brain. So obviously, there has been a lot of evolutionary pressure over the last half-billion years to make this mechanism work really really well. That translates to a long list of clever tweaks and add-ons. I don’t understand them all myself, but from everything I’ve seen, the simplified high-level picture I’m presenting in this series is a great starting point—much more helpful than it is misleading. So those details generally need not concern us.
But here’s one detail that seems worth mentioning: I claim that “thoughts” are compositional, i.e. made of different pieces snapped together—for example, if you’re looking at an apple, your current “thought” involves semantic aspects (it’s an apple), and visual aspects (it’s red). If part of a “thought” seems good (it constitutes evidence that the thought merits positive valence) and another part seems bad (it constitutes evidence that the thought merits negative valence), then (I claim) there’s a mechanism by which your brain will attempt to throw out and replace the bad parts of the thought, while keeping the good parts. That’s generally only possible to a limited extent, because the different parts of a single thought are related—they constrain each other. But still, this dynamic is importantly involved in things like brainstorming (coming up in §3.3.3).
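A toy sketch of that “replace the bad parts, keep the good parts” dynamic follows. It represents a thought as a dictionary of named slots, which is purely an illustrative assumption, and it ignores the constraints between parts mentioned above. The function name and example data are hypothetical.

```python
def revise_thought(thought, part_valence, alternatives):
    """Swap out negatively valenced parts of a thought, keeping the rest."""
    revised = dict(thought)
    for slot, part in thought.items():
        if part_valence.get(part, 0.0) < 0:
            # Look for a replacement for this slot that does not seem bad.
            options = [
                p for p in alternatives.get(slot, [])
                if part_valence.get(p, 0.0) >= 0
            ]
            if options:
                revised[slot] = options[0]
    return revised


thought = {"goal": "get dinner", "plan": "cook from scratch"}
part_valence = {"get dinner": 0.8, "cook from scratch": -0.4, "order takeout": 0.5}
alternatives = {"plan": ["cook from scratch", "order takeout"]}
revised = revise_thought(thought, part_valence, alternatives)
```

The appealing goal survives while the unappealing plan is swapped out, which is roughly what a round of brainstorming looks like in this simplified picture.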
1.6 Conclusion
If you read the literature, “valence” refers to dozens of different things. But hopefully you now know what I personally am talking about when I say “valence” in this series. With that under our belt, the upcoming posts will discuss the manifold effects of valence on our mental lives.
Thanks to Tsvi Benson-Tilsen, Seth Herd, Aysja Johnson, Charlie Steiner, and Justis Mills for critical comments on earlier drafts.
[1]
Definition for non-neuroscientists: “Final common pathway” is a term I like. Start with a typical example from the literature of someone using that term: “motor neurons [in the spinal cord]…are the final common pathway for transmitting neural information from a variety of sources to the skeletal muscles.” What that means is: There’s some signal going down the spine to the muscles, and that signal will completely control what the muscles will do. But upstream of that signal, there’s a lot going on! Lots of different systems in the brain are all contributing to that signal, and modulating it, in complicated ways.
By the same token, when I call valence a “final-common-pathway signal”, I’m saying that there’s one brain signal called “valence” (ignoring some fine print, see §1.5.7), and it’s a real signal, encoded by real neurons firing, and this signal has extraordinarily important impacts on the brain. But the fact that it’s just one signal does not imply that it’s calculated in a simple way by a single system. There’s a single point of departure of the signal, but that’s merely the last step of a complex calculation involving systems all over the brain.
[2]
Yes, at least in rodents, the hypothalamus seems to have an innate circuit that specifically tracks how many days it’s been since I felt the comforting touch of a friend or family member. See Liu et al. (2023).
To take the by now stereotypical action for me, here’s a connection to Buddhism. There are a few passages in the Buddhist Canon where someone comes to the Buddha and asks for a really simple practice that will nonetheless take them far. One of the more interesting answers is that continuous ‘mindfulness of vedana’ will get you there. Vedana corresponds to the concept of valence in that it is posited as the positive, negative, or neutral quality of mental objects that appears to untrained perception as already bundled together with those mental objects.
I’m excited to have this written up so clearly, nice work! I think this is important for alignment work in two ways: discourse and thinking about alignment is affected by powerful cognitive biases that this hypothesis explains; and, as you point out, we might build AGI that works like this, since it’s so effective for human cognition.
I’m very curious if this “rings true” to other readers based on their introspection and observation of others’ thinking patterns. I think this is both true and important. I’d arrived at this conclusion over the course of a research career studying dopamine and higher cognition. When we started researching cognitive biases, this came together, and I think this ubiquitous valence effect is the source of the most important cognitive biases. This goes by the names motivated reasoning, confirmation bias, and the halo effect; they have overlapping behavioral definitions. I think they’re the major stumbling block to humans behaving rationally.
I think this hypothesis is consistent with a vast array of empirical work on dopamine function and related cognitive function. But the evidence isn’t adequate to firmly establish that dopamine signals valence. That’s part of why I’d never written this up adequately; the other part is that hypotheses this broad are outside the scope of standard neuroscience funding.
I’m looking forward to the rest of the series, and hoping the posts addressing cognitive biases generate some discussion about how those biases affect alignment discussions. I think the combination of motivated reasoning, confirmation bias, and the halo/horns effects creates powerful polarization that’s a big obstacle to rational discussions of alignment.
Great post! And I was wondering what you meant by valence, but now it is clear.
I hope to write a longer comment later, but here is a short question:
Is there some neurobiological evidence of the valence channel also going into the cortex, roughly?
I believe you are right. I am working on a comprehensive theory that covers valence, emotional evaluation, and belief sets. I propose that it is fairly easy to predict emotional response when certain information is known: essentially, a binary tree of decision questions will lead to various emotions, and the strength of the resulting emotions is based on a person’s current valence toward action/inaction as well as the result of normal emotional evaluation. Valences are added to or subtracted from the final emotional evaluation. For example, something may produce happiness, but if your valence is very low (depression), you will discount it. Valence and emotional evaluation are feedback loops resulting in greater and greater inhibition of action or greater and greater drivers of action. At the extremes we call these depression and mania. I’m going to digest your valence series a bit more and will be publishing some of my thoughts soon. If you are interested, though, I would love to talk to you about them and possibly publish together. My knowledge of the mechanics of the brain is limited; I’m more of an algorithm/pattern person and only need enough detail to form my hypothesis. I don’t live in details like most. I abstract very quickly and that is where I play and think.
I’m probably not interested in coauthoring but I’ll be interested to read your ideas! :) Let me know when you publish anything so I don’t miss it (steven.byrnes@gmail.com).
Thanks. I take that as encouragement to hurry the f*** up.
Have you considered the fact that emotional evaluation comes at a high cost? It takes energy to evaluate the actual emotion as well as the valence. And it is all situational, of course, because to do emotional evaluation of a moment, you need to take beliefs/thoughts as well as sensory input. Your model doesn’t point that out enough. The human brain grew from the brainstem/limbic to the cortex AND the motor cortex. Our CNS is part of our brain, period. And it all works on valence. The actions you take are informed by valence.
So the brain has to take beliefs and current input and evaluate it. Now, how much energy do you think that evaluation takes? And the higher the valence, the higher the urgency of your action/intents.
In the end, yes, the brain is an RL model. However, how is emotional valuation conducted? What brings back the decision for action? You say it is a sum total of micro valences. And it is. Each micro valence is made up of binary decisions about self, other, the topic. But what about the possible actions to take and the predicted benefit of each? That is for my paper.
So I will say that you have the gist of the valence model correct as I see it. And because you published it first, I will ensure that I incorporate what you’ve put together in my final model. I am working with a neuropsychologist on it and we plan to publish sometime this year. She is working on some experiments we can do to back up the paper’s claims.
This is clearly not true for edge cases, such as when the corpus callosum is severed and the left and right hemispheres cannot reconcile with each other.
At best it can be said each hemisphere ‘has a model’.
I think, when I say “model”, I have in mind something very broad like “a model is a thing that can be used for predictions, and is trained specifically to be good at predictions, e.g. by self-supervised learning”, and when you read the word “model”, you have in mind something very narrow, maybe “a model is something that is just like the model in AlphaZero or other such ML papers”.
For example, I can ask you “what will happen if I do X?” and you might say “If you do X, then Y will happen … oh wait, maybe Z will happen … umm, I’m not sure”. That would never happen in the “model” of AlphaZero. The “model” of AlphaZero takes in actions (moves) and spits out a board position, and this answer is clean and unique and (in the case of AlphaZero but not MuZero) guaranteed-to-be-correct. Obviously the kind of “model” built by the brain is not like that. Sometimes it issues somewhat-self-contradictory predictions and so on.
The thing you mention about split-brain patients is an extreme version, but I think it’s on a continuum with more mundane things like “if I think about it in this way, I predict X, and if I think about it in a different way, I predict Y”. Nevertheless, we are obviously able to make good predictions about the future, and we do so a zillion times a day—“I’m going to walk to the light-switch and flip it off” involves a model-based prediction that we are capable of straightforwardly walking to the light-switch and switching it off, and that if we do so, the switch will stay off and the room will be dark.
Those kinds of predictions (I claim) have all the properties that make it “a model” in my book: what we expect is not always what we want, and what we expect is much more likely to actualize than chance, and mistaken expectations tend to lead to model updates in a direction that will reduce the error in similar situations in the future. Yes it’s kinda messy, like sometimes your temporal lobe can’t reach perfect consensus with your parietal lobe, or your left hemisphere with your right hemisphere, and sometimes “what we expect” has other kinds of self-inconsistencies, etc. But it’s still definitely “a model”, in the (broad) way I use the term. :)
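To make that last property concrete, here is a toy sketch in Python of "a thing that is used for predictions and is trained specifically to be good at predictions". Everything in it is an illustrative assumption: the class name `PredictiveModel`, the `learning_rate` value, and the simple error-proportional update are hypothetical stand-ins, not a claim about how the brain (or AlphaZero) actually implements its model. The only point is that a mistaken expectation nudges the estimate in the direction that shrinks the error the next time the same situation comes up.

```python
from dataclasses import dataclass, field


@dataclass
class PredictiveModel:
    """Toy predictor: keeps one running estimate per context (hypothetical design)."""
    learning_rate: float = 0.5  # assumed value, chosen only for illustration
    estimates: dict = field(default_factory=dict)

    def predict(self, context: str) -> float:
        # With no experience of a context, the model expects 0.0.
        return self.estimates.get(context, 0.0)

    def update(self, context: str, observed: float) -> float:
        # Prediction error = what happened minus what was expected.
        error = observed - self.predict(context)
        # Move the estimate part of the way toward the observation.
        self.estimates[context] = self.predict(context) + self.learning_rate * error
        return error


def train(model: PredictiveModel, context: str, observed: float, steps: int) -> list:
    """Repeat the same experience and record the size of each prediction error."""
    return [abs(model.update(context, observed)) for _ in range(steps)]


if __name__ == "__main__":
    model = PredictiveModel()
    # 1.0 stands for "the room went dark after I flipped the switch".
    errors = train(model, "flip light switch", 1.0, steps=5)
    print(errors)
    print(model.predict("flip light switch"))
```

Note that nothing here involves what the agent wants: the update depends only on the gap between expectation and observation, which is the sense in which "what we expect is not always what we want".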
The brain has a model, an overarching one. At best it can be said that the model is the entire brain. Now, that model includes both hemispheres: redundancy for one, but also there are just too many things to do, and a need for many clusters of neurons. It is still true for edge cases like you said: when there is a severed corpus callosum, the model is still there. You’ve just severed the highest-level connection, a physical act that doesn’t change the fact that the brain has a model it is working with.
Huh? How is the model ‘still there’ for someone with a severed corpus callosum?
As far as I’m aware it doesn’t grow back within a normal human lifespan...
Enjoyable post, I’ll be reading the rest of them. I especially appreciate the effort that went into warding off the numerous misinterpretations that one could easily have had (but I’m going to go ahead and ask something that may signal I have misinterpreted you anyhow).
Perhaps this question reflects poor reading comprehension, but I’m wondering whether you are thinking of valence as being implemented by something specific at a neurobiological level or not? To try and make the question clearer (in my own head as much as anything), let me lay out two alternatives to having valence implemented by something specific. First, one might imagine that valence is an abstraction over the kind of competitive dynamics that play out among thoughts. On this view, valence is a little like evolutionary fitness (the tautology talk in 1.5.3 brought this comparison to mind). Second, one might imagine that valence is widely distributed across numerous brain systems. On this view, valence is something like an emotion (if you’ll grant the hopefully-no-longer-controversial claim that the neural bases of emotions are widely distributed). I don’t think either of these alternatives are what you are going for, but I also didn’t see the outright claim that valence is something implemented by a specific neurobiological substrate. What do you believe?
Thanks!
I think in much much simpler animals, valence is a literal specific signal in the brain, basically the collective spiking activity of a population of dopamine neurons. In mammals, that’s still sorta-close-to-true, but I would need to add a whole bunch of caveats and footnotes to that, for reasons hinted at in §1.5.6–1.5.7.
(I have a bunch of idiosyncratic opinions about what exactly the basal ganglia is doing and how, but I don’t want to get into it here, sorry!)
I reject both the “first” and the “second” thing you mention. I’m much closer to “valence is pretty straightforwardly encoded by spikes going down specific known axons”.
Separately, I might or might not agree with “the neural bases of emotions are widely distributed”, depending on how we define the word “emotions” (and also how we define “neural bases”, I suppose!), see here.
I don’t know if I buy that valence is based on dopamine neurons, but I do believe valence is the delta between the current state and a possible future state, very much like action potential or potential energy. If one possible outcome could grant you the world, then you will have a very high valence to do the actions needed. Likewise, if your life is on the line, that is very high valence. That turns anger to rage. Unfortunately, my model also says that too many positive thoughts lead to a race condition between dopamine generation and thought analysis that can lead to mania/psychosis. When you want things too much (desire) or too little (doubt/despair), the valences can get too high. And even evaluation of innocuous things can lead you to forming emotions or actions out of line with the current evaluation. That is, valence does not go to zero easily. And the valence of now informs the valence of later. And I believe it is more like a 1/x function, so when you get to extremes of valence, the desire to act or desire to not act gets really high and is hard to overcome.
Very cool post! We need a theory of valence that is grounded in real neuroscience, since understanding valence is pretty much required for any alignment agenda that works the first time.