I think the steelmanned version of beren’s argument is:
The potential for empathy is a natural consequence of learned reward models
That you indeed get for free. It will not get you far, as you have pointed out, because once you get more information, the model will learn to distinguish the cases precisely. And we know from observation that some mammals (specifically territorial ones) and most other animals do not show general empathy.
But there are multiple ways that empathy can be implemented with small additional circuitry. I think this is the part of beren’s comment that you were referring to:
For instance, you could pass the RPE through to some other region to detect whether the empathy triggered for a friend or enemy and then return either positive or negative reward, so implementing either shared happiness or schadenfreude. Generally I think of this mechanism as a low level substrate on which you can build up a more complex repertoire of social emotions by doing reward shaping on these signals.
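To make the quoted mechanism concrete, here is a minimal sketch of the sign-flip idea. It is only an illustration, not a claim about how the brain or beren's proposal actually implements it: the function name `social_reward`, the relation labels, and the choice to give strangers a weight of zero are all assumptions made for this example.

```python
def social_reward(empathic_rpe: float, relation: str) -> float:
    """Turn an empathically triggered reward prediction error (RPE)
    into a social reward, depending on who triggered it.

    Hypothetical illustration: the labels and weights are assumptions.
    """
    # +1 for a friend (shared happiness), -1 for an enemy (schadenfreude).
    # Anyone else gets 0 here, which is an arbitrary choice for the sketch.
    sign = {"friend": 1.0, "enemy": -1.0}.get(relation, 0.0)
    return sign * empathic_rpe


# The same positive RPE (something good happened to the other person)
# is rewarding if they are a friend and aversive if they are an enemy.
print(social_reward(0.8, "friend"))
print(social_reward(0.8, "enemy"))
```

The point of the sketch is only that the extra circuitry can be very small: one lookup and one multiplication on top of a signal that already exists.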
But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, rarely can take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile), that makes it more likely that you will be fed. This doesn’t seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.
This might not be stable because free-loading might evolve, but this is then secondary.
I wonder which of these cases this comment of yours is:
consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult.
But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, rarely can take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile), that makes it more likely that you will be fed. This doesn’t seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.
This might not be stable because free-loading might evolve, but this is then secondary.
I don’t really buy this. For my whole childhood, I was in an environment where it was illegal, dangerous, and taboo for me to drive a car (because I was underage). And then I got old enough to drive, and so of course I started doing so without a second thought. I had not permanently internalized the idea that “Steve driving a car” is bad. Instead, I got older, my situation changed, and my behavior changed accordingly. Likewise, I dropped tons of other habits of childhood—my religious practices, my street address, my bedtime, my hobbies, my political beliefs, my values, etc.—as soon as I got older and my situation changed.
So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?
Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P
(And generalizing across people seems equally implausible to generalizing across time. I called my parents “mom and dad”, but I didn’t generalize that to calling everyone I met “mom and dad”. So why assume that my brain would generalize being-nice-to-parents to being-nice-to-everyone?)
It’s true that sometimes childhood incentives lead to habits that last through adulthood, but I think that mainly happens via (1) the adult independently assesses those habits as being more appealing than alternatives, or (2) the adult continues the habits because it’s never really occurred to them that there was any other option.
As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city, where they have atheist roommates and coworkers and friends. And at that point, they’ll probably at least imagine the possibility of becoming atheist. And they might or might not find that possibility appealing, based on their personality and so on.
But (2) doesn’t particularly apply to the idea of being selfish. I don’t think people are nice because it’s never even crossed their mind, not even once in their whole life, that maybe they could not do a nice thing. That’s a very obvious and salient idea! :)
habits that last through adulthood [because] the adult independently assesses those habits as being more appealing than alternatives,
I think that the habit of being nice to people is empathy.
So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?
I’m not claiming that they “permanently internalize” but that they correctly (well, modulo mistakes) predict that it is in their interests. You started driving a car because you correctly predicted that the situation/environment had changed. But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.
Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P
That depends on the type of well-being and your ability to predict it. And maybe other priorities get in the way during that age. And again, I’m not claiming unconditional goodness. The environment of young adults is clearly different from that of children, but it is comparable enough to predict positive value from being nice to your parents.
Actually, psychopaths prove this point: The anti-social behavior is “learned” in many cases during abusive childhood experiences, i.e., in environments where it was exactly not in their interest to be nice—because it didn’t benefit them. And on the other side, psychopaths can, in many cases, function and show prosocial behaviors in stable environments with strong social feedback.
This also generalizes to the cultures example.
As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city
I agree: In the city, many of their previous predictions of which behaviors exactly lead to positive feedback (“quoting the Bible”) might be off, and they will quickly learn new behaviors. But being nice to people in general will still work. In fact, I claim, it tends to generalize even more, which is why people who have been around more varied communities tend to develop more generalized morality (higher Kegan levels).
I think we agree that motivations need to ground out directly or indirectly with “primary rewards” from innate drives (pain is bad, eating-when-hungry is good, etc., other things equal). (Right?)
And then your comment kinda sounds like you’re making the following argument:
There’s no need to posit the existence of an innate drive / primary reward that ever makes it intrinsically rewarding to be nice to people, because “you get positive feedback from being nice to people”, i.e. you will notice from experience that “being nice to people” will tend to lead to (non-social) primary rewards like eating-when-hungry, avoiding pain, etc., so the learning algorithm in your brain will sculpt you to have good feelings around being nice to people.
If that’s what you’re trying to say, then I strongly disagree and I’m happy to chat about that … but I was under quite a strong impression that that’s not what you believe! Right?
I thought that you believed that there is a primary reward / innate drive that makes it feel intrinsically rewarding for adults to be nice (under certain circumstances); if so, why bring up childhood at all?
I do think that there are mechanisms in the human brain that make prosocial behavior more intrinsically rewarding, such as the mechanisms you pointed out in the Valence sequence.
But I also notice that in the right kind of environments, “being nice to people” may predict “people being nice to you” (in a primary reward sense) to a higher degree than might be intuitive.
I don’t think that’s enough because you still need to ensure that the environment is sufficiently likely to begin with, with mechanisms such as rewarding smiles, touch inclinations, infant care instincts or whatever.
I think this story of how human empathy works may plausibly involve both social instincts as well as the self-interested indirect reward in very social environments.
(A) There’s a thing where people act kind towards other people because it’s in their self-interest to act kind—acting kind will ultimately lead to eating yummier food, avoiding pain, and so on. In everyday life, we tend to associate this with flattery, sucking up, deception, insincerity, etc., and we view it with great skepticism, because we recognize (correctly) that such a person will act kind but then turn right around and stab you in the back as soon as the situation changes.
(B) There’s a separate thing where people act kind towards other people because there’s some innate drive / primary reward / social instinct closely related to acting kind towards other people, e.g. feeling that the other person’s happiness is its own intrinsic reward. In everyday life, we view this thing very positively, because we recognize that such a person won’t stab you in the back when the situation changes.
I keep trying to pattern-match what you’re saying to:
(C) [which I don’t believe in] This is a third category of situations where people are kind. Like (A), it ultimately stems from self-interest. But like (B), it does not entail the person stabbing you in the back as soon as the situation changes (such that stabbing you in the back is in their self-interest). And the way that works is over-generalization. In this story, the person finds that it’s in their self-interest to act kind, and over-generalizes this habit to act kind even in situations where it’s not in their self-interest.
And then I was saying that that kind of over-generalization story proves too much, because it would suggest that I would retain my childhood habit of not-driving-cars, and my childhood habit of saying that my street address is 18 Main St., etc. And likewise, it would say that I would continue to wear winter coats when I travel to the tropics, and that if somebody puts a toy train on my plate at lunchtime I would just go right ahead and eat it, etc. We adults are not so stupid as to over-generalize like that. We learn to adapt our behavior to the situation, and to anticipate relevant consequences.
But maybe that’s not what you’re arguing? I’m still kinda confused. You wrote “But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.” I want to translate that as: “All this talk of stabbing people in the back is irrelevant, because there is practically never a situation where it’s in somebody’s self-interest to act unkind and stab someone in the back. So (A) is really just fine!” I don’t think you’d endorse that, right? But it is a possible position—I tend to associate it with @Matthew Barnett. I agree that we should all keep in mind that it’s very possible for people to act kind for self-interested reasons. But I strongly don’t believe that (A) is sufficient for Safe & Beneficial AGI. But I think that you’re already in agreement with me about that, right?
I’m still kinda confused. You wrote “But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.” I want to translate that as: “All this talk of stabbing people in the back is irrelevant, because there is practically never a situation where it’s in somebody’s self-interest to act unkind and stab someone in the back. So (A) is really just fine!” I don’t think you’d endorse that, right? But it is a possible position—I tend to associate it with @Matthew Barnett. I agree that we should all keep in mind that it’s very possible for people to act kind for self-interested reasons. But I strongly don’t believe that (A) is sufficient for Safe & Beneficial AGI. But I think that you’re already in agreement with me about that, right?
Without carefully reading the above comment chain (forgive me if I need to understand the full discussion here before replying), I would like to clarify what my views are on this particular question, since I was referenced. I think that:
It is possible to construct a stable social and legal environment in which it is in the selfish interests of almost everyone to act in such a way that brings about socially beneficial outcomes. A good example of such an environment is one where theft is illegal and in order to earn money, you have to get a job. This naturally incentivizes people to earn a living by helping others rather than stealing from others, which raises social welfare.
It is not guaranteed that the existing environment will be such that self-interest is aligned with the general public interest. For example, if we make shoplifting de facto legal by never penalizing people who do it, this would impose large social costs on society.
Our current environment has a mix of both of these good and bad features. However, on the whole, in modern prosperous societies during peacetime, it is generally in one’s selfish interest to do things that help rather than hurt other people. This means that, even for psychopaths, it doesn’t usually make selfish sense to go around hurting other people.
Over time, in societies with well-functioning social and legal systems, most people learn that hurting other people doesn’t actually help them selfishly. This causes them to adopt a general presumption against committing violence, theft, and other anti-social acts themselves, as a general principle. This general principle seems to be internalized in most people’s minds as not merely “it is not in your selfish interest to hurt other people” but rather “it is morally wrong to hurt other people”. In other words, people internalize their presumption as a moral principle, rather than as a purely practical principle. This is what prevents people from stabbing each other in the backs immediately once the environment changes.
However, under different environmental conditions, given enough time, people will internalize different moral principles. For example, in an environment in which slaughtering animals becomes illegal and taboo, most people would probably end up internalizing the moral principle that it’s wrong to hurt animals. Under our current environment, very few people internalize this moral principle, but that’s mainly because slaughtering animals is currently legal, and widely accepted.
This all implies that, in an important sense, human morality is not really “in our DNA”, so to speak. Instead, we internalize certain moral principles because those moral principles encode facts about what type of conduct happens to be useful in the real world for achieving our largely selfish objectives. Whenever the environment shifts, so too does human morality. This distinguishes my view from the view that humans are “naturally good” or have empathy-by-default.
Which is not to say that there isn’t some sense in which human morality comes from human DNA. The causal mechanisms here are complicated. People vary in their capacity for empathy and the degree to which they internalize moral principles. However, I think in most contexts, it is more appropriate to look at people’s environment as the determining factor of what morality they end up adopting, rather than thinking about what their genes are.
Sorry for oversimplifying your views, thanks for clarifying. :)
Here’s a part I especially disagree with:
Over time, in societies with well-functioning social and legal systems, most people learn that hurting other people doesn’t actually help them selfishly. This causes them to adopt a general presumption against committing violence, theft, and other anti-social acts themselves, as a general principle. This general principle seems to be internalized in most people’s minds as not merely “it is not in your selfish interest to hurt other people” but rather “it is morally wrong to hurt other people”. In other words, people internalize their presumption as a moral principle, rather than as a purely practical principle. This is what prevents people from stabbing each other in the backs immediately once the environment changes.
Just to be clear, I imagine we’ll both agree that if some behavior is always a good idea, it can turn into an unthinking habit. For example, today I didn’t take all the cash out of my wallet and shred it—not because I considered that idea and decided that it’s a bad idea, but rather because it never crossed my mind to do that in the first place. Ditto with my (non)-decision to not plan a coup this morning. But that’s very fragile (it relies on ideas not crossing my mind), and different from what you’re talking about.
My belief is: Neurotypical people have an innate drive to notice, internalize, endorse, and take pride in following social norms, especially behaviors that they imagine would impress the people whom they like and admire in turn. (And I have ideas about how this works in the brain! I think it’s mainly related to what I call the “drive to be liked / admired”, general discussion here, more neuroscience details coming soon I hope.)
The object-level content of these norms is different in different cultures and subcultures and times, for sure. But the special way that we relate to these norms has an innate aspect; it’s not just a logical consequence of existing and having goals etc. How do I know? Well, the hypothesis “if X is generally a good idea, then we’ll internalize X and consider not-X to be dreadfully wrong and condemnable” is easily falsified by considering any other aspect of life that doesn’t involve what other people will think of you. It’s usually a good idea to wear shoes that are comfortable, rather than too small. It’s usually a good idea to use a bookmark instead of losing your place every time you put your book down. It’s usually a good idea to sleep on your bed instead of on the floor next to it. Etc. But we just think of all those things as good ideas, not moral rules; and relatedly, if the situation changes such that those things become bad ideas after all for whatever reason, we’ll immediately stop doing them with no hesitation. (If this particular book is too fragile for me to use a bookmark, then that’s fine, I won’t use a bookmark, no worries!)
those moral principles encode facts about what type of conduct happens to be useful in the real world for achieving our largely selfish objectives
I’m not sure what “largely” means here. I hope we can agree that our objectives are selfish in some ways and unselfish in other ways.
Parents generally like their children, above and beyond the fact that their children might give them yummy food and shelter in old age. People generally form friendships, and want their friends to not get tortured, above and beyond the fact that having their friends not get tortured could lead to more yummy food and shelter later on. Etc. I do really think both of those examples centrally involve evolved innate drives. If we have innate drives to eat yummy food and avoid pain, why can’t we also have innate drives to care for children? Mice have innate drives to care for children—it’s really obvious, there are particular hormones and stereotyped cell groups in their hypothalamus and so on. Why not suppose that humans have such innate drives too? Likewise, mice have innate drives related to enjoying the company of conspecifics and conversely getting lonely without such company. Why not suppose that humans have such innate drives too?
The object-level content of these norms is different in different cultures and subcultures and times, for sure. But the special way that we relate to these norms has an innate aspect; it’s not just a logical consequence of existing and having goals etc. How do I know? Well, the hypothesis “if X is generally a good idea, then we’ll internalize X and consider not-X to be dreadfully wrong and condemnable” is easily falsified by considering any other aspect of life that doesn’t involve what other people will think of you.
To be clear, I didn’t mean to propose the specific mechanism of: if some behavior has a selfish consequence, then people will internalize that class of behaviors in moral terms rather than in purely practical terms. In other words, I am not saying that all relevant behaviors get internalized this way. I agree that only some behaviors are internalized by people in moral terms, and other behaviors do not get internalized in terms of moral principles in the way I described.
Admittedly, my statement was imprecise, but my intention in that quote was merely to convey that people tend to internalize certain behaviors in terms of moral principles, which explains the fact that people don’t immediately abandon their habits when the environment suddenly shifts. However, I was silent on the question of which selfishly useful behaviors get internalized this way and which ones don’t.
A good starting hypothesis is that people internalize certain behaviors in moral terms if they are taught to see those behaviors in moral terms. This ties into your theory that people “have an innate drive to notice, internalize, endorse, and take pride in following social norms”. We are not taught to see “reaching into your wallet and shredding a dollar” as impinging on moral principles, so people don’t tend to internalize the behavior that way. Yet, we are taught to see punching someone in the face as impinging on a moral principle. However, this hypothesis still leaves much to be explained, as it doesn’t tell us which behaviors we will tend to be taught about in moral terms, and which ones we won’t be taught in moral terms.
As a deeper, perhaps evolutionary explanation, I suspect that internalizing certain behaviors in moral terms helps make our commitments to other people more credible: if someone thinks you’re not going to steal from them because you think it’s genuinely wrong to steal, then they’re more likely to trust you with their stuff than if they think you merely recognize the practical utility of not stealing from them. This explanation hints at the idea that we will tend to internalize certain behaviors in moral terms if those behaviors are both selfishly relevant, and important for earning trust among other agents in the world. This is my best guess at what explains the rough outlines of human morality that we see in most societies.
I’m not sure what “largely” means here. I hope we can agree that our objectives are selfish in some ways and unselfish in other ways.
Parents generally like their children, above and beyond the fact that their children might give them yummy food and shelter in old age. People generally form friendships, and want their friends to not get tortured, above and beyond the fact that having their friends not get tortured could lead to more yummy food and shelter later on. Etc.
In that sentence, I meant “largely selfish” as a stand-in for what I think humans-by-default care overwhelmingly about, which is something like “themselves, their family, their friends, and their tribe, in rough descending order of importance”. The problem is that I am not aware of any word in the English language to describe people who have these desires, except perhaps the word “normal”.
The word selfish usually denotes someone who is preoccupied with their own feelings, and is unconcerned with anyone else. We both agree that humans are not entirely selfish. Nonetheless, the opposite word, altruistic, often denotes someone who is preoccupied with the general social good, and who cares about strangers, not merely their own family and friend circles. This is especially the case in philosophical discussions in which one defines altruism in terms of impartial benevolence to all sentient life, which is extremely far from an accurate description of the typical human.
Humans exist on a spectrum between these two extremes. We are not perfectly selfish, nor are we perfectly altruistic. However, we are generally closer to the ideal of perfect selfishness than to the ideal of perfect altruism, given the fact that our own family, friend group, and tribe tends to be only a small part of the entire world. This is why I used the language of “largely selfish” rather than something else.
Honest question: Suppose that my friends and other people whom I like and respect and trust all believe that genocide is very bad. I find myself (subconsciously) motivated to fit in with them, and I wind up adopting their belief that genocide is very bad. And then I take corresponding actions, by writing letters to politicians urging military intervention in Myanmar.
In your view, would that count as “selfish” because I “selfishly” benefit from ideologically fitting in with my friends and trusted leaders? Or would it count as “altruistic” because I am now moved by the suffering of some ethnic group across the world that I’ve never met and can’t even pronounce?
It is not always an expression of selfish motives when people take a stance against genocide. I would even go as far as saying that, in the majority of cases, people genuinely have non-selfish motives when taking that position. That is, they actually do care, to at least some degree, about the genocide, beyond the fact that signaling their concern helps them fit in with their friend group.
Nonetheless, and this is important: few people are willing to pay substantial selfish costs in order to prevent genocides that are socially distant from them.
The theory I am advancing here does not rest on the idea that people aren’t genuine in their desire for faraway strangers to be better off. Rather, my theory is that people generally care little about such strangers, when helping those strangers trades off significantly against objectives that are closer to themselves, their family, friend group, and their own tribe.
Or, put another way, distant strangers usually get little weight in our utility function. Our family, and our own happiness, by contrast, usually get a much larger weight.
The core element of my theory concerns the amount that people care about themselves (and their family, friends, and tribe) versus other people, not whether they care about other people at all.
Hmm. I think you’re understating the tendency of most people to follow prevailing norms, and yet your main conclusion is partly right. I think there are interesting dynamics happening at two levels simultaneously—the level of individual decisions, and the level of cultural evolution—and your comment is kinda conflating those levels.
So here’s how I would put things:
(1) Most people care very very strongly about doing things that would look good in the eyes of the people they respect. They don’t think of it that way, though: it doesn’t feel like that’s what they’re doing, and indeed they would be offended by that suggestion. Instead, those things just feel like the right and appropriate things to do. This is related to and upstream of norm-following. This is an innate drive, part of human nature built into our brain by evolution.
(2) Most people also have various other innate drives that lead to them feeling motivated to eat when hungry, to avoid pain, to bond with friends, for parents to love their children and adolescents to disrespect their parents (but respect their slightly-older friends), and much else.
(3) (But there’s person-to-person variation, and in particular some small fraction of people are sociopaths who just don’t feel intrinsically motivated by (1) at all.)
(4) The norms of (1) can be totally arbitrary. If the people I respect think that genocide is bad, then probably so do I. If they think genocide is awesome, then probably so do I. If they think it’s super-cool to hop backwards on one foot, then probably so do I.
(5) …But (2) provides a constant force gently pushing norms towards behavioral patterns that match up with innate tendencies in (2). So we tend to wind up with cultural norms that line up with avoiding pain, eating-when-hungry, bonding with friends, and so on.
(6) …But not perfectly, because there are other forces acting on norms too, such as game-theoretic signaling equilibria or whatever. These enable the existence of widespread norms with aspects that run counter to aspects of (2); think of religious fasting, initiation rites, etc.
(7) When (4),(5),(6) play out in some group or society, some norms will “win” over others, and the norms that “win” are probably (to some extent) a priori predictable from structural aspects of the situation: homogeneity, mobility, technology, whatever.
I wonder which of these cases this comment of yours is:
consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult.
One thing is, I think the brain invests like 10,000× more neurons into figuring out whether a thought is good vs bad (positive vs negative valence) than into figuring out whether a thought is or is not a good time to cringe. So I think the valence calculation can capture subtleties and complexities that the simpler cringe calculation can’t. This especially includes things like properly handling complex thoughts with subordinate clauses and so on. For example, in the thought “I’ll do X in order to avoid Y”, the more negative the valence of Y is, the more positive the valence of the whole thought is. So the hypothesis “our brains are unable to learn a strong valence-difference between two vaguely-related situations” is (even?) more implausible than the hypothesis “our brains are unable to learn a strong stomach-cringe-appropriateness-difference between two vaguely-related situations”.
Another thing is, I obviously do think there are specific evolved mechanisms at play here, even if I didn’t talk about them in this post.
Another thing is, occasionally lightly tensing my stomach, in situations where I don’t need to, just isn’t the kind of high-stakes mistake that warrants a strong update in any brain learning algorithm. Like, if some flash in the corner of your eye has a 2% chance of preceding getting hit in the stomach, it’s still the right move to cringe every time—I’m happy to trade 50 false positives where I tense my stomach unnecessarily, in exchange for 1 true positive where I protect myself from serious injury. So presumably the brain learning algorithm is tuned to update only very weakly on false positives. Now, I don’t normally see people get punched in the stomach, up close and personal. I can’t even remember the last time that happened. If I saw that every day, I might well get desensitized to it. I do seem to be pretty well desensitized to seeing people get punched on TV.
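The false-positive trade-off above can be sketched as a break-even calculation. This is just back-of-the-envelope arithmetic under stated assumptions: the cost units are made up, the function name `should_cringe` is hypothetical, and the sketch assumes that cringing fully prevents the injury.

```python
def should_cringe(p_hit: float, cringe_cost: float, injury_cost: float) -> bool:
    """Return True if always cringing has lower expected cost than never cringing.

    Assumptions for this sketch (not from any real model of the brain):
    - cringing costs `cringe_cost` every time, hit or no hit
    - not cringing costs `injury_cost` only when the hit actually lands
    - cringing fully protects against the injury
    """
    expected_cost_if_cringing = cringe_cost
    expected_cost_if_not = p_hit * injury_cost
    return expected_cost_if_cringing < expected_cost_if_not


# With a 2% chance of a hit, cringing pays off whenever one injury is worth
# more than 1 / 0.02 = 50 unnecessary cringes.
print(should_cringe(p_hit=0.02, cringe_cost=1.0, injury_cost=100.0))  # injury >> 50 cringes
print(should_cringe(p_hit=0.02, cringe_cost=1.0, injury_cost=40.0))   # injury < 50 cringes
```

Under these assumptions the break-even point is exactly the "50 false positives for 1 true positive" ratio: as long as a serious injury is worse than about fifty pointless stomach-tensings, a learning rule that barely updates on false positives is doing the right thing.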
I think the steelmaned version of beren’s argument is
That you indeed get for free. It will not get you far, as you have pointed out, because once you get more information, the model will learn to distinguish the cases precisely. And we know from observation that some mammals (specifically territorial ones) and most other animals do not show general empathy.
But there are multiple ways that empathy can be implemented with small additional circuitry. I think this is the part of beren’s comment that you were referring to:
But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, rarely can take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile) that makes it more likely to be fed. This doesn’t seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.
This might not be stable because free-loading might evolve, but this is then secondary.
I wonder which of these cases this comment of yours is:
I don’t really buy this. For my whole childhood, I was in an environment where it was illegal, dangerous, and taboo for me to drive a car (because I was underage). And then I got old enough to drive, and so of course I started doing so without a second thought. I had not permanently internalized the idea that “Steve driving a car” is bad. Instead, I got older, my situation changed, and my behavior changed accordingly. Likewise, I dropped tons of other habits of childhood—my religious practices, my street address, my bedtime, my hobbies, my political beliefs, my values, etc.—as soon as I got older and my situation changed.
So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?
Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P
(And generalizing across people seems equally implausible to generalizing across time. I called my parents “mom and dad”, but I didn’t generalize that to calling everyone I met “mom and dad”. So why assume that my brain would generalize being-nice-to-parents to being-nice-to-everyone?)
It’s true that sometimes childhood incentives lead to habits that last through adulthood, but I think that mainly happens via (1) the adult independently assesses those habits as being more appealing than alternatives, or (2) the adult continues the habits because it’s never really occurred to them that there was any other option.
As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city, where they have atheist roommates and coworkers and friends. And at that point, they’ll probably at least imagine the possibility of becoming atheist. And they might or might not find that possibility appealing, based on their personality and so on.
But (2) doesn’t particularly apply to the idea of being selfish. I don’t think people are nice because it’s never even crossed their mind, not even once in their whole life, that maybe they could not do a nice thing. That’s a very obvious and salient idea! :)
[More on this in Heritability, Behaviorism, and Within-Lifetime RL :) ]
I think the point we agree on is
I think that the habit of being nice to people is empathy.
I’m not claiming that they “permanently internalize” but that they correctly (well, modulo mistakes) predict that it is in their interests. You started driving a car because you correctly predicted that the situation/environment had changed. But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.
That depends on the type of well-being and your ability to predict it. And maybe other priorities get in the way during that age. And again, I’m not claiming unconditional goodness. The environment of young adults is clearly different from that of children, but it is comparable enough to predict positive value from being nice to your parents.
Actually, psychopaths prove this point: The anti-social behavior is “learned” in many cases during abusive childhood experiences, i.e., in environments where it was exactly not in their interest to be nice—because it didn’t benefit them. And on the other side, psychopaths can, in many cases, function and show prosocial behaviors in stable environments with strong social feedback.
This also generalizes to the cultures example.
I agree: In the city, many of their previous predictions of which behaviors exactly lead to positive feedback (“quoting the Bible”) might be off and they will quickly learn new behaviors. But being nice to people in general will still work. In fact, I claim, it tends to generalize even more, which is why people who have been around more varied communities tend to develop more generalized morality (higher Kegan levels).
I’m not too sure what you’re arguing.
I think we agree that motivations need to ground out directly or indirectly with “primary rewards” from innate drives (pain is bad, eating-when-hungry is good, etc., other things equal). (Right?)
And then your comment kinda sounds like you’re making the following argument:
If that’s what you’re trying to say, then I strongly disagree and I’m happy to chat about that … but I was under quite a strong impression that that’s not what you believe! Right?
I thought that you believed that there is a primary reward / innate drive that makes it feel intrinsically rewarding for adults to be nice (under certain circumstances); if so, why bring up childhood at all?
Sorry if I’m confused :)
I do think that there are mechanisms in the human brain that make prosocial behavior more intrinsically rewarding, such as the mechanisms you pointed out in the Valence sequence.
But I also notice that in the right kind of environments, “being nice to people” may predict “people being nice to you” (in a primary reward sense) to a higher degree than might be intuitive.
I don’t think that’s enough because you still need to ensure that the environment is sufficiently likely to begin with, with mechanisms such as rewarding smiles, touch inclinations, infant care instincts or whatever.
I think this story of how human empathy works may plausibly involve both social instincts as well as the self-interested indirect reward in very social environments.
OK, my theory is:
(A) There’s a thing where people act kind towards other people because it’s in their self-interest to act kind—acting kind will ultimately lead to eating yummier food, avoiding pain, and so on. In everyday life, we tend to associate this with flattery, sucking up, deception, insincerity, etc., and we view it with great skepticism, because we recognize (correctly) that such a person will act kind but then turn right around and stab you in the back as soon as the situation changes.
(B) There’s a separate thing where people act kind towards other people because there’s some innate drive / primary reward / social instinct closely related to acting kind towards other people, e.g. feeling that the other person’s happiness is its own intrinsic reward. In everyday life, we view this thing very positively, because we recognize that such a person won’t stab you in the back when the situation changes.
I keep trying to pattern-match what you’re saying to:
(C) [which I don’t believe in] This is a third category of situations where people are kind. Like (A), it ultimately stems from self-interest. But like (B), it does not entail the person stabbing you in the back as soon as the situation changes (such that stabbing you in the back is in their self-interest). And the way that works is over-generalization. In this story, the person finds that it’s in their self-interest to act kind, and over-generalizes this habit to act kind even in situations where it’s not in their self-interest.
And then I was saying that that kind of over-generalization story proves too much, because it would suggest that I would retain my childhood habit of not-driving-cars, and my childhood habit of saying that my street address is 18 Main St., etc. And likewise, it would say that I would continue to wear winter coats when I travel to the tropics, and that if somebody puts a toy train on my plate at lunchtime I would just go right ahead and eat it, etc. We adults are not so stupid as to over-generalize like that. We learn to adapt our behavior to the situation, and to anticipate relevant consequences.
But maybe that’s not what you’re arguing? I’m still kinda confused. You wrote “But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.” I want to translate that as: “All this talk of stabbing people in the back is irrelevant, because there is practically never a situation where it’s in somebody’s self-interest to act unkind and stab someone in the back. So (A) is really just fine!” I don’t think you’d endorse that, right? But it is a possible position—I tend to associate it with @Matthew Barnett. I agree that we should all keep in mind that it’s very possible for people to act kind for self-interested reasons. But I strongly don’t believe that (A) is sufficient for Safe & Beneficial AGI. But I think that you’re already in agreement with me about that, right?
Without carefully reading the above comment chain (forgive me if I need to understand the full discussion here before replying), I would like to clarify what my views are on this particular question, since I was referenced. I think that:
It is possible to construct a stable social and legal environment in which it is in the selfish interests of almost everyone to act in such a way that brings about socially beneficial outcomes. A good example of such an environment is one where theft is illegal and in order to earn money, you have to get a job. This naturally incentivizes people to earn a living by helping others rather than stealing from others, which raises social welfare.
It is not guaranteed that the existing environment will be such that self-interest is aligned with the general public interest. For example, if we make shoplifting de facto legal by never penalizing people who do it, this would impose large social costs on society.
Our current environment has a mix of both of these good and bad features. However, on the whole, in modern prosperous societies during peacetime, it is generally in one’s selfish interest to do things that help rather than hurt other people. This means that, even for psychopaths, it doesn’t usually make selfish sense to go around hurting other people.
Over time, in societies with well-functioning social and legal systems, most people learn that hurting other people doesn’t actually help them selfishly. This causes them to adopt a general presumption against committing violence, theft, and other anti-social acts themselves, as a general principle. This general principle seems to be internalized in most people’s minds as not merely “it is not in your selfish interest to hurt other people” but rather “it is morally wrong to hurt other people”. In other words, people internalize their presumption as a moral principle, rather than as a purely practical principle. This is what prevents people from stabbing each other in the backs immediately once the environment changes.
However, under different environmental conditions, given enough time, people will internalize different moral principles. For example, in an environment in which slaughtering animals becomes illegal and taboo, most people would probably end up internalizing the moral principle that it’s wrong to hurt animals. Under our current environment, very few people internalize this moral principle, but that’s mainly because slaughtering animals is currently legal, and widely accepted.
This all implies that, in an important sense, human morality is not really “in our DNA”, so to speak. Instead, we internalize certain moral principles because those moral principles encode facts about what type of conduct happens to be useful in the real world for achieving our largely selfish objectives. Whenever the environment shifts, so too does human morality. This distinguishes my view from the view that humans are “naturally good” or have empathy-by-default.
Which is not to say that there isn’t some sense in which human morality comes from human DNA. The causal mechanisms here are complicated. People vary in their capacity for empathy and the degree to which they internalize moral principles. However, I think in most contexts, it is more appropriate to look at people’s environment as the determining factor of what morality they end up adopting, rather than thinking about what their genes are.
Sorry for oversimplifying your views, thanks for clarifying. :)
Here’s a part I especially disagree with:
Just to be clear, I imagine we’ll both agree that if some behavior is always a good idea, it can turn into an unthinking habit. For example, today I didn’t take all the cash out of my wallet and shred it—not because I considered that idea and decided that it’s a bad idea, but rather because it never crossed my mind to do that in the first place. Ditto with my (non)-decision to not plan a coup this morning. But that’s very fragile (it relies on ideas not crossing my mind), and different from what you’re talking about.
My belief is: Neurotypical people have an innate drive to notice, internalize, endorse, and take pride in following social norms, especially behaviors that they imagine would impress the people whom they like and admire in turn. (And I have ideas about how this works in the brain! I think it’s mainly related to what I call the “drive to be liked / admired”, general discussion here, more neuroscience details coming soon I hope.)
The object-level content of these norms is different in different cultures and subcultures and times, for sure. But the special way that we relate to these norms has an innate aspect; it’s not just a logical consequence of existing and having goals etc. How do I know? Well, the hypothesis “if X is generally a good idea, then we’ll internalize X and consider not-X to be dreadfully wrong and condemnable” is easily falsified by considering any other aspect of life that doesn’t involve what other people will think of you. It’s usually a good idea to wear shoes that are comfortable, rather than too small. It’s usually a good idea to use a bookmark instead of losing your place every time you put your book down. It’s usually a good idea to sleep on your bed instead of on the floor next to it. Etc. But we just think of all those things as good ideas, not moral rules; and relatedly, if the situation changes such that those things become bad ideas after all for whatever reason, we’ll immediately stop doing them with no hesitation. (If this particular book is too fragile for me to use a bookmark, then that’s fine, I won’t use a bookmark, no worries!)
I’m not sure what “largely” means here. I hope we can agree that our objectives are selfish in some ways and unselfish in other ways.
Parents generally like their children, above and beyond the fact that their children might give them yummy food and shelter in old age. People generally form friendships, and want their friends to not get tortured, above and beyond the fact that having their friends not get tortured could lead to more yummy food and shelter later on. Etc. I do really think both of those examples centrally involve evolved innate drives. If we have innate drives to eat yummy food and avoid pain, why can’t we also have innate drives to care for children? Mice have innate drives to care for children—it’s really obvious, there are particular hormones and stereotyped cell groups in their hypothalamus and so on. Why not suppose that humans have such innate drives too? Likewise, mice have innate drives related to enjoying the company of conspecifics and conversely getting lonely without such company. Why not suppose that humans have such innate drives too?
To be clear, I didn’t mean to propose the specific mechanism of: if some behavior has a selfish consequence, then people will internalize that class of behaviors in moral terms rather than in purely practical terms. In other words, I am not saying that all relevant behaviors get internalized this way. I agree that only some behaviors are internalized by people in moral terms, and other behaviors do not get internalized in terms of moral principles in the way I described.
Admittedly, my statement was imprecise, but my intention in that quote was merely to convey that people tend to internalize certain behaviors in terms of moral principles, which explains the fact that people don’t immediately abandon their habits when the environment suddenly shifts. However, I was silent on the question of which selfishly useful behaviors get internalized this way and which ones don’t.
A good starting hypothesis is that people internalize certain behaviors in moral terms if they are taught to see those behaviors in moral terms. This ties into your theory that people “have an innate drive to notice, internalize, endorse, and take pride in following social norms”. We are not taught to see “reaching into your wallet and shredding a dollar” as impinging on moral principles, so people don’t tend to internalize the behavior that way. Yet, we are taught to see punching someone in the face as impinging on a moral principle. However, this hypothesis still leaves much to be explained, as it doesn’t tell us which behaviors we will tend to be taught about in moral terms, and which ones we won’t be taught in moral terms.
As a deeper, perhaps evolutionary explanation, I suspect that internalizing certain behaviors in moral terms helps make our commitments to other people more credible: if someone thinks you’re not going to steal from them because you think it’s genuinely wrong to steal, then they’re more likely to trust you with their stuff than if they think you merely recognize the practical utility of not stealing from them. This explanation hints at the idea that we will tend to internalize certain behaviors in moral terms if those behaviors are both selfishly relevant, and important for earning trust among other agents in the world. This is my best guess at what explains the rough outlines of human morality that we see in most societies.
In that sentence, I meant “largely selfish” as a stand-in for what I think humans-by-default care overwhelmingly about, which is something like “themselves, their family, their friends, and their tribe, in rough descending order of importance”. The problem is that I am not aware of any word in the English language to describe people who have these desires, except perhaps the word “normal”.
The word selfish usually denotes someone who is preoccupied with their own feelings, and is unconcerned with anyone else. We both agree that humans are not entirely selfish. Nonetheless, the opposite word, altruistic, often denotes someone who is preoccupied with the general social good, and who cares about strangers, not merely their own family and friend circles. This is especially the case in philosophical discussions in which one defines altruism in terms of impartial benevolence to all sentient life, which is extremely far from an accurate description of the typical human.
Humans exist on a spectrum between these two extremes. We are not perfectly selfish, nor are we perfectly altruistic. However, we are generally closer to the ideal of perfect selfishness than to the ideal of perfect altruism, given the fact that our own family, friend group, and tribe tend to be only a small part of the entire world. This is why I used the language of “largely selfish” rather than something else.
Honest question: Suppose that my friends and other people whom I like and respect and trust all believe that genocide is very bad. I find myself (subconsciously) motivated to fit in with them, and I wind up adopting their belief that genocide is very bad. And then I take corresponding actions, by writing letters to politicians urging military intervention in Myanmar.
In your view, would that count as “selfish” because I “selfishly” benefit from ideologically fitting in with my friends and trusted leaders? Or would it count as “altruistic” because I am now moved by the suffering of some ethnic group across the world that I’ve never met and can’t even pronounce?
It is not always an expression of selfish motives when people take a stance against genocide. I would even go as far as saying that, in the majority of cases, people genuinely have non-selfish motives when taking that position. That is, they actually do care, to at least some degree, about the genocide, beyond the fact that signaling their concern helps them fit in with their friend group.
Nonetheless, and this is important: few people are willing to pay substantial selfish costs in order to prevent genocides that are socially distant from them.
The theory I am advancing here does not rest on the idea that people aren’t genuine in their desire for faraway strangers to be better off. Rather, my theory is that people generally care little about such strangers, when helping those strangers trades off significantly against objectives that are closer to themselves, their family, friend group, and their own tribe.
Or, put another way, distant strangers usually get little weight in our utility function. Our family, and our own happiness, by contrast, usually get a much larger weight.
The core element of my theory concerns the amount that people care about themselves (and their family, friends, and tribe) versus other people, not whether they care about other people at all.
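As a rough illustration of the weighting claim, here is a toy weighted-sum utility. The weights, the welfare numbers, and the names `WEIGHTS` and `utility` are all hypothetical, picked only to show the shape of the argument, not measured from anything.

```python
from typing import Dict

# Hypothetical weights, in rough descending order of importance.
WEIGHTS: Dict[str, float] = {
    "self": 1.0,
    "family": 0.5,
    "friends": 0.2,
    "tribe": 0.05,
    "distant_stranger": 0.001,
}


def utility(welfare_changes: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of welfare changes across the groups a person cares about."""
    return sum(weights[who] * delta for who, delta in welfare_changes.items())


# Helping a distant stranger a lot, at a large cost to oneself.
costly_help = {"self": -10.0, "distant_stranger": 100.0}
# The same help, at a trivial cost to oneself.
cheap_help = {"self": -0.05, "distant_stranger": 100.0}

costly_value = utility(costly_help)  # negative under these weights
cheap_value = utility(cheap_help)    # positive under these weights
```

With these made-up weights the stranger's welfare does count (the weight is nonzero), so cheap help is still worth doing; it just loses out whenever it trades off significantly against something closer to home.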
Hmm. I think you’re understating the tendency of most people to follow prevailing norms, and yet your main conclusion is partly right. I think there are interesting dynamics happening at two levels simultaneously—the level of individual decisions, and the level of cultural evolution—and your comment is kinda conflating those levels.
So here’s how I would put things:
(1) Most people care very very strongly about doing things that would look good in the eyes of the people they respect. They don’t think of it that way, though—it doesn’t feel like that’s what they’re doing, and indeed they would be offended by that suggestion. Instead, those things just feel like the right and appropriate things to do. This is related to and upstream of norm-following. This is an innate drive, part of human nature built into our brain by evolution.
(2) Most people also have various other innate drives that lead to them feeling motivated to eat when hungry, to avoid pain, to bond with friends, for parents to love their children and adolescents to disrespect their parents (but respect their slightly-older friends), and much else.
(3) (But there’s person-to-person variation, and in particular some small fraction of people are sociopaths who just don’t feel intrinsically motivated by (1) at all.)
(4) The norms of (1) can be totally arbitrary. If the people I respect think that genocide is bad, then probably so do I. If they think genocide is awesome, then probably so do I. If they think it’s super-cool to hop backwards on one foot, then probably so do I.
(5) …But (2) provides a constant force gently pushing norms towards behavioral patterns that match up with innate tendencies in (2). So we tend to wind up with cultural norms that line up with avoiding pain, eating-when-hungry, bonding with friends, and so on.
(6) …But not perfectly, because there are other forces acting on norms too, such as game-theoretic signaling equilibria or whatever. These enable the existence of widespread norms with aspects that run counter to aspects of (2)—think of religious fasting, initiation rites, etc.
(7) When (4),(5),(6) play out in some group or society, some norms will “win” over others, and the norms that “win” are probably (to some extent) a priori predictable from structural aspects of the situation—homogeneity, mobility, technology, whatever.