Our emotions are in some sense the human equivalent of “utility functions”.
We don’t hate the suffering of other people in some abstract way—we hate the suffering of other people because it causes us pain to think about other people suffering. We love truth because of the rush of satisfaction when we hit upon it.
Yes, we intrinsically prefer pleasure over pain, but that’s only part of the story. We also prefer the causes of satisfaction to happen, beyond preferring the feeling of satisfaction itself. We hate the causes of pain beyond the extent to which we hate the actual feeling of pain itself.
You can’t really replace the more abstract negative affects with a warning signal, because the negative affect was the reason you hated, say, deception, in the first place. Replacing negative affect in response to deception would be akin to removing part of the preference against deception.
That’s why sociopaths don’t care about people. They don’t feel guilt. You could tell them “this is where you would ordinarily feel guilty, if we hadn’t removed your negative affect associated with hurting people” but they aren’t going to care about the warning signal. Maybe some past version of themselves who hadn’t had negative affect removed might have cared, but they will not.
Negative affect is the switch that tells the brain “don’t do things that cause that”. Removing negative affect would actually remove the perception of negative utility. For simple bodily pain, who cares... but you’re going to start altering values if you mess with any of the more abstract stuff.
So, when we radically alter our emotions, don’t we also radically alter our “utility functions”? I’d like future-me’s interests to generally align with current-me’s coherent extrapolated interests.
we hate the suffering of other people because it causes us pain to think about other people suffering.
It seems to me that I negatively value other people’s suffering… I want there to be less of it.
Given the choice between reducing their suffering and reducing the pain I feel upon contemplating their suffering, it seems to me I ought to reduce their suffering.
Given the option of reducing their suffering at the cost of experiencing just as much pain when I contemplate their lack of suffering as I do now when I contemplate their suffering, it seems to me I ought to reduce their suffering.
None of that seems compatible with the idea that what I actually negatively value is the pain of thinking about other people suffering.
What I can’t figure out is whether you’re suggesting that I’m ethically confused… that it simply isn’t true that I ought to do those things, and if I understood the world better it would stop seeming to me that I ought to do them… or if I’m simply not being correctly described by your “we” statements and you’re unjustifiedly generalizing from your own experience… or whether perhaps I’ve altogether misunderstood you.
What I can’t figure out is whether you’re suggesting that I’m ethically confused… that it simply isn’t true that I ought to do those things, and if I understood the world better it would stop seeming to me that I ought to do them… or if I’m simply not being correctly described by your “we” statements and you’re unjustifiedly generalizing from your own experience
None of the above. I’m just trying to figure out why my intuition says that I do not want to block all negative affect and whether my intuition is wrong, and your objections are helping me do so. I’ve got no idea whether we’re fundamentally different, or whether one of us is wrong—I’m just verbally playing with the space of ideas with you. The things I’m saying right now are exploratory thoughts and could easily be wrong—the hope is that value comes out of it.
“We” is just a placeholder for humans. I’m making the philosophical claim that negative affect is the real-life, non-theoretical thing that corresponds to the game-theory construct of negative utility, with some small connotative differences.
None of that seems compatible with the idea that what I actually negatively value is the pain of thinking about other people suffering.
No, of course not. Here’s what I’m suggesting: Thinking about other people’s suffering causes the emotion “concern” (a negative emotion) which is in fact “negative utility”. If you don’t feel concern when faced with the knowledge that someone is in pain, it means that you don’t experience “negative utility” in response to other people being in pain. I’m suggesting that the fact that you negatively value other people being in pain is inextricably linked to the emotions you feel when people are in pain. I’m suggesting that if you remove concern (as occurs in real-world sociopathy), you won’t have any intrinsic incentive to care about the pain of others anymore.
(Not “you” in particular, but animals in general.)
Basically, when modelling a real-world object as an agent, we should treat whatever mechanism drives it to take action (the neural circuits, or whatever else the being is made of) as indicative of “utility”. In humans, the neural pattern “concern” causes us to take action when others suffer, so “concern” is negative utility in response to suffering. (This gets confusing when agents don’t act in their interests, but if we want to nitpick about things like that we shouldn’t be modelling objects as agents in the first place.)
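To make that “revealed utility” reading concrete, here is a minimal sketch (the function names and toy numbers are mine, purely illustrative, not anything from the discussion): an agent whose action choice is driven entirely by an internal “concern” signal, so that an outside modeller fitting a utility function to its behaviour would recover that signal as negative utility over others’ suffering.

```python
# Illustrative sketch only: a toy agent whose action choice is driven entirely by an
# internal affect signal ("concern"), so that signal is what an outside modeller
# would recover as (negative) utility over others' suffering.

def concern(world):
    # Internal affect: the more others suffer, the more negative the signal.
    return -world["others_suffering"]

def act(world, actions):
    # The agent takes whichever action leads to the state it feels least bad about.
    return max(actions, key=lambda action: concern(action(world)))

def help_others(world):
    return {**world, "others_suffering": world["others_suffering"] - 1}

def do_nothing(world):
    return world

chosen = act({"others_suffering": 3}, [help_others, do_nothing])
print(chosen.__name__)  # -> help_others: "concern" functions as negative utility
```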
Here’s a question: Do you think we have moral responsibilities to AI? Is it immoral to cause a Friendly AI to experience negative utility by fooling it into thinking bad things are happening and then killing it? I think the answer might be yes—since the FAI shares many human values, I think I consider it a person. It makes sense to treat negative utility for the FAI as analogous to human negative affect.
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
But maybe we can work around this...maybe we can get a being which experiences positive affect from preventing torture, rather than negative affect from not preventing torture. Such a being has an incentive to prevent torture, yet doesn’t feel concerned when torture happens.
Either way though—if this line of thought makes sense, you can’t have a human which is constantly experiencing maximum positive affect, because that human would never have an incentive to act at all.
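A toy way to see both points (the numbers and names below are assumptions for illustration only): a utility function shifted by a constant ranks actions identically, which is why “positive affect from preventing torture” and “negative affect from not preventing it” can drive the same behaviour, while a genuinely constant affect ranks nothing and so motivates nothing.

```python
# Toy illustration: shifting utility by a constant preserves the ranking of actions;
# making it constant erases the ranking (and with it any incentive to act).

torture_after = {"prevent_torture": 0, "do_nothing": 1}  # torture remaining after each action

negative_affect = {a: -t for a, t in torture_after.items()}     # feel bad in proportion to torture
positive_affect = {a: 1 - t for a, t in torture_after.items()}  # the same thing shifted up by +1
constant_bliss = {a: 10 for a in torture_after}                 # maximum positive affect, always

for label, utility in [("negative affect", negative_affect),
                       ("positive affect", positive_affect),
                       ("constant bliss", constant_bliss)]:
    ranking = sorted(utility, key=utility.get, reverse=True)
    print(label, "->", ranking)
# negative affect -> ['prevent_torture', 'do_nothing']
# positive affect -> ['prevent_torture', 'do_nothing']
# constant bliss  -> ['prevent_torture', 'do_nothing']  (a tie: no preference, so no incentive)
```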
A rational agent makes decisions by imagining a space of hypothetical universes and using its actions to steer toward the one it prefers. How should I choose my favorite out of these hypothetical universes? It seems to involve simulating the affective states that I would feel in each universe. But this model breaks down if I put my own brain in these universes, because then I will just pick the universe that maximizes my own affective states. I’ve got to treat my brain as a black box. Once you start tinkering with the brain, decision theory goes all funny.
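Here is a rough sketch of that breakdown (everything below is an assumed toy model, not a serious piece of decision theory): an agent that ranks hypothetical universes by the affect it predicts it would feel in them does fine, until “rewire my own affect” is one of the universes on offer, at which point that option wins regardless of what happens to anyone else.

```python
# Toy model of the failure mode: once the agent's own affect-generator is part of the
# universe being evaluated, the affect-maximising option dominates everything else.

def predicted_affect(universe):
    if universe["affect_override"] is not None:
        return universe["affect_override"]        # the brain is inside the model: wireheading
    return -universe["others_suffering"]          # otherwise, affect just tracks the world

universes = [
    {"name": "reduce suffering", "others_suffering": 0, "affect_override": None},
    {"name": "do nothing",       "others_suffering": 5, "affect_override": None},
    {"name": "rewire my affect", "others_suffering": 5, "affect_override": 10**6},
]

best = max(universes, key=predicted_affect)
print(best["name"])  # -> "rewire my affect"
```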
Edit: Affective states don’t have to roughly correspond to utility. If you’re a human, positive utility is “good”. If you’re a paperclipper, positive utility is “paperclippy”. It’s just that human utility is affective states.
If you alter the affective states, you will alter behavior (and therefore you alter “utility”). This does not mean that the affective state is the thing which you value—it means that for humans the affective state is the hardware that decides what you value.
(again, not you per se. I should probably get out of the habit of using “you”).
Thinking about other people’s suffering causes the emotion “concern” (a negative emotion) which is in fact “negative utility”.
I agree with this, in general.
If you don’t feel concern when faced with the knowledge that someone is in pain, it means that you don’t experience “negative utility”
This suggests not only that concern implies negative utility, but that only concern implies negative utility and nothing else (or at least nothing relevant) does. Do you mean to suggest that? If so, I disagree utterly. If not, and you’re just restricting the arena of discourse to utility-based-on-concern rather than utility-in-general, then OK… within that restricted context, I agree.
That said, I’m pretty sure you meant the former, and I disagree.
Do you think we have moral responsibilities to AI? Is it immoral to cause a Friendly AI to experience negative utility by fooling it into thinking bad things are happening and then killing it?
Maybe, but not necessarily. It depends on the specifics of the AI.
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
Yes, that follows. I think both claims are false.
you can’t have a human which is constantly experiencing maximum positive affect, because that human would never have an incentive to act at all.
I agree that in human minds, differential affect motivates action; if we eliminate all variation in affect we eliminate that motive for action, which means that either we find another motivation for action or, as you suggest, we eliminate all incentives for action.
Are there other motivations? Are there situations under which the lack of such incentives is acceptable?
If not, and you’re just restricting the arena of discourse to utility-based-on-concern rather than utility-in-general, then OK… within that restricted context, I agree.
yes...we agree
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
Shit, I’m in a contradiction. Okay, I’ve messed up by using “affect” under multiple definitions, my mistake.
Reformatting...
1) There are many mechanisms for creating beings that can be modeled as agents with utility
2) Let us define Affect as the mechanism that defines utility in humans—aka emotion.
So now....
3) Do moral considerations apply to all affect, or all things that approximate utility?
If we meet aliens, what do we judge them by?
They aren’t going to be made out of neurons. Our definitions of “emotion” are probably not going to apply. But they might be like us—they might cooperate among themselves and they might cooperate with us. We might feel empathy for them. A moral system which disregards the preferences of beings simply because affect is not involved in implementing their minds seems not to match my own. I’d want to be able to treat aliens well.
I have a dream that all beings that can be approximated as agents will be judged by their actions, and not any trivial specifics of how their algorithm is implemented.
I’d feel some empathy for a FAI too. Even if it doesn’t have emotions, it understands them. Its utility function puts it in the class of beings I’d call “good”. My social instincts seem to apply to it—I’m friendly to it the same way it is friendly to me.
So, what I’m saying is that “affect” and “utility” are morally equivalent. Even though there are multiple paths to utility, they all carry similar moral weight.
If you remove “concern” and replace it with a signal that has the same result on actions as concern, then maybe “concern” and the signal are morally equivalent.
I agree that distinct processes that result in roughly equivalent utility shifts are roughly morally equivalent.
Do you further agree that it follows from this that there is some hard limit to which it makes sense to self-modify to avoid certain negative emotions?
(We can replace the negative emotions with other processes that have the same behavioral effect, but making someone undergo said other processes would be morally equivalent to making them undergo a negative emotion, so there isn’t a point in doing so)
Do you further agree that it follows from this that there is some hard limit to which it makes sense to self-modify to avoid certain negative emotions?
I don’t agree that it follows, no, though I do agree that there’s probably some threshold above which losing the ability to experience the emotions we currently experience leaves us worse off.
I also don’t agree that eliminating an emotion while adding a new process that preserves certain effects of that emotion which I value is equivalent (morally or otherwise) to preserving the emotion. More generally, I don’t agree with your whole enterprise of equating emotions with utility shifts. They are different things.