What I can’t figure out is whether you’re suggesting that I’m ethically confused… that it simply isn’t true that I ought to do those things, and if I understood the world better it would stop seeming to me that I ought to do them… or if I’m simply not being correctly described by your “we” statements and you’re unjustifiably generalizing from your own experience.
None of the above. I’m just trying to figure out why my intuition says that I do not want to block all negative affect, and whether my intuition is wrong, and your objections are helping me do so. I’ve got no idea whether we’re fundamentally different, or whether one of us is wrong—I’m just verbally playing with the space of ideas with you. The things I’m saying right now are exploratory thoughts and could easily be wrong—the hope is that value comes out of it.
“We” is just a placeholder for humans. I’m making the philosophical claim that negative affect is the real-life, non-theoretical thing that corresponds to the game-theory construct of negative utility, with some small connotative differences.
None of that seems compatible with the idea that what I actually negatively value is the pain of thinking about other people suffering.
No, of course not. Here’s what I’m suggesting: Thinking about other people’s suffering causes the emotion “concern” (a negative emotion) which is in fact “negative utility”. If you don’t feel concern when faced with the knowledge that someone is in pain, it means that you don’t experience “negative utility” in response to other people being in pain. I’m suggesting that the fact that you negatively value people being in pain is inextricably linked to the emotions you feel when people are in pain. I’m suggesting that if you remove concern (as occurs in real-world sociopathy), you won’t have any intrinsic incentive to care about the pain of others anymore.
(Not “you” in particular, but animals in general.)
Basically, when modelling a real-world object as an agent, we should treat whatever mechanism in its neural circuits (or whatever the being is made of) causes it to take action as indicative of “utility”. In humans, the neural pattern “concern” causes us to take action when others suffer, so “concern” is negative utility in response to suffering. (This gets confusing when agents don’t act in their own interests, but if we want to nitpick about things like that, we shouldn’t be modelling objects as agents in the first place.)
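A minimal sketch of what I mean, in Python; the creature, its state representation, and the “concern” signal here are all invented for illustration, not a claim about how any real mind works:

```python
# Toy stand-in for the neural pattern "concern": stronger when others suffer.
def concern_level(state):
    return 1.0 if state["others_suffering"] else 0.0

# The creature acts so as to reduce whatever its concern circuitry signals.
def act(state, actions):
    return min(actions, key=lambda action: concern_level(action(state)))

# An outside observer modelling this thing as an agent would read off a
# utility function that is (roughly) the negative of the concern signal.
def inferred_utility(state):
    return -concern_level(state)

help_them = lambda s: {**s, "others_suffering": False}
ignore_them = lambda s: s

state = {"others_suffering": True}
chosen = act(state, [help_them, ignore_them])
assert chosen is help_them  # the action that minimises concern, i.e. maximises inferred utility
```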
Here’s a question: Do you think we have moral responsibilities to AI? Is it immoral to cause a Friendly AI to experience negative utility by fooling it into thinking bad things are happening and then killing it? I think the answer might be yes—since the FAI shares many human values, I think I consider it a person. It makes sense to treat negative utility for the FAI as analogous to human negative affect.
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
But maybe we can work around this...maybe we can get a being which experiences positive affect from preventing torture, rather than negative affect from not preventing torture. Such a being has an incentive to prevent torture, yet doesn’t feel concerned when torture happens.
Either way though—if this line of thought makes sense, you can’t have a human which is constantly experiencing maximum positive affect, because that human would never have an incentive to act at all.
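A toy illustration of both points (the states and numbers are made up; only the ordering matters):

```python
states = ["torture_prevented", "torture_ongoing"]

# Being 1: negative affect when torture happens, neutral otherwise.
u_negative = {"torture_prevented": 0, "torture_ongoing": -10}

# Being 2: positive affect from preventing torture, neutral otherwise.
u_positive = {"torture_prevented": 10, "torture_ongoing": 0}

# Both rank the states the same way, so both have the same incentive to prevent torture.
assert max(states, key=u_negative.get) == max(states, key=u_positive.get) == "torture_prevented"

# Being 3: permanently at maximum positive affect, whatever happens.
u_constant = {"torture_prevented": 10, "torture_ongoing": 10}

# Every outcome looks equally good, so nothing gives this being a reason to act at all.
assert len(set(u_constant.values())) == 1
```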
A rational agent makes decisions by imagining a space of hypothetical universes and then using its actions to pick the one it prefers. How should I choose my favorite out of these hypothetical universes? It seems to involve simulating the affective states that I would feel in each universe. But this model breaks down if I put my own brain in these universes, because then I will just pick the universe that maximizes my own affective states. I’ve got to treat my brain as a black box. Once you start tinkering with the brain, decision theory goes all funny.
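Roughly the picture I have in mind, as a sketch; the “affect” function and the universes are invented for illustration:

```python
# How I (with my current brain) would feel about a given universe.
def my_affect(universe):
    return universe["people_helped"] - universe["people_suffering"]

universes = [
    {"people_helped": 5, "people_suffering": 1},
    {"people_helped": 0, "people_suffering": 0},
]

# Normal case: evaluate each hypothetical universe with my current, fixed brain.
best = max(universes, key=my_affect)
assert best == universes[0]

# Broken case: one hypothetical universe also rewires my brain so that it
# reports enormous positive affect, regardless of what actually happens in it.
wirehead = {"people_helped": 0, "people_suffering": 100, "affect_after_rewiring": 10**6}
tampered = [{**u, "affect_after_rewiring": my_affect(u)} for u in universes] + [wirehead]

# If I rank universes by the affect I'd have *after* any rewiring, the
# wirehead universe wins, even though my current values hate it.
winner = max(tampered, key=lambda u: u["affect_after_rewiring"])
assert winner is wirehead
```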
Edit: Affective states don’t have to roughly correspond to utility. If you’re a human, positive utility is “good”. If you’re a paperclipper, positive utility is “paperclippy”. It’s just that human utility is affective states.
If you alter the affective states, you will alter behavior (and therefore you alter “utility”). This does not mean that the affective state is the thing which you value—it means that for humans the affective state is the hardware that decides what you value.
(again, not you per se. I should probably get out of the habit of using “you”).
Thinking about other people’s suffering causes the emotion “concern” (a negative emotion) which is in fact “negative utility”.
I agree with this, in general.
If you don’t feel concern when faced with the knowledge that someone is in pain, it means that you don’t experience “negative utility”
This suggests not only that concern implies negative utility, but that only concern implies negative utility and nothing else (or at least nothing relevant) does. Do you mean to suggest that? If so, I disagree utterly. If not, and you’re just restricting the arena of discourse to utility-based-on-concern rather than utility-in-general, then OK… within that restricted context, I agree.
That said, I’m pretty sure you meant the former, and I disagree.
Do you think we have moral responsibilities to AI? Is it immoral to cause a Friendly AI to experience negative utility by fooling it into thinking bad things are happening and then killing it?
Maybe, but not necessarily. It depends on the specifics of the AI.
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
Yes, that follows. I think both claims are false.
you can’t have a human which is constantly experiencing maximum positive affect, because that human would never have an incentive to act at all.
I agree that in human minds, differential affect motivates action; if we eliminate all variation in affect we eliminate that motive for action, which either requires that we find another motivation for action, or (as you suggest) we eliminate all incentives for action.
Are there other motivations? Are there situations under which the lack of such incentives is acceptable?
If not, and you’re just restricting the arena of discourse to utility-based-on-concern rather than utility-in-general, then OK… within that restricted context, I agree.
Yes... we agree.
If it’s true that negative affect and negative utility are roughly synonymous, it’s impossible to make a being that negatively values torture and doesn’t feel bad when seeing torture.
Shit, I’m in a contradiction. Okay, I’ve messed up by using “affect” under multiple definitions; my mistake.
Reformatting...
1) There are many mechanisms for creating beings that can be modeled as agents with utility.
2) Let us define Affect as the mechanism that defines utility in humans—aka emotion.
So now....
3) Do moral considerations apply only to affect, or to all things that approximate utility?
If we meet aliens, what do we judge them by?
They aren’t going to be made out of neurons. Our definitions of “emotion” are probably not going to apply. But they might be like us—they might cooperate among themselves and they might cooperate with us. We might feel empathy for them. A moral system which disregards the preferences of beings simply because affect is not involved in implementing their minds seems to not match my moral system. I’d want to be able to treat aliens well.
I have a dream that all beings that can be approximated as agents will be judged by their actions, and not any trivial specifics of how their algorithm is implemented.
I’d feel some empathy for an FAI too. Even if it doesn’t have emotions, it understands them. Its utility function puts it in the class of beings I’d call “good”. My social instincts seem to apply to it—I’m friendly to it the same way it is friendly to me.
So, what I’m saying is that “affect” and “utility” are morally equivalent. Even though there are multiple paths to utility, they all carry similar moral weight.
If you remove “concern” and replace it with a signal that has the same result on actions as concern, then maybe “concern” and the signal are morally equivalent.
I agree that distinct processes that result in roughly equivalent utility shifts are roughly morally equivalent.
Do you further agree that it follows from this that there is some hard limit on how far it makes sense to self-modify to avoid certain negative emotions?
(We can replace the negative emotions with other processes that have the same behavioral effect, but making someone undergo said other processes would be morally equivalent to making them undergo a negative emotion, so there isn’t a point in doing so)
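To make “the same result on actions” concrete, here’s a toy sketch; the two signals and the states are invented, and nothing hinges on the particular numbers:

```python
# Implementation A: a negative signal ("concern") when others are suffering.
def concern(state):
    return -5 if state["suffering"] else 0

# Implementation B: a positive signal (a "warm glow") when suffering is absent.
def warm_glow(state):
    return 0 if state["suffering"] else 5

# Pick the action whose resulting state the internal signal likes best.
def policy(signal, state, actions):
    return max(actions, key=lambda action: signal(action(state)))

intervene = lambda s: {"suffering": False}
do_nothing = lambda s: s

start = {"suffering": True}

# Both signals produce exactly the same choice from the same starting state...
assert policy(concern, start, [intervene, do_nothing]) is intervene
assert policy(warm_glow, start, [intervene, do_nothing]) is intervene
# ...so, on the view above, swapping one for the other changes nothing that
# matters morally, which is why there's no point in making the swap.
```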
Do you further agree that it follows from this that there is some hard limit on how far it makes sense to self-modify to avoid certain negative emotions?
I don’t agree that it follows, no, though I do agree that there’s probably some threshold above which losing the ability to experience the emotions we currently experience leaves us worse off.
I also don’t agree that eliminating an emotion while adding a new process that preserves certain effects of that emotion which I value is equivalent (morally or otherwise) to preserving the emotion. More generally, I don’t agree with your whole enterprise of equating emotions with utility shifts. They are different things.