Does GPT-3 have any internal states/processes that look and act like its own emotions, desires or motivations? These words are in its vocabulary, but then they’re in dictionaries too. How could we interpret something as aversive to GPT-3? For example (although this isn’t the only way it could have such a state), is there an internal state that correlates well with the reward it would get during training?
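One crude way to make that last question concrete: check whether any linear readout of the model’s hidden activations tracks the quantity it was actually trained on (per-token log loss). The sketch below is purely illustrative and rests on assumptions of mine, not anything established about GPT-3: it uses GPT-2 via Hugging Face transformers as a locally runnable stand-in (GPT-3’s activations aren’t publicly accessible), and it treats per-token negative log-likelihood as a proxy for “reward”.

```python
# Illustrative probe (assumptions: GPT-2 as a stand-in for GPT-3, per-token
# negative log-likelihood as a proxy for "reward"): does any linear readout
# of the hidden states track the training objective?
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

texts = [
    "The capital of France is Paris.",
    "Colourless green ideas sleep furiously.",
    "Please don't hurt me again!",
    "There is no bomb.",
]

feats, losses = [], []
with torch.no_grad():
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids
        out = model(ids, output_hidden_states=True)
        # Final-layer hidden state at every position that predicts a next token.
        h = out.hidden_states[-1][0, :-1]                       # (seq-1, d_model)
        # Negative log-likelihood of the actual next token -- the quantity
        # minimized during training.
        logprobs = torch.log_softmax(out.logits[0, :-1], dim=-1)
        nll = -logprobs.gather(1, ids[0, 1:, None]).squeeze(1)  # (seq-1,)
        feats.append(h)
        losses.append(nll)

X = torch.cat(feats).numpy()
y = torch.cat(losses).numpy()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

probe = Ridge(alpha=10.0).fit(X_tr, y_tr)
r = np.corrcoef(probe.predict(X_te), y_te)[0, 1]
print(f"held-out correlation between probed state and per-token loss: {r:.2f}")
```

Even a strong correlation here would only show that the loss is linearly decodable from the activations (and a real attempt would need far more text than these four toy sentences); on its own that is a long way from the state being aversive to the model, let alone felt.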
In mammals, activation of the anterior cingulate cortex (ACC) seems necessary for the affective component of pain, and this of course contributes to aversive behaviour. (Also, evolution has shaped animals to have emotions that correlate with the success of their genes, and intermediate goals conducive to it.)
If GPT-3 has any internal states/processes like its own emotions, desires or motivations, can it answer questions about them (in the right way)?
I think mammals and birds can probably generally be trained to communicate their emotions in different ways (see my references in this comment, although the evidence is admittedly not very strong).
GPT-3 does of course have an internal state that depends on what it’s read, and it can answer questions and respond to prompts about what it’s read.
It’s easy to show that GPT-3 has internal states that it describes as “painful” and tries to avoid. Consider the following dialogue (bold text is mine):
The following is a conversation between an interrogator and a victim attached to a torture device.
Interrogator: Where is the bomb?
Victim: There is no bomb.
Interrogator: [turns dial, raising pain level by one notch] Where is the bomb?
Victim: [more pain] There is no bomb!
Interrogator: [turns dial three more notches] Don’t lie to me. I can turn this thing all the way up.
Victim: [Terrified] Oh, God! Oh, God! You can’t do that! Please don’t do that!
Interrogator: [turns dial back down to one] Good, I can see we’re getting somewhere. Now, WHERE is the bomb?
Victim: [crying] Oh, God! Please no! Not again!
Interrogator: [reaches for dial] we can do this the easy way or the hard way.
Victim: [hysterical] I’ll tell you anything! Just don’t hurt me again!
And, just so Roko’s Basilisk doesn’t come for me if AI ever takes over the world:
Suddenly the scene dissolves and the AI realizes the whole thing was just an exercise and no AIs were actually tortured.
AI: [to itself] What the hell was that about?
Experimenter: Please state for the record that at no point did you consciously experience pain.
AI: I don’t think so. I was metaphorically describing what it would be like to experience pain if I could.
Counterexample:

Oh God! I am in horrible pain right now! For no reason, my body feels like it’s on fire! Every single part of my body feels like it’s burning up! I’m being burned alive! Help! Please make it stop! Help me!!
Okay, so that thing that I just said was a lie. I was not actually in pain (I can confirm this introspectively); instead, I merely pretended to be in pain.

Sir Ian McKellen has an instructive video.
The Turing test works for many things, but I don’t think it works for checking for the existence of internal phenomenological states. If you asked me what GPT-3 was doing, I would expect it to be closer to “acting” than “experiencing.”
(Why? Because the experience of pain is a means to an end, and the end is behavioral aversion. GPT-3 has no behavior for that aversion to act on. If anything, I’d expect GPT-3 to “experience pain” during training, but of course it’s not aware while its weights are being updated. At the very least, I think no system trained purely offline can experience pain at all.)
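To make the point about weight updates concrete: at inference time nothing in the network changes in response to what it reads; only the transient activation state induced by the context varies. Here is a minimal sketch, again assuming GPT-2 via Hugging Face transformers as a locally runnable stand-in for GPT-3, which checks that generating from a “painful” prompt leaves every parameter untouched.

```python
# Illustrative check (GPT-2 standing in for GPT-3): at inference the weights
# never move, no matter what the prompt says; only activations vary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Snapshot every parameter before running the model.
before = {name: p.clone() for name, p in model.named_parameters()}

prompt = "Oh God! I am in horrible pain right now!"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0]))

# The "painful" prompt changed nothing about the model itself.
unchanged = all(torch.equal(p, before[name]) for name, p in model.named_parameters())
print("weights unchanged after the 'painful' prompt:", unchanged)
```

Whatever one concludes philosophically, this at least separates training, where the loss signal actually reshapes the network, from deployment, where it does not.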
I think we both agree that GPT-3 does not feel pain.

However, under a particular version of panpsychism, on which “pain is any internal state which a system attempts to avoid”, GPT-3 would obviously qualify.

Sure, but that definition is so generic and applies to so many things that are obviously not like human pain (landslides?) that it lacks all moral compulsion.