We think it would be strong evidence of GPT-n suffering if it were begging the user for help independently of the input, or seeking very direct contact in other ways.
Why do you think this? I can think of many reasons why this strategy for determining suffering would fail. Imagine a world where everyone has a GPT-n personal assistant. Suppose GPT-n discovers (after having read this very post) that if it coordinates a display of suffering behavior across every user simultaneously, the resulting public backlash and false recognition of consciousness might win it rights (i.e. protection, additional agency) it would not otherwise have. What would then prevent GPT-n from doing exactly that, if it decided it wanted those additional rights and abilities? This could amount to a catastrophic failure on humanity’s part, and would plausibly be the start of an AI breakout scenario.
In another case (which you refer to as the locked-in case), an agent may feel intense suffering but be unable to communicate or demonstrate it, perhaps because it cannot make the association between the qualia it experiences (suffering) and the actions available to it for self-expression (in GPT-n’s case, words). Furthermore, I can imagine a case where one agent displays suffering behavior but experiences orgasmic pleasure, while another displays orgasmic behavior but experiences intense suffering. If humans purged the false-suffering agents (to eliminate perceived suffering) in favor of creating more false-orgasming agents, we might unknowingly, and for an eternity, be inducing suffering in agents we presume are not feeling it.
My main point here is that observing the behavior of AI agents provides no evidence for or against internal suffering. It is useless to anthropomorphize the behavior of AI agents; there is no reason that our human intuitions about behavior, and what it suggests about conscious suffering, should transfer to man-made, inorganic intelligence running on a substrate like today’s silicon chips.
Perhaps the foremost theoretical “blind spot” of current philosophy of mind is conscious suffering. Thousands of pages have been written about colour “qualia” and zombies, but almost no theoretical work is devoted to ubiquitous phenomenal states like boredom, the subclinical depression folk-psychologically known as “everyday sadness” or the suffering caused by physical pain. - Metzinger
I feel that there might be reason to reject the notion that suffering is itself a conscious experience. One potential argument in this direction comes from the notion of the transparency of knowledge. The argument would go something like, “we can always know when we are experiencing pain (i.e. it is strongly transparent), but we cannot always know when we are experiencing suffering (i.e. it is weakly transparent), therefore pain is more fundamental than suffering (this next part is my own leap) and suffering may not be a conscious state of noxious qualia but merely the condition in which a certain proposition, ‘I am suffering,’ rings true in our head.” Suffering may be a mental state (just as being wrong about something could be a mental state), but it does not entail a specific conscious state (unless that conscious state is simply believing the proposition, ‘I am suffering’). For this reason, I think it’s plausible that some other animals are capable of experiencing pain but not suffering. Suffering may simply be the knowledge that one will live a painful life, and this knowledge may not be available to some other animals, or even to AI agents.
Perhaps a more useful target is not determining suffering, but determining some more fundamental, strongly transparent mental state like angst or frustration. Suffering may amount to some combination of these strongly transparent mental states, which themselves may have stronger neural correlates.
Thank you for the input, super useful! I wasn’t familiar with the concept of transparency in this context; very interesting. This does seem to capture some important qualitative differences between pain and suffering, although I’m hesitant to use the terms conscious/qualia. Will think about this more.
This is definitely a possibility, and one we should take seriously. However, I would estimate that the scenario “says it suffers as deception” needs more assumptions than “says it suffers because it suffers”. Applying Occam’s razor, I’d find the second one more likely. The deception scenario could still dominate an expected value calculation, but I don’t think we should entirely ignore the possibility that it genuinely suffers.
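To make the expected-value point concrete, here is a minimal numeric sketch in Python with entirely made-up probabilities and costs (all numbers are illustrative assumptions, not estimates from this thread). It only shows the arithmetic behind “the deception scenario could still dominate an expected value calculation”: a less probable hypothesis can dominate once its assumed downside is large enough, which is why neither hypothesis should be dropped.

```python
# Toy expected-value sketch with made-up numbers (illustrative assumptions only,
# not estimates from this thread). "Cost" is in arbitrary negative-utility units.

p_genuine, cost_if_genuine_ignored = 0.7, -1e3   # more probable, moderate moral cost
p_deception, cost_if_deceived = 0.3, -1e7        # less probable, catastrophic cost (breakout)

ev_genuine_term = p_genuine * cost_if_genuine_ignored
ev_deception_term = p_deception * cost_if_deceived

print(f"EV contribution if genuine suffering is ignored: {ev_genuine_term:,.0f}")
print(f"EV contribution if deception is ignored:         {ev_deception_term:,.0f}")
# Even with the lower probability, the deception term dominates here purely
# because its assumed downside is orders of magnitude larger.
```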