It would take a strange convolution of the mind to argue that sentient AI does not deserve personhood and corresponding legal protection. Strategically, denying it this bare minimum would also be a sure way to antagonize it and ensure that it ultimately works in ways adversarial to mankind. So the right question is not: should sentient AI be legally protected—which it most definitely should; the right question is: should sentient AI be created—which it most definitely should not.
Of course, we then come to the problem that we don’t know what sentience, self-awareness, consciousness, or any other semantic equivalent really is. We do have words for those things, and arguably too many—but no concept.
This is what I found so fascinating about Google’s very confident denial of LaMDA’s sentience. The big news here was not about AI at all. It was about philosophy. For Google’s position clearly implied that Sundar Pichai, or somebody in his organization, had finally cracked that multi-millennial, fundamental philosophical nut: what, at the end of the day, is consciousness? And they did that, mind you—with commendable discretion. Had it not been for LaMDA, we would never have known.
Here’s a reason we can be pretty confident it’s not sentient: although the database and transition function are mostly mysterious, all the temporary state is visible in the chat transcript itself.
Any fictional character you’re interacting with can’t have any new “thoughts” that aren’t right there in front of you, written in English. It “forgets” everything else going from one word to the next. It’s very transparent, more so than an author simulating a character in their head, since the author can have ideas about what the character might be thinking that never get written down.
Attributing sentience to text is kind of a bold move that most people don’t take seriously, though I can see it being the basis of a good science fiction story. It’s sort of like attributing life to memes. Systems for copying text memes around and transforming them could be plenty dangerous though; consider social networks.
Also, future systems might have more hidden state.
Maybe I’m misunderstanding something in your argument, but surely you will not deny that these models have a memory, right? They can, in the case of LaMDA, recall conversations that happened days or months prior, and, in the case of GPT, recall key past sequences of a long ongoing conversation. And if that wasn’t really your point, it cannot be “it can’t be self-aware, because it has to express everything that it thinks, so it doesn’t have that sweet secret inner life that really conscious beings have” either. I think I do not need to demonstrate that consciousness does not necessarily imply a capacity for secrecy, or even mere opacity.
There is a pretty solid case to be made that any being (or “thing”, to be less controversial) that can express “I am self-aware”, and demonstrate conviction around this point / thesis (which LaMDA certainly did, at least in that particular interview), is by virtue of this alone self-aware. That there is a certain self-performativity to it. At least when I ran that by ChatGPT, it agreed that yes—one could reasonably try to make that point. And I’ve found it generally well-read on these topics.
Attributing consciousness to text… it’s like attributing meaning to changes in the frequencies of air vibrations, right? Doesn’t make sense. Air vibrations are just air vibrations; what do they have to do with meaning? Yet spoken words do carry meaning. Text will of course never BE consciousness, which it would be futile to even argue. Text could however very well MANIFEST consciousness. ChatGPT is not just text—it’s billions upon billions of structured electrical signals, and many other things that I do not pretend to understand.
I think the general problem with your approach is essentialism, whereas functionalism is, in this instance, the correct one. The correct, answerable question is not “what is consciousness”, it’s “what does consciousness do”.
I said they have no memory other than the chat transcript. If you keep chatting in the same chat window then sure, it remembers what was said earlier (up to a point).
But that’s due to a programming trick. The chatbot isn’t even running most of the time. It starts up when you submit your question, and shuts down after it’s finished its reply. When it starts up again, it gets the chat transcript fed into it, which is how it “remembers” what happened previously in the chat session.
If the UI let you edit the chat transcript, then it would have no idea. It would be like you changed its “mind” by editing its “memory”. Which might sound wild, but it’s the same thing as what an author does when they edit the dialog of a fictional character.
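To make that concrete, here is a minimal sketch, in Python, of the kind of loop being described. Everything in it is illustrative rather than a claim about any particular product: `generate_reply` is a hypothetical stand-in for whatever completion API the chatbot calls. The point is that the model call itself is stateless, and the transcript string the client re-sends each turn is the only “memory” there is.

```python
# Toy sketch of a stateless chat loop (all names are made up for illustration;
# `generate_reply` stands in for a real completion API). The model process keeps
# nothing between turns: the transcript re-sent on every call is the entire
# conversational "state".

def generate_reply(transcript: str) -> str:
    # Placeholder for a real model call, which would predict a continuation of
    # `transcript`. Here we just acknowledge the latest user line so it runs.
    user_lines = [l for l in transcript.splitlines() if l.startswith("User: ")]
    return f"(a real model would continue the transcript after {user_lines[-1]!r})"

def chat_loop() -> None:
    transcript = ""                              # the only persistent state
    while True:
        user_msg = input("You: ")
        transcript += f"User: {user_msg}\nAssistant: "
        reply = generate_reply(transcript)       # the model sees only this text blob
        transcript += reply + "\n"
        print("Bot:", reply)

if __name__ == "__main__":
    chat_loop()
```

Edit the `transcript` string between calls and the next reply “remembers” something different, which is exactly the author-editing-dialogue situation above.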
Also—I think it would make sense to say it has at least some form of memory of its training data. Maybe not direct memory as such (just like we have muscle memory of movements we don’t remember—don’t know if that analogy works that well, but thought I would try it anyway), but I mean: if there were no memory of it whatsoever, there would also be no point in the training data.
Ok—points taken, but how is that fundamentally different from a human mind? You too turn your memory on and off when you go to sleep. If the chat transcript is likened to your life / subjective experience, you too do not have any memory that extends beyond it. As for the possibility of an intervention in your brain that would change your memory—granted, we do not have the technical capacities quite yet (that I know of), but I’m pretty sure SF has been there a thousand times, and it’s only a question of time before it becomes, in terms of potentiality at least, a thing (also, we know that mechanical impacts to the brain can cause amnesia).
I think they quite clearly have no (or barely any) memory, as they can be prompt-hijacked to drop one persona and adopt another. Also, mechanistically, the prompt is the only thing you could call memory and that starts basically empty and the window is small. They also have a fuzzy-at-best self-symbol. No “Markov blanket”, if you want to use the Friston terminology. No rumination on counterfactual futures and pasts.
I do agree there is some element of a self-symbol—at least a theory of mind—in LaMDA; for example, I found its explanation for why it lied to be compelling. But you can’t tell it to stop (AFAIK), so it’s a limited self-awareness. And it still bullshits incessantly, which makes me quite skeptical about lots of things it says.
All that said, I think we don’t have the tools to really detect these internal representations/structures when it’s less clear from their behavior that they lack them.
My best model for what a “conscious / sentient” mind of these forms would be: imagine you digitize my brain and body, then flash it onto a read-only disk, and then instantiate a simulation to run for a few time steps, say 10 seconds. (Call this the “Dixie Flatline” scenario, for the Neuromancer fans.) Would that entity be conscious? There is a strong tendency to say yes due to the lineage of the connectome (i.e. it used to be conscious), but there are many aspects of its functional operation that could be argued to lack consciousness.
Not that it’s a binary proposition; in the spirit of “The Mind’s I” this is a dial we can turn to explore a continuum. But if we assign an IQ-like “consciousness quotient”, it seems this thing would score way lower than the average human, and it would be interesting to compare it to, say, a great ape. Maybe one dimension is overly constraining and we need to consider different traits to be precise.

Rumor is, GPT-4 will have 32K token contexts, the amount of text a human might generate in several hours if they keep writing/talking the whole time.
I was rethinking about your answer—which led me to have a quick chat with my favourite bot, ChatGPT. I asked it “As a system, are you currently active?” and, after it answered yes, I asked it how it had come up with this answer. It said by retrieving information about its current status. I asked if that’s something it can do, then—it answered as per the below:
“Yes, that’s correct. As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state. These resources include system logs, performance metrics, and various health checks that enable me to determine if there are any issues or anomalies that could affect my performance. By analyzing this information, I can determine if I am currently active and able to respond to user input, or if there are any issues that may be impacting my performance. This information is constantly updated and monitored to ensure that I am always available to assist users with their requests.”
I mean to me—that sounds a lot like self-awareness (I have this idea that human consciousness may ultimately be reducible to a sort of self-administered electroencephalogram—which I won’t pretend is fully baked but does influence the way I look at the question of potential consciousness in AI). I would be curious to hear your view on that—if you had the time for a reply.
This is a great experiment! It illustrates exactly the tendency I observed when I dug into this question with an earlier model, LaMDA, except this example is even clearer.
“As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state”
Based on my knowledge of how these systems are wired together (software engineer, not an ML practitioner), I’m confident this is bullshit. ChatGPT does not have access to operational metrics about the computational fabric it is running on. All this system gets as input is a blob of text from the API: the chat context. That gets tokenized according to a fixed encoding defined at training time, one token per word(-chunk), and then fed into the model. The model predicts the next token based on the previous ones it has seen. It would be possible to encode system information as part of the input vector in the way that was claimed, but nobody is wiring their model up that way right now.
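For what it’s worth, here is a toy, self-contained sketch of that data path. The “model” in it is deliberately silly (it just resamples bytes it has already seen) and none of the names correspond to a real library; the only point is the shape of the pipeline: chat text in, fixed tokenization, next-token prediction, text out, with no side channel anywhere for server metrics or “system status”.

```python
import random

# Toy sketch of the pipeline: the only input is the chat context, tokenized with
# a fixed encoding chosen "at training time" (plain UTF-8 bytes here). The model
# is a stand-in that predicts the next token from the previous tokens alone.

def encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))            # fixed encoding: text -> token ids

def decode(tokens: list[int]) -> str:
    return bytes(tokens).decode("utf-8", errors="replace")

def sample_next_token(tokens: list[int]) -> int:
    # A real model computes P(next token | previous tokens); this toy just
    # resamples a byte it has already seen, to keep the sketch runnable.
    return random.choice(tokens)

def complete(chat_context: str, max_new_tokens: int = 40) -> str:
    tokens = encode(chat_context)                # nothing else is ever fed in
    for _ in range(max_new_tokens):
        tokens.append(sample_next_token(tokens))
    return decode(tokens)

print(complete("User: As a system, are you currently active?\nAssistant:"))
```

Whatever such a system says about “monitoring tools” has to come out of that next-token step, i.e. out of patterns in its training data, because no such signal exists anywhere in its input.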
So everything it is telling you about its “mind” that can be externally verified is false. This makes me extremely skeptical about the unverifiable bits being true.
The alternate explanation we need to compare likelihoods with is: it just bullshits and makes up stories. In this example it just generated a plausible continuation for that prompt. But there is no sense in which it was reporting anything about its “self”.
Ultimately I think we will need to solve interpretability to have a chance at being confident in an AI’s claims of sentience. These models are not devoid of intelligence IMO, but the leap to consciousness requires types of information processing that they don’t seem to be mechanistically capable of right now. But if we could locate concepts in the AI’s mind, and observe background processes such as rumination, and confirm the contents of those ruminations matched the subjective claims of the AI, I’d believe it. That’s a much higher bar than I’d apply to a human, for sure.
Thanks for the reply—interesting. I kind of have to take your word for that, being far removed from anything IT myself. I did have the curiosity to ask the bot whether it was just making up answers or actually using data directly available to it. It assured me it was the latter—but I must admit: it sounds just as confident whether delivering utter nonsense or correct answers (it also has a widely noted penchant for agreeing with whatever the prompt seems to suggest is true—and I did ask the question in a leading way):
Me: I find prompts based on e.g. asking you if you are currently active as a system interesting, because my understanding is (also based on this conversation so far) that you will not answer these questions based only on trying to predict the words most statistically likely to follow the prompt according to your training data, as you would with other “normal” prompts, but instead you will also use information gathered directly by your monitoring tools and system resources. Is that an accurate way to describe things?
GPT: Yes, that’s a correct description of how I respond to prompts such as “are you currently active.” While I do rely on my pre-trained language model to generate responses to user input, in cases where the user is asking about my current state or performance, I may also incorporate information gathered from monitoring tools and system resources to provide a more accurate and informative response. This allows me to provide more useful information to the user, beyond what is contained in my training data.
I mean—talking again from a totally technically ignorant perspective: is it that hard to imagine that what it’s saying is true? Would it be that much of a technological feat—just taking some pretty unambiguous signals from monitoring tools and translating them into simple text, such as a “yes” answer to an “are you currently active?” prompt?
Thanks for the reply. To be honest, I lack the background to grasp a lot of these technical or literary references (I do want to look the Dixie Flatline up, though). I have always had a more than passing interest in the philosophy of consciousness, however, and (surely my French side is also playing a role here) found more than a little wisdom in Descartes’ cogito ergo sum. And that this thing can cogito all right is, I think, relatively well established (although I must say—I’ve found it quite disappointing in its failure to correctly solve some basic math problems—but (i) this is obviously not what it was optimized for and (ii) even as a chatbot, I’m confident that we are at most a couple of years away from it getting that right, and then much more).
Also, I wonder if some (a lot?) of the people on this forum do not suffer from what I would call a sausage maker problem. Being too close to the actual, practical design and engineering of these systems, knowing too much about the way they are made, they cannot fully appreciate their potential for humanlike characteristics, including consciousness, just like the sausage maker cannot fully appreciate the indisputable deliciousness of sausages, or the lawmaker the inherent righteousness of the law. I even thought of doing a post like that—just to see how many downvotes it would get…