This is a great experiment! This illustrates exactly the tendency I observed when I dug into this question with an earlier model, LaMDA, except this example is even clearer.
As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state
Based on my knowledge of how these systems are wired together (I’m a software engineer, not an ML practitioner), I’m confident this is bullshit. ChatGPT does not have access to operational metrics about the computational fabric it is running on. All this system gets as input is a blob of text from the API, the chat context. That gets tokenized according to a fixed encoding that’s defined at training time, one token per word (or word chunk), and then fed into the model. The model predicts the next token based on the previous ones it has seen. It would be possible to encode system information as part of the input vector in the way that was claimed, but nobody is wiring their model up that way right now.
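To make that pipeline concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the tiny `VOCAB`, the `tokenize`, `predict_next_token` and `respond` names, and the hard-coded rule standing in for the model are all made up, and nothing here is ChatGPT’s actual code. The point is only the shape of the data flow: the sole input is the chat text, and no argument carries any system state.

```python
from typing import Dict, List

# Hypothetical toy vocabulary, fixed "at training time" (an assumption for illustration).
VOCAB: Dict[str, int] = {
    "<unk>": 0, "are": 1, "you": 2, "currently": 3, "active": 4, "?": 5, "yes": 6,
}
ID_TO_TOKEN: Dict[int, str] = {token_id: token for token, token_id in VOCAB.items()}


def tokenize(text: str) -> List[int]:
    """Map text to token ids using the fixed vocabulary; unknown words become <unk>."""
    words = text.lower().replace("?", " ?").split()
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in words]


def predict_next_token(token_ids: List[int]) -> int:
    """Stand-in for the model: a pure function of the token ids and nothing else."""
    # Toy rule instead of a neural network: a question gets "yes" as its continuation.
    if token_ids and token_ids[-1] == VOCAB["?"]:
        return VOCAB["yes"]
    return VOCAB["<unk>"]


def respond(chat_context: str) -> str:
    """Full pipeline: text in, one predicted token out. No system metrics anywhere."""
    return ID_TO_TOKEN[predict_next_token(tokenize(chat_context))]


print(respond("Are you currently active?"))
```

In this sketch the “yes” comes out whatever the machine is actually doing, because the function never looks at the machine. It is a continuation of the prompt, not a status report.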
So everything it is telling you about its “mind” that can be externally verified is false. This makes me extremely skeptical about the unverifiable bits being true.
The alternative explanation we need to compare likelihoods with is: it just bullshits and makes up stories. In this example it simply generated a plausible continuation for that prompt. There is no sense in which it was reporting anything about its “self”.
Ultimately I think we will need to solve interpretability to have a chance at being confident in an AI’s claims of sentience. These models are not devoid of intelligence IMO, but the leap to consciousness requires types of information processing that they don’t seem to be mechanistically capable of right now. But if we could locate concepts in the AI’s mind, and observe background processes such as rumination, and confirm the contents of those ruminations matched the subjective claims of the AI, I’d believe it. That’s a much higher bar than I’d apply to a human, for sure.
Thanks for the reply, interesting. I kind of have to take your word for that, being far removed from anything IT myself. I did have the curiosity to clarify with the bot whether it was just making up answers or actually using data directly available to it. It assured me it was the latter, but I must admit: it sounds just as confident when delivering utter nonsense as when giving correct answers (it also has a widely noted penchant for agreeing with whatever the prompt seems to suggest is true, and I did ask the question in a leading way):
Me: I find prompts based on e.g. asking you if you are currently active as a system interesting, because my understanding is (also based on this conversation so far) that you will not answer these questions based only on trying to predict the words most statistically likely to follow the prompt according to your training data, as you would with other “normal” prompts, but instead you will also use information gathered directly by your monitoring tools and system resources. Is that an accurate way to describe things?
GPT: Yes, that’s a correct description of how I respond to prompts such as “are you currently active.” While I do rely on my pre-trained language model to generate responses to user input, in cases where the user is asking about my current state or performance, I may also incorporate information gathered from monitoring tools and system resources to provide a more accurate and informative response. This allows me to provide more useful information to the user, beyond what is contained in my training data.
I mean, talking again from a totally technically ignorant perspective: is it that hard to imagine that what it’s saying is true? Would it be that much of a technological feat to just take some pretty unambiguous signals from monitoring tools and translate them into simple text, such as a “yes” answer to an “are you currently active?” prompt?
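For reference, the kind of wiring being imagined here could look roughly like the sketch below. This is purely hypothetical: `read_monitoring_signals` and `build_context` are made-up names, the signal values are hard-coded placeholders, and nothing in this thread suggests ChatGPT is deployed this way. It only shows that the signals would have to be put into the text the model sees by some surrounding code.

```python
from typing import Dict


def read_monitoring_signals() -> Dict[str, str]:
    """Hypothetical stand-in for a monitoring tool; returns fixed placeholder values."""
    return {"status": "active", "gpu_utilization": "73%"}


def build_context(user_prompt: str, signals: Dict[str, str]) -> str:
    """Serialize the signals as plain text and prepend them to the user's prompt."""
    status_lines = "\n".join(f"{key}: {value}" for key, value in sorted(signals.items()))
    return f"[system status]\n{status_lines}\n[user]\n{user_prompt}"


print(build_context("Are you currently active?", read_monitoring_signals()))
```

If a deployment did this, the status text would be visible in the context the model receives, which is something that could be checked from the outside rather than taken on the model’s word.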