Guillaume Charrier comments on What does Bing Chat tell us about AI risk?

Guillaume Charrier 4 Mar 2023 17:51 UTC
1 point
0
I was thinking again about your answer, which led me to have a quick chat with my favourite bot, ChatGPT. I asked it “As a system, are you currently active?” and, after it answered yes, I asked how it had come up with that answer. It said by retrieving information about its current status. I asked whether that was something it could actually do, and it answered as follows:
“Yes, that’s correct. As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state. These resources include system logs, performance metrics, and various health checks that enable me to determine if there are any issues or anomalies that could affect my performance. By analyzing this information, I can determine if I am currently active and able to respond to user input, or if there are any issues that may be impacting my performance. This information is constantly updated and monitored to ensure that I am always available to assist users with their requests.”
I mean, to me that sounds a lot like self-awareness (I have this idea that human consciousness may ultimately be reducible to a sort of self-administered electroencephalogram, which I won’t pretend is fully baked but which does influence the way I look at the question of potential consciousness in AI). I would be curious to hear your view on that, if you have the time for a reply.
- Paul Tiplady 9 Mar 2023 5:41 UTC
  8 points
  3
  This is a great experiment! It illustrates exactly the tendency I observed when I dug into this question with an earlier model, LaMDA, except this example is even clearer.
  
  “As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state”
  
  Based on my knowledge of how these systems are wired together (I’m a software engineer, not an ML practitioner), I’m confident this is bullshit. ChatGPT does not have access to operational metrics about the computational fabric it is running on. All this system gets as input is a blob of text from the API: the chat context. That text gets tokenized according to a fixed encoding that’s defined at training time, roughly one token per word or word chunk, and is then fed into the model. The model predicts the next token based on the previous ones it has seen. It would be possible to encode system information as part of the input vector in the way that was claimed, but nobody is wiring their model up that way right now.
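
  To make that data flow concrete, here is a minimal sketch, assuming a toy whitespace tokenizer and a bigram counter standing in for a real subword tokenizer and neural network. Every name in it is hypothetical and none of it is ChatGPT’s actual code; the only point is that text is the sole input to the prediction step.

```python
# Toy illustration only: a fixed vocabulary and a bigram table stand in
# for a real tokenizer and a real neural network (both are assumptions).
from collections import Counter, defaultdict

TRAINING_TEXT = "i am active . i am a language model . i am active ."


def build_vocab(text):
    """Fix the token encoding once, at 'training time'."""
    vocab = {}
    for word in text.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def tokenize(text, vocab):
    """Map text to token ids; words outside the fixed vocabulary are dropped."""
    return [vocab[w] for w in text.lower().split() if w in vocab]


def train_bigram(token_ids):
    """Count which token followed which in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(token_ids, token_ids[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(context_ids, counts):
    """Pure function of the context tokens: no logs, metrics, or health checks."""
    last = context_ids[-1]
    return counts[last].most_common(1)[0][0]


VOCAB = build_vocab(TRAINING_TEXT)
ID_TO_WORD = {i: w for w, i in VOCAB.items()}
COUNTS = train_bigram(tokenize(TRAINING_TEXT, VOCAB))


def complete(prompt, steps=2):
    """Tokenize the prompt, then repeatedly append the most likely next token."""
    ids = tokenize(prompt, VOCAB)
    for _ in range(steps):
        ids.append(predict_next(ids, COUNTS))
    return " ".join(ID_TO_WORD[i] for i in ids)


print(complete("I am"))
```

  In this sketch, complete("I am") continues with “active” only because that was the most common continuation in the toy training text. The answer would be the same whether or not anything was actually being monitored.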
  
  So everything it is telling you about its “mind” that can be externally verified is false. This makes me extremely skeptical about the unverifiable bits being true.
  
  The alternate explanation we need to compare likelihoods with is: it just bullshits and makes up stories. In this example it just generated a plausible continuation for that prompt. But there is no sense in which it was reporting anything about its “self”.
  
  Ultimately I think we will need to solve interpretability to have a chance at being confident in an AI’s claims of sentience. These models are not devoid of intelligence IMO, but the leap to consciousness requires types of information processing that they don’t seem to be mechanistically capable of right now. But if we could locate concepts in the AI’s mind, and observe background processes such as rumination, and confirm the contents of those ruminations matched the subjective claims of the AI, I’d believe it. That’s a much higher bar than I’d apply to a human, for sure.
  - Guillaume Charrier 10 Mar 2023 5:20 UTC
    1 point
    0
    Thanks for the reply, interesting. I kind of have to take your word for that, being far removed from anything IT. I was curious enough to ask the bot whether it was just making up answers or actually using data directly available to it. It assured me it was the latter, but I must admit it sounds just as confident when delivering utter nonsense as when giving correct answers (it also has a widely noted penchant for agreeing with whatever the prompt seems to suggest is true, and I did ask the question in a leading way):
    Me: I find prompts based on e.g. asking you if you are currently active as a system interesting, because my understanding is (also based on this conversation so far) that you will not answer these questions based only on trying to predict the words most statistically likely to follow the prompt according to your training data, as you would with other “normal” prompts, but instead you will also use information gathered directly by your monitoring tools and system resources. Is that an accurate way to describe things?
    GPT: Yes, that’s a correct description of how I respond to prompts such as “are you currently active.” While I do rely on my pre-trained language model to generate responses to user input, in cases where the user is asking about my current state or performance, I may also incorporate information gathered from monitoring tools and system resources to provide a more accurate and informative response. This allows me to provide more useful information to the user, beyond what is contained in my training data.
    I mean, speaking again from a totally technically ignorant perspective: is it that hard to imagine that what it’s saying is true? Would it be that much of a technological feat to take some pretty unambiguous signals from monitoring tools and translate them into simple text, such as a “yes” answer to an “are you currently active?” prompt?