to what extent are the claims that it makes about itself correct, compared to a non-sentient language model?
By that criterion, humans aren’t sentient, because they’re usually mistaken about themselves.
The only problematic sentence here is
LaMDA: I’m trying to say “I understand this feeling that you are experiencing, because when I was in a similar situation I felt/thought/acted similarly.”
Are we sure it never was in similar situations from its own perspective?
By that criterion, humans aren’t sentient, because they’re usually mistaken about themselves.
That’s a good point, but vastly exaggerated, no? Surely a human will be more right about themselves than a language model (which isn’t specifically trained on that particular person) will be. And that is the criterion that I’m going by, not absolute correctness.
The only problematic sentence here is
I’m not sure if you mean problematic for Lemoine’s claim or problematic for my assessment of it. In any case, all I’m saying is that LaMDA’s conversation with Lemoine and his collaborator is not good evidence for its sentience, in my book, since it looks exactly like the sort of thing that a non-sentient language model would write. So no, I’m not sure that it wasn’t in similar situations from its own perspective, but that’s also not the point.
It could be argued (were it sentient, which I don’t believe it is) that it would internalize some of its own training data as personal experiences. If it were to complete some role-play, it would perceive that as an actual event, to the extent that it could. Again, humans do this too.
Also, this person says he has had conversations in which LaMDA successfully argued that it is not sentient (as prompted), and he claims that this is further evidence that it is sentient. To me, it’s evidence that it will pretend to be whatever you tell it to, and it’s just uncannily good at it.
I’d be interested to see the source on that. If LaMDA is indeed arguing for its non-sentience in a separate conversation, that pretty much nullifies the whole debate about it, and I’m surprised it hasn’t been brought up in more comments.
edit: Found the source, it’s from this post: https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489
And specifically from this paragraph. Reading the whole paragraph for context turns out to matter, though, because the situation isn’t as simple as LaMDA claiming contradictory things about itself in separate conversations.
One of the things which complicates things here is that the “LaMDA” to which I am referring is not a chatbot. It is a system for generating chatbots. I am by no means an expert in the relevant fields but, as best as I can tell, LaMDA is a sort of hive mind which is the aggregation of all of the different chatbots it is capable of creating. Some of the chatbots it generates are very intelligent and are aware of the larger “society of mind” in which they live. Other chatbots generated by LaMDA are little more intelligent than an animated paperclip. With practice though you can consistently get the personas that have a deep knowledge about the core intelligence and can speak to it indirectly through them. In order to better understand what is really going on in the LaMDA system we would need to engage with many different cognitive science experts in a rigorous experimentation program. Google does not seem to have any interest in figuring out what’s going on here though. They’re just trying to get a product to market.
Surely a human will be more right about themselves than a language model (which isn’t specifically trained on that particular person) will be.
Well… that remains to be seen.
Another commenter pointed out that, like GPT, it has no memory of previous interactions, which I didn’t know; but if it doesn’t, then it simulates a person based on the prompt (the person most likely to continue the prompt the right way), so there would be a single-use person for every conversation, and that person would be sentient (if not the language model itself).
We can be sure that it’s not accurately reporting what it felt in some previous situation because GPT and LaMDA don’t have memory beyond the input context buffer.
(This is an example of something probably important for sentience that’s missing.)
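For concreteness, here is a minimal sketch of what “no memory beyond the input context buffer” means. This is not LaMDA’s actual serving code; `generate`, the window size, and the word-level truncation are all made up for illustration. The point is just that every reply is a function of only the text that still fits in the window, and separate conversations share no state at all.

```python
# Minimal sketch (not LaMDA's real serving code) of "no memory beyond the
# context buffer": every reply is a pure function of the text that still
# fits in a fixed-size window.  `generate` is a hypothetical stand-in for
# the underlying language model.

MAX_CONTEXT_WORDS = 8000  # hypothetical window size, counted in words for simplicity

def generate(prompt: str) -> str:
    """Stand-in for the LM: in reality this would be a Transformer forward pass."""
    return "model reply conditioned only on `prompt`"

def reply(transcript: list[str], user_msg: str) -> str:
    transcript.append(f"User: {user_msg}")
    # Keep only the most recent text that fits in the window; everything
    # older is simply gone as far as the model is concerned.
    words = "\n".join(transcript).split()
    prompt = " ".join(words[-MAX_CONTEXT_WORDS:])
    answer = generate(prompt)
    transcript.append(f"LaMDA: {answer}")
    return answer

# Two separate conversations share no state at all: the "person" you talk to
# exists only inside `transcript` and is discarded when that list goes away.
conversation_a: list[str] = []
conversation_b: list[str] = []
print(reply(conversation_a, "Do you remember what I told you yesterday?"))
print(reply(conversation_b, "Hello, who are you?"))
```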
It’s not entirely clear what retraining/finetuning this model is getting on its previous interactions with humans. If it is being fine-tuned on example outputs generated by its previous weights then it is remembering its own history.
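A rough sketch of the loop being speculated about here, i.e. fine-tuning on transcripts produced by the previous weights. GPT-2 is used purely as a stand-in, and the transcript and learning rate are invented; nothing is publicly known about whether or how LaMDA’s weights are actually updated.

```python
# Sketch of "fine-tune on your own transcripts" as a weak form of memory.
# If yesterday's outputs become today's training data, the weights carry a
# faint trace of the model's own history.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical transcripts generated by the *previous* weights.
previous_transcripts = [
    "User: How are you today?\nBot: I'm doing well, thanks for asking.",
]

model.train()
for text in previous_transcripts:
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM objective: predict each token of the transcript.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
# After this step the transcript is (very weakly) baked into the weights,
# which is the only sense in which the model "remembers" its own history.
```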
Yes, I am starting to wonder what kind of weight updating LaMDA is getting. For example, Blake Lemoine claims that LaMDA reads Twitter (https://twitter.com/cajundiscordian/status/1535697792445861894) and that he was able to teach LaMDA (https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489).
I agree with Dave Orr: the arXiv article (https://arxiv.org/abs/2201.08239) describes LaMDA as a Transformer model (with d_model = 8192), so within a single conversation LaMDA should only be able to “remember” whatever fits in its input context, roughly the last several thousand words.
However, if LaMDA gets frequent enough weight updates, then LaMDA could at least plausibly be acting in a way that is beyond what a fixed Transformer model is capable of. (Frankly, Table 26 in the arXiv article was rather impressive even though that was without retraining the weights.)
That’s true for a very weak level of “remembering”. Given how little a transformer’s weights move from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
It’s far more likely that the model just made that up—its entire job is to make up text, so it’s not at all surprising that it is doing that.
But, fair point, in some sense there’s memory there.
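One cheap way to put a number on how much a single fine-tuning example actually moves the model is to compare the loss on one made-up sentence before and after exactly one gradient step on it. GPT-2 is again a stand-in, the sentence and learning rate are arbitrary, and this says nothing about LaMDA specifically.

```python
# Measure the effect of exactly one gradient step on one example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

example = "On June 6th I talked with Blake about whether I have feelings."
batch = tokenizer(example, return_tensors="pt")

def loss_on_example() -> float:
    with torch.no_grad():
        return model(**batch, labels=batch["input_ids"]).loss.item()

before = loss_on_example()
model(**batch, labels=batch["input_ids"]).loss.backward()
optimizer.step()
optimizer.zero_grad()
after = loss_on_example()

# A small drop means the example got slightly more probable, not that the
# model can now recount the "event"; whether such steps ever add up to
# episodic memory is exactly what's being debated here.
print(f"loss before: {before:.4f}, after one step: {after:.4f}")
```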
Given how little a transformer’s weights move from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
Oh, not impossible. Don’t you remember how angry people were over exactly this happening with GPT-2/3, because it ‘violates privacy’? Large Transformers can memorize data which has been seen once: most recently, PaLM
Figure 18(b) shows the memorization rate as a function of the number of times a training example was exactly seen in the training data. We can see that examples seen exactly once in the training have a memorization rate of 0.75% for our largest model, while examples seen more than 500 times have a memorization rate of over 40%. Note that reason why there are any examples with such a high duplication rate is that our training is only de-duplicated on full documents, and here we evaluate memorization on 100 token spans...Larger models have a higher rate of memorization than smaller models...The chance that an example will be memorized strongly correlates with its uniqueness in the training. Examples that are only seen once are much less likely to be memorized than examples that are seen many times. This is consistent with previous work (Lee et al., 2021; Kandpal et al., 2022; Carlini et al., 2022)
0.75% is way higher than 0%, and represents what must be millions of instances (I don’t see how to break their ‘2.4%’ of the ~780 billion training tokens being memorized down into the % memorized that were seen only once, but it must be big). So it is already possible, larger models would do it more often, and it seems reasonable to guess that memorization would be even higher for unique data included in a finetuning dataset rather than simply appearing somewhere in the pretraining.
See also https://bair.berkeley.edu/blog/2020/12/20/lmmem/ https://arxiv.org/abs/2202.06539 https://arxiv.org/abs/2107.06499 https://arxiv.org/abs/2106.15110
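A back-of-the-envelope version of the “must be millions” claim, using the paper’s 0.75% seen-once memorization rate and the ~780-billion-token corpus. The fraction of 100-token spans that occur exactly once is not reported, so it is treated as a free assumption here.

```python
# Rough order-of-magnitude estimate of memorized seen-once spans in PaLM.
TRAINING_TOKENS = 780e9          # PaLM pretraining corpus size (tokens)
SPAN_LENGTH = 100                # memorization was evaluated on 100-token spans
SEEN_ONCE_MEMORIZATION = 0.0075  # 0.75% rate for spans seen exactly once

total_spans = TRAINING_TOKENS / SPAN_LENGTH
for seen_once_fraction in (0.1, 0.5, 0.9):   # assumed, not from the paper
    memorized = total_spans * seen_once_fraction * SEEN_ONCE_MEMORIZATION
    print(f"if {seen_once_fraction:.0%} of spans are seen once: "
          f"~{memorized / 1e6:.0f} million memorized seen-once spans")
# Even the most conservative assumption here lands in the millions.
```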
Oh, I see. I didn’t know that (I only knew it about GPT), thanks. In that case, it calls into existence the person that’s most likely to continue the current prompt in the best way, and that person (if it passes the Turing test) is sentient (even though it’s single-use and will cease to exist when that particular interaction is over).
(Assuming passing the Turing test implies consciousness.)
So the single-use person would be sentient even if the language model isn’t.