We can be sure that it’s not accurately reporting what it felt in some previous situation because GPT and LaMDA don’t have memory beyond the input context buffer.
(This is an example of something probably important for sentience that’s missing.)
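A minimal sketch of what that statelessness means in practice (a stub stand-in, not any real API): a plain language-model call is a pure function of its prompt, so any "memory" across turns exists only because the client re-sends the conversation history inside the context window, and anything that scrolls out is simply invisible to the model.

```python
# Illustrative sketch only: the "model" here is a stub, and the word-based
# truncation is a simplification of token-based context limits.

def stub_model(prompt: str) -> str:
    """Stand-in for an LM: just reports how much context it was given."""
    return f"[reply given {len(prompt.split())} words of context]"

def chat_turn(history: list[str], user_msg: str, max_words: int = 8000) -> str:
    history.append(f"User: {user_msg}")
    # Keep only the most recent max_words words -- older turns are not
    # "forgotten" in any rich sense; the model never sees them at all.
    words = " ".join(history).split()
    context = " ".join(words[-max_words:])
    reply = stub_model(context)
    history.append(f"Model: {reply}")
    return reply

history: list[str] = []
chat_turn(history, "Hello!")
chat_turn(history, "What did I say first?")  # answerable only while in context
```

The point of the sketch: nothing persists between calls except what the client chooses to resend, so a report about "a previous situation" outside the context buffer cannot be an actual recollection.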
It’s not entirely clear what retraining/finetuning this model is getting on its previous interactions with humans. If it is being fine-tuned on example outputs generated by its previous weights, then it is remembering its own history.
I agree with Dave Orr: the arXiv article (https://arxiv.org/abs/2201.08239) says LaMDA is a Transformer model with d_model = 8192, so LaMDA should only be able to “remember” the last 8,000 or so words of the current conversation.
However, if LaMDA gets frequent enough weight updates, then LaMDA could at least plausibly be acting in a way that is beyond what a transformer model is capable of. (Frankly, Table 26 in the arXiv article was rather impressive, even though that was without retraining the weights.)
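The distinction being drawn here can be sketched in toy form: weight updates could act as a crude long-term memory even when the context window is bounded. In this deliberately simplistic stand-in (no real training involved), the "weights" are just a set of memorized transcripts.

```python
# Toy illustration only: a real fine-tune step would nudge parameters toward
# reproducing the transcript; here we record it wholesale to make the
# context-memory vs. weight-memory distinction concrete.

class ToyModel:
    def __init__(self, context_limit: int = 8000):
        self.context_limit = context_limit   # bounded short-term memory
        self.weights: set[str] = set()       # stand-in for parameters

    def fine_tune(self, transcript: str) -> None:
        self.weights.add(transcript)

    def recalls(self, transcript: str) -> bool:
        return transcript in self.weights

model = ToyModel()
old_conversation = "A conversation that has long since left the context window."
model.fine_tune(old_conversation)

# Long after the conversation has scrolled out of the context buffer,
# the information can still surface from the weights:
assert model.recalls(old_conversation)
```

This is the sense in which frequent retraining on its own interactions would let a model act "beyond" the vanilla fixed-weights transformer picture.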
That’s true for a very weak level of “remembering”. Given how much a transformer updates from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
It’s far more likely that the model just made that up—its entire job is to make up text, so it’s not at all surprising that it is doing that.
But, fair point: in some sense there’s memory there.
Given how much a transformer updates from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
Oh, not impossible. Don’t you remember how angry people were over exactly this happening with GPT-2/3, because it ‘violates privacy’? Large Transformers can memorize data which has been seen only once; most recently, PaLM:
Figure 18(b) shows the memorization rate as a function of the number of times a training example was exactly seen in the training data. We can see that examples seen exactly once in the training have a memorization rate of 0.75% for our largest model, while examples seen more than 500 times have a memorization rate of over 40%. Note that the reason why there are any examples with such a high duplication rate is that our training is only de-duplicated on full documents, and here we evaluate memorization on 100 token spans... Larger models have a higher rate of memorization than smaller models... The chance that an example will be memorized strongly correlates with its uniqueness in the training. Examples that are only seen once are much less likely to be memorized than examples that are seen many times. This is consistent with previous work (Lee et al., 2021; Kandpal et al., 2022; Carlini et al., 2022)
0.75% is way higher than 0%, and represents what must be millions of instances (I don’t see how to break their ‘2.4%’ of 540 billion tokens being memorized down into the fraction memorized after being seen once, but it must be big). So it is possible already, larger models would do it more often, and it seems reasonable to guess that memorization would be even higher for unique data included in a finetuning dataset rather than simply appearing somewhere in the pretraining.
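A quick back-of-the-envelope check of the scale being claimed, taking the comment’s figures at face value (2.4% of ~540 billion training tokens memorized, evaluated on 100-token spans; these inputs are the comment’s numbers, not independently verified):

```python
# Illustrative arithmetic only, using the figures quoted in the thread.
training_tokens = 540e9      # total training tokens (as stated above)
memorized_fraction = 0.024   # the quoted 2.4%
span_length = 100            # tokens per evaluated span

memorized_tokens = training_tokens * memorized_fraction
memorized_spans = memorized_tokens / span_length

print(f"{memorized_tokens:.2e} tokens ~= {memorized_spans:.2e} spans")
# On these assumptions "millions of instances" is conservative: the figure
# comes out to roughly 1.3e8, i.e. over a hundred million 100-token spans.
```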
Oh, I see. I didn’t know that (only in the case of GPT), thanks. In that case, it calls into existence the person most likely to continue the current prompt in the best way, and that person (if it passes the Turing test) is sentient (even though it’s single-use and will cease to exist when that particular interaction is over).
(Assuming Turing test implies consciousness.)
So the single-use person would be sentient even if the language model isn’t.
Yes, I am starting to wonder what kind of weight updating LaMDA is getting. For example, Blake Lemoine claims that LaMDA reads Twitter (https://twitter.com/cajundiscordian/status/1535697792445861894) and that he was able to teach LaMDA (https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489).
See also: https://bair.berkeley.edu/blog/2020/12/20/lmmem/, https://arxiv.org/abs/2202.06539, https://arxiv.org/abs/2107.06499, https://arxiv.org/abs/2106.15110