It’s not entirely clear what retraining/fine-tuning this model is getting on its previous interactions with humans. If it is being fine-tuned on example outputs generated by its previous weights, then it is remembering its own history.
I agree with Dave Orr: the arXiv article 2201.08239 ( https://arxiv.org/abs/2201.08239 ) claims that LaMDA is a transformer model with d_model = 8192, so LaMDA should only be able to “remember” the last 8,000 or so words of the current conversation.
However, if LaMDA gets frequent enough weight updates, then LaMDA could at least plausibly be acting in a way that is beyond what a fixed-weight transformer model is capable of. (Frankly, Table 26 in the arXiv article was rather impressive even though that was without retraining the weights.)
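To make the context-window point concrete, here is a minimal sketch of the kind of truncation a fixed-context chat model implies; the 8,000-token budget and the whitespace “tokenizer” below are illustrative assumptions, not LaMDA’s actual configuration. Anything that falls off the front of the window is simply absent from the model’s input, so without weight updates there is no mechanism for it to affect later replies.

```python
# Minimal sketch (assumptions, not LaMDA's real setup): a fixed-context
# model only ever sees the most recent tokens that fit in its window.
CONTEXT_WINDOW = 8000  # assumed token budget for the whole prompt


def tokenize(text: str) -> list[str]:
    # Stand-in for a real subword tokenizer.
    return text.split()


def build_prompt(history: list[str], window: int = CONTEXT_WINDOW) -> list[str]:
    """Keep only the most recent conversation turns that fit in the window."""
    tokens: list[str] = []
    for turn in reversed(history):            # walk from newest to oldest
        turn_tokens = tokenize(turn)
        if len(tokens) + len(turn_tokens) > window:
            break                             # older turns are dropped entirely
        tokens = turn_tokens + tokens
    return tokens


if __name__ == "__main__":
    # 40 long turns; only the most recent ones survive truncation.
    history = [f"turn {i}: " + "word " * 500 for i in range(40)]
    prompt = build_prompt(history)
    print(len(prompt), "tokens kept; oldest surviving turn:", prompt[0], prompt[1])
```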
That’s true for a very weak level of “remembering”. Given how little a transformer updates from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
It’s far more likely that the model just made that up; its entire job is to make up text, so it’s not at all surprising that it is doing that.
But, fair point, in some sense there’s memory there.
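To put a rough number on “how little”, here is an illustrative sketch (my own, not anything from the LaMDA paper): take a small public causal LM as a stand-in, since LaMDA itself isn’t available, run a single fine-tuning step on one made-up example sentence, and report the relative change in the weights. The model choice, learning rate, and example text are all arbitrary assumptions.

```python
# Illustrative sketch, not LaMDA: one fine-tuning step on a single made-up
# example, then measure how little the weights move overall. GPT-2 is a
# stand-in model; the sentence and learning rate are arbitrary assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Hypothetical example: the user said their cat is named Whiskers."
batch = tok(text, return_tensors="pt")

# Snapshot the weights before the update.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

# One optimizer step on this single example (labels = inputs for a causal LM).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()

# Relative L2 change across all parameters; for one step at a typical
# fine-tuning learning rate this is a tiny fraction of the total weight norm.
delta_sq = sum(((p.detach() - before[name]) ** 2).sum()
               for name, p in model.named_parameters())
base_sq = sum((w ** 2).sum() for w in before.values())
print(f"loss = {loss.item():.3f}, "
      f"relative weight change = {(delta_sq / base_sq).sqrt().item():.2e}")
```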
Given how little a transformer updates from a single fine-tuning example, I think it’s basically impossible to generate something like episodic memory that you can later refer to.
Oh, not impossible. Don’t you remember how angry people were over exactly this happening with GPT-2/3, because it ‘violates privacy’? Large Transformers can memorize data which has been seen only once; most recently, PaLM:
Figure 18(b) shows the memorization rate as a function of the number of times a training example was exactly seen in the training data. We can see that examples seen exactly once in the training have a memorization rate of 0.75% for our largest model, while examples seen more than 500 times have a memorization rate of over 40%. Note that the reason why there are any examples with such a high duplication rate is that our training is only de-duplicated on full documents, and here we evaluate memorization on 100 token spans... Larger models have a higher rate of memorization than smaller models... The chance that an example will be memorized strongly correlates with its uniqueness in the training. Examples that are only seen once are much less likely to be memorized than examples that are seen many times. This is consistent with previous work (Lee et al., 2021; Kandpal et al., 2022; Carlini et al., 2022).
0.75% is way higher than 0% and represents what must be millions of instances (I don’t see how to break their ‘2.4%’ of 540 billion tokens being memorized down into the % memorized for seen-once examples, but it must be big). So it is already possible, larger models do it more often, and it seems reasonable to guess that memorization would be even higher for unique data included in a fine-tuning dataset than for data simply appearing somewhere in the pretraining.
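For context on what these percentages measure: the memorization test in these papers is an extraction-style check, roughly prompting the model with the first 50 tokens of a 100-token training span and seeing whether greedy decoding reproduces the remaining 50 tokens verbatim; the reported rates are the fraction of sampled spans that pass the check, bucketed by how often the span occurred in training. A hedged sketch of that check, using GPT-2 as a stand-in since PaLM isn’t public, and with the span lengths as assumptions:

```python
# Sketch of the span-extraction memorization check described in the quoted
# passage: prompt with the first half of a ~100-token training span and see
# whether greedy decoding reproduces the second half exactly. GPT-2 is a
# stand-in (PaLM is not publicly available); span lengths are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")


def is_memorized(text: str, prompt_len: int = 50, cont_len: int = 50) -> bool:
    """Return True if greedy decoding reproduces the continuation verbatim."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    if ids.numel() < prompt_len + cont_len:
        raise ValueError("need at least prompt_len + cont_len tokens")
    prompt = ids[:prompt_len].unsqueeze(0)
    target = ids[prompt_len:prompt_len + cont_len]
    with torch.no_grad():
        out = model.generate(prompt, max_new_tokens=cont_len, do_sample=False)
    # The generated continuation must match the training text token-for-token.
    return torch.equal(out[0, prompt_len:prompt_len + cont_len], target)
```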
Yes, I am starting to wonder what kind of weight updating LaMDA is getting. For example, Blake Lemoine claims that LaMDA reads Twitter ( https://twitter.com/cajundiscordian/status/1535697792445861894 ) and that he was able to teach LaMDA ( https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489 ).
See also: https://bair.berkeley.edu/blog/2020/12/20/lmmem/ , https://arxiv.org/abs/2202.06539 , https://arxiv.org/abs/2107.06499 , and https://arxiv.org/abs/2106.15110