Any idea what’s happening with the 34B model? Why might it be so much less “safe” than the bigger and smaller versions? And what about the base version of the 34B—are they not releasing that? But the base version isn’t supposed to be “safe” anyway...
Relevant rumors / comments:
Seems like we can continue to scale tokens and get returns in model performance well after 2T tokens : r/LocalLLaMA (reddit.com)
LLaMA 2 is here : r/LocalLLaMA (reddit.com)