bayesian_kitten comments on Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kitten 23 Dec 2021 12:35 UTC
1 point
This post (which is really dope) provides some grokking examples in large language models in a Big-Bench video at 19313s & 19458s, with that segment (18430s-19650s) being a nice watch! I shall spend a bit more time collecting and precisely identifying evidence and then include it in the grokking part of this post. This was a really nice thing to know about and very surprising.
- gwern 23 Dec 2021 17:32 UTC
  3 points
  Parent
  I’ve commented on that, but I’m not convinced that the phase transitions in learning are grokking, per se. There are many different scaling phenomena, and we shouldn’t go around prematurely conflating them.