Hi, I just wanted to say thanks for the comment / feedback. Yeah, I probably should have separated the analysis of Grokking from the analysis of emergent behaviour during scaling. The two are potentially related: at least for many tasks, it seems Grokking becomes more likely as the model gets bigger. Admittedly, I'm guilty of conflating the two phenomena in some of my own thinking.
Your point about “fragile metrics” being more likely to show Grokking is great. I had a similar thought, too.