I love this work! It’s really cool to see interpretability on toy models in such a clear way.
The trend from memorization to generalization reminds me of the information bottleneck idea. I don’t know that much about it (I read this Quanta article a while ago), but its proponents appear to be making a similar claim about phase transitions. I believe this is the paper one would want to read to get a deeper understanding of it.