Do you expect to be able to discover conditions that later lead to phase transitions before the transitions happen? E.g., in grokking, do you expect to be able to look at a neural network and see the algorithm that it will “grok” during a future phase transition?
(And if you do, do you think it won’t help advance capabilities?)
[Since the mech interp results on grokking, I’ve had a speculative intuition: in the high-dimensional weight space of an LLM, the gradient partially points towards algorithms that begin to be implemented in superposition but contribute little, until they are close enough to the correct algorithm being slowly implemented that further tuning of the weights produces a meaningful boost in performance. This leads to a rapid change in which the algorithm gets fully implemented and, at the same time, its outputs start playing a role. If you can only see the transition once the algorithm is already implemented accurately enough for further adjustments to improve performance, you miss most of the subtle process by which the algorithm was implemented.]
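A crude way to make this intuition measurable (a hypothetical sketch, not a setup from the grokking papers): during a standard modular-addition training run with heavy weight decay, record the gradient and weights along the way, and afterwards check how much each step’s descent direction already pointed toward the final, post-grokking weights. The architecture and hyperparameters below are illustrative guesses.

```python
# Sketch: does the gradient early in training already point toward the final weights?
# Small MLP on (a + b) mod P, a standard grokking-style setup with a small training
# fraction and strong weight decay. Hyperparameters are illustrative, not tuned.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
P = 97                       # modulus
FRAC_TRAIN = 0.3             # small training fraction encourages grokking

# Full (a, b) -> (a + b) mod P dataset as concatenated one-hot inputs.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
x = torch.cat([F.one_hot(pairs[:, 0], P), F.one_hot(pairs[:, 1], P)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(x))
n_train = int(FRAC_TRAIN * len(x))
x_tr, y_tr = x[perm[:n_train]], y[perm[:n_train]]

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def flat(tensors):
    # Flatten a collection of tensors into one detached 1-D vector.
    return torch.cat([t.detach().reshape(-1) for t in tensors])

weights, grads = [], []
for step in range(20000):
    opt.zero_grad()
    loss = F.cross_entropy(model(x_tr), y_tr)
    loss.backward()
    if step % 100 == 0:
        weights.append(flat(model.parameters()))
        grads.append(flat(p.grad for p in model.parameters()))
    opt.step()

w_final = flat(model.parameters())
for i, (w, g) in enumerate(zip(weights, grads)):
    # Alignment between the descent direction (-g) and the direction
    # from the current weights toward the final (post-grokking) weights.
    align = F.cosine_similarity(-g, w_final - w, dim=0).item()
    print(f"step {i * 100:6d}  cos(-grad, w_final - w) = {align:+.3f}")
```

If the intuition is right, this alignment should be positive well before train/test accuracy shows any sign of the transition; if it hovers near zero until shortly before the jump, that would count against it.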
That intuition sounds reasonable to me, but I don’t have strong opinions about it.
One thing to note is that training and test performance are lagging indicators of phase transitions. In our limited experience so far, measures such as the RLCT (real log canonical threshold) do seem to indicate that a transition is underway earlier (e.g. in Toy Models of Superposition), but in the scenario you describe I don’t know whether it’s early enough to detect structure formation “when it starts”.
For what it’s worth, my guess is that the information you need to understand the structure is present at the transition itself, and you don’t need to “rewind” SGD to examine the structure forming one step at a time.
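For concreteness, here is a minimal sketch of the kind of measurement referenced above: estimating the local learning coefficient (an empirical stand-in for the RLCT) at a trained point w* by sampling from a tempered, localized posterior with SGLD and comparing the average sampled loss to the loss at w*. This follows the general recipe in the singular learning theory literature, but the specific step size, localization strength, and chain length below are placeholder values, not tuned settings.

```python
# Sketch of an SGLD-based estimator of the local learning coefficient at w*:
#   lambda_hat = n * beta * ( E[L_n(w)] - L_n(w*) ),  beta = 1 / log(n),
# where the expectation is over a posterior tempered by beta and localized
# around w* with a quadratic penalty of strength gamma. Full-batch for simplicity.
import copy
import math
import torch
import torch.nn.functional as F

def estimate_llc(model, x, y, eps=1e-5, gamma=100.0, n_steps=2000, burn_in=500):
    n = len(x)
    beta = 1.0 / math.log(n)
    w_star = [p.detach().clone() for p in model.parameters()]
    loss_star = F.cross_entropy(model(x), y).item()

    sampler = copy.deepcopy(model)   # sample around w* without disturbing the model
    losses = []
    for step in range(n_steps):
        loss = F.cross_entropy(sampler(x), y)
        grad = torch.autograd.grad(loss, list(sampler.parameters()))
        with torch.no_grad():
            for p, g, p0 in zip(sampler.parameters(), grad, w_star):
                # SGLD step on the tempered, localized potential
                drift = 0.5 * eps * (n * beta * g + gamma * (p - p0))
                noise = torch.randn_like(p) * math.sqrt(eps)
                p.add_(-drift + noise)
        if step >= burn_in:
            losses.append(loss.item())

    expected_loss = sum(losses) / len(losses)
    return n * beta * (expected_loss - loss_star)

# Usage (assuming a trained model and its training data):
#   llc = estimate_llc(trained_model, x_train, y_train)
```

Tracking an estimate like this across training checkpoints is what lets a degeneracy measure flag a transition earlier than the loss curve does, though in practice the sampler hyperparameters need care for the estimate to be stable.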