That intuition sounds reasonable to me, but I don’t have strong opinions about it.
One thing to note is that training and test performance are lagging indicators of phase transitions. In our limited experience so far, measures such as the RLCT do seem to signal earlier than the losses that a transition is underway (e.g. in Toy Models of Superposition), but in the scenario you describe I don’t know if that is early enough to detect structure formation “when it starts”.
For what it’s worth, my guess is that the information you need to understand the structure is present at the transition itself, and you don’t need to “rewind” SGD to examine the structure forming one step at a time.