The strongest critique of developmental interpretability we know is the following: while it is established that phase transitions exist in neural network training, it is not yet clear how common they are, and whether they make a good target for alignment.
Is it established that phase transitions exist in the training of non-toy neural networks?
There are clearly s-shaped loss curves in many non-toy cases, but I’m not aware of any known cases which are clearly phase transitions as defined here (which is how the term is commonly used in e.g. physics and how I think this post wants to use the term).
For instance, while formation of induction-like attention heads[1] probably results in s-shaped loss curves in at least some cases, my understanding is that this probably has nothing to do with changes in the minima of some notion of energy (as would be required for the definition linked above I think). I think the effect is probably the one described in Multi-Component Learning and S-Curves. Unless there is some notion of energy such that this multi-component case of s-shaped loss curves is well described as a phase transition and that’s what’s discussed in this post?
Some important disclaimers:
I’m not very familiar with the notion of phase transition I think this post is using.
It seems as though there are cases where real phase transitions occur during the training of toy models, and there are also phase transitions in the final configuration of toy models as various hyperparameters change. (I’m pretty confident that Toy models of superposition has phase transitions both during training and in final configurations; for final configurations, there seems to be one with respect to “does superposition occur”. There is also this example of a two-layer ReLU model; however, I haven’t read or understood this example, and I’m just trusting the authors’ claim that there is a phase transition here.)
[1] These attention heads probably do a bunch of stuff which isn’t that well described as induction, so I’m reluctant to call them “induction heads”.
Great question, thanks. tl;dr: it depends on what you mean by established, but the obstacle to establishing such a thing is probably lower than you think.
To clarify the two types of phase transitions involved here, in the terminology of Chen et al:
Bayesian phase transition in number of samples: as discussed in the post you link to in Liam’s sequence, where the concentration of the Bayesian posterior shifts suddenly from one region of parameter space to another as the number of samples n increases past some critical sample size. There are also Bayesian phase transitions with respect to hyperparameters (such as variations in the true distribution) but those are not what we’re talking about here.
Dynamical phase transitions: the “backwards S-shaped loss curve”. I don’t believe there is an agreed-upon formal definition of what people mean by this kind of phase transition in the deep learning literature, but what we mean by it is that the SGD trajectory is for some time strongly influenced by (e.g. in the neighbourhood of) a critical point $w^*_\alpha$ and then strongly influenced by another critical point $w^*_\beta$. In the clearest case there are two plateaus, the one with higher loss corresponding to the label α and the one with lower loss corresponding to β. In larger systems there may not be a clear plateau (e.g. in the case of induction heads that you mention) but it may still be reasonable to think of the trajectory as dominated by the critical points.
The former kind of phase transition is a first-order phase transition in the sense of statistical physics, once you relate the posterior to a Boltzmann distribution. The latter is a notion that belongs more to the theory of dynamical systems or potentially catastrophe theory. The link between these two notions is, as you say, not obvious.
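For readers who want the Boltzmann analogy spelled out, here is the standard way of writing it. The symbols are notation I am assuming for illustration rather than anything fixed by this thread: $\varphi$ is the prior, $L_n$ the empirical loss (average negative log-likelihood) on $n$ samples, and $W_\alpha$ a neighbourhood of the critical point labelled α.

```latex
% Posterior written as a Boltzmann distribution: n L_n plays the role of energy.
p(w \mid D_n) = \frac{1}{Z_n}\,\varphi(w)\,e^{-n L_n(w)},
\qquad
Z_n = \int \varphi(w)\,e^{-n L_n(w)}\,dw,
\qquad
F_n = -\log Z_n .

% Local free energy of a region (phase) W_\alpha of parameter space.
F_n(W_\alpha) = -\log \int_{W_\alpha} \varphi(w)\,e^{-n L_n(w)}\,dw .
```

The posterior mass of a region is proportional to $e^{-F_n(W_\alpha)}$, so the posterior concentrates on whichever phase has the lowest free energy at a given n. A Bayesian phase transition in n is then a change in which phase that is.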
However, Singular Learning Theory (SLT) does provide a link, which we explore in Chen et al. SLT says that the phases of Bayesian learning are also dominated by critical points of the loss, and so you can ask whether a given dynamical phase transition α→β has “standing behind it” a Bayesian phase transition where at some critical sample size the posterior shifts from being concentrated near $w^*_\alpha$ to being concentrated near $w^*_\beta$.
It turns out that, at least for sufficiently large n, the only real condition for this Bayesian phase transition to exist is that the local learning coefficient near $w^*_\beta$ be higher than near $w^*_\alpha$. This will be hard to prove theoretically in non-toy systems, but we can estimate the local learning coefficient at each critical point, compare the two estimates, and thereby provide evidence that a Bayesian phase transition exists.
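To make the role of the local learning coefficient concrete, here is a minimal sketch. It assumes only the leading-order SLT asymptotic for the local free energy, F_n ≈ n·L + λ·log n, with lower-order terms and constants dropped. The function names and the numeric losses and coefficients are hypothetical, chosen purely for illustration, and are not estimates from any real model.

```python
import math


def free_energy(n, loss, llc):
    """Leading-order local free energy: n * loss + llc * log(n).

    Assumption: lower-order terms and constants are dropped.
    """
    return n * loss + llc * math.log(n)


def critical_sample_size(loss_a, llc_a, loss_b, llc_b):
    """Sample size beyond which phase b has lower free energy than phase a.

    Requires phase b to have lower loss and a higher learning coefficient.
    Returns None when the leading-order formula never prefers phase a.
    """
    d_loss = loss_a - loss_b  # loss advantage of phase b
    d_llc = llc_b - llc_a     # complexity cost of phase b
    if d_loss <= 0 or d_llc <= 0:
        raise ValueError("need loss_b < loss_a and llc_b > llc_a")

    def gap(n):
        # Negative while phase a is preferred, positive once phase b is.
        return free_energy(n, loss_a, llc_a) - free_energy(n, loss_b, llc_b)

    ratio = d_llc / d_loss  # the gap is minimised at n = ratio
    if ratio <= math.e:
        return None  # the gap never goes negative, so there is no crossover
    lo, hi = ratio, 2.0 * ratio
    while gap(hi) < 0:
        hi *= 2.0
    for _ in range(200):  # bisection on the upper root of the gap
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical values: phase b fits better but is more complex.
    print(critical_sample_size(loss_a=0.5, llc_a=1.0, loss_b=0.4, llc_b=3.0))
```

Below the returned sample size the simpler, higher-loss phase has lower free energy. Above it the lower-loss, higher-coefficient phase wins. If the coefficient near β were not higher, there would be no sample size at which the posterior prefers α, and so no transition of this kind.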
This has been done in the Toy Model of Superposition in Chen et al, and we’re in the process of looking at a range of larger systems including induction heads. We’re not ready to share those results yet, but I would point you to Nina Rimsky and Dmitry Vaintrob’s nice post on modular addition which I would say provides evidence for a Bayesian phase transition in that setting.
There are some caveats and details that I can go into if you’re interested. I would say the existence of Bayesian phase transitions in non-toy neural networks is not established yet, but at this point I think we can be reasonably confident they exist.
Thanks for the detailed response!
So, to check my understanding:
The toy cases discussed in Multi-Component Learning and S-Curves are clearly dynamical phase transitions. (In general, it’s easy to establish dynamical phase transitions based on observation alone. In these cases we can also verify that the property holds for the corresponding differential equations, and since step size is unimportant, differential equations are a good model.) Also, I speculate it’s easy to prove the existence of a Bayesian phase transition in the number of samples for these toy cases given how simple they are.
Yes, I think that’s right. I haven’t closely read the post you link to (but it’s interesting and I’m glad to have it brought to my attention, thanks) but it seems related to the kind of dynamical transitions we talk briefly about in the Related Works section of Chen et al.
More generally, I wish that when people used the term “phase transition”, they clarified whether they meant “s-shaped loss curves” or some more precise notion. Often, people are making a non-mechanistic claim when they say “phase transition” (we observed a loss curve with an s-shape), but there are also mechanistic claims which require additional evidence.
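As a rough illustration of the kind of toy dynamical transition discussed above, here is a minimal sketch of gradient descent on a two-component product model. The model, loss, initialisation and learning rate are my own assumptions for illustration and are not necessarily the exact setup in Multi-Component Learning and S-Curves.

```python
def loss(a, b):
    """Squared error of a two-component product model fitting the target 1."""
    return 0.5 * (1.0 - a * b) ** 2


def train(steps=1000, lr=0.01, init=0.01):
    """Plain gradient descent from a small symmetric initialisation.

    Returns the loss at every step, including step 0.
    """
    a = b = init
    losses = [loss(a, b)]
    for _ in range(steps):
        residual = 1.0 - a * b
        grad_a = -residual * b  # d(loss)/da
        grad_b = -residual * a  # d(loss)/db
        a, b = a - lr * grad_a, b - lr * grad_b
        losses.append(loss(a, b))
    return losses


if __name__ == "__main__":
    losses = train()
    for step in (0, 100, 300, 500, 700, 1000):
        print(step, round(losses[step], 4))
```

Each component's gradient is proportional to the other component, so both stay small for a while (a plateau near the initial loss) and then grow together, giving an s-shaped loss curve. Nothing in this sketch refers to a change in the minima of an energy, which is the distinction the comment above is drawing. Halving `lr` while doubling `steps` should trace out nearly the same curve, in line with the remark that step size is unimportant here.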
In particular, when citing other work somewhere, it would be nice to clarify what notion of phase transition the other work is discussing.