I think it is too early to know how many phase transitions there are in e.g. the training of a large language model. If there are many, it seems likely to me that they fall along a spectrum of “scale” and that it will be easier to find the more significant ones than the less significant ones (e.g. we discover transitions like the onset of in-context learning first, because they dramatically change how the whole network computes).
As evidence for that view, I would put forward the fact that putting features into superposition is known to be a phase transition in toy models (based on the original work by Elhage et al and our own work in Chen et al) and therefore seems likely to be a phase transition in larger models as well. That gives an example of a phase transition at the “small” end of the scale. At the “big” end of the scale, the evidence in Olsson et al that induction heads and in-context learning appear in a phase transition seems convincing to me.
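To make the “toy model” reference concrete, here is a minimal sketch of the kind of setup in which the superposition phase transition shows up, assuming the standard ReLU-output toy model from Elhage et al: n sparse features are compressed into m < n hidden dimensions and reconstructed with an importance-weighted squared-error loss. The specific hyperparameters below (feature count, sparsity, importance decay) are illustrative choices, not taken from the papers; the point is only that sweeping sparsity moves the learned weights from dedicated dimensions into superposition.

```python
import torch

n_features, n_hidden = 5, 2
sparsity = 0.9                                   # probability a feature is zero
importance = 0.7 ** torch.arange(n_features)     # geometrically decaying importances

# Tied-weight toy model: h = W x, x_hat = ReLU(W^T h + b)
W = torch.nn.Parameter(torch.randn(n_hidden, n_features) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(5000):
    # Sample a batch of sparse feature vectors in [0, 1].
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)

    h = x @ W.T                     # compress: n_features -> n_hidden
    x_hat = torch.relu(h @ W + b)   # reconstruct with the tied weights
    loss = (importance * (x - x_hat) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

# The columns of W show whether each feature got its own direction or was
# forced to share (interfere with) directions, i.e. put into superposition.
print(W.detach())
```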
On general principles, understanding “small” phase transitions (where the scale is judged relative to the overall size of the system, e.g. number of parameters) is like probing a physical system at small length scales / high energy, and will require more sophisticated tools. So I expect that we’ll start by gaining a good understanding of “big” phase transitions and then, as the experimental methodology and theory improve, move down the spectrum towards smaller transitions.
On these grounds I don’t expect us to be swamped by the smaller transitions, because they’re just hard to see in the first place; the major open problem in my mind is how far we can get down the scale with reasonable amounts of compute. Maybe one way that SLT & developmental interpretability fails to be useful for alignment is if there is a large “gap” in the spectrum, where beyond the “big” phase transitions that are easy to see (and for which you may not need fancy new ideas) there is just a desert / lack of transitions, and all the transitions that matter for alignment are “small” enough that a lot of compute and/or very sophisticated ideas are necessary to study them. We’ll see!
Thank you, that was helpful. If I’m getting this right, you think the “big” transitions plausibly correspond to important capability gains. So under that theory, “chain of thought” and “reflection” arose from big phase transitions during the training of GPT-3 and GPT-4. I think it’d be great if researchers could, if not access training checkpoints of these models, then at least make bids for experiments to be performed on said models.
That’s what we’re thinking, yeah.