The picture of phase changes from your post, as well as the theoretical analysis here, both suggest that you may be able to observe capabilities as they form if you know what to look for. It seems broadly similar to the situation suggested in the OP, though I think with a different underlying mechanism (in the OP I think there probably is no similar phase change in the model itself for any of these tasks) and a different set of things to measure.
In general if we get taken by surprise by a capability, it seems fairly likely that the story in retrospect would be that we just didn’t know what to measure. So for people worried about rapid emergence it seems natural to try to get really good at predicting these kinds of abrupt changes, whether they are coming from phase changes in the model, non-convexities from RL exploration (which exhibit hyperbolic phase changes), or performance measurements that elide progress.
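To illustrate the "measurements that elide progress" case, here is a minimal sketch (my own toy example, not from the OP): if per-token accuracy improves smoothly with scale but we only report exact-match over an n-token answer, the reported metric sits near zero and then shoots up, even though the underlying trend was smooth the whole time.

```python
def exact_match_accuracy(p: float, n: int) -> float:
    """Probability of getting all n tokens right, assuming independent
    per-token accuracy p. A smooth trend in p becomes a sharp-looking
    transition in p ** n."""
    return p ** n

# Hypothetical answer length; chosen only to make the effect visible.
n = 30
for p in [0.5, 0.7, 0.8, 0.9, 0.95, 0.99]:
    print(f"per-token {p:.2f} -> exact match {exact_match_accuracy(p, n):.4f}")
```

Tracking per-token log-likelihood instead of exact match would have shown steady progress long before the apparent jump, which is the kind of "knowing what to measure" the paragraph above is gesturing at.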
(My guess would be that deception is significantly smoother than any of the trends discussed here, which are themselves significantly smoother than bona fide phase changes, just because larger and larger combinations tend to get smoother and smoother. But it still seems possible to get taken by surprise especially if you aren’t being very careful.)