Jacob Steinhardt on predicting emergent capabilities:

There are two principles I find useful for reasoning about future emergent capabilities:
1. If a capability would help get lower training loss, it will likely emerge in the future, even if we don’t observe much of it now.
2. As ML models get larger and are trained on more and better data, simpler heuristics will tend to get replaced by more complex heuristics... This points to one general driver of emergence: when one heuristic starts to outcompete another. Usually, a simple heuristic (e.g. answering directly) works best for small models on less data, while more complex heuristics (e.g. chain-of-thought) work better for larger models trained on more data.
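To make the crossover dynamic concrete, here is a minimal toy sketch in Python. The loss curves, constants, and the "multi-step task" below are invented for illustration (they are not from Steinhardt's post); the only point is that two smooth loss curves crossing can make a downstream capability appear to jump abruptly with scale.

```python
import math

# Toy illustration of the "heuristic crossover" driver of emergence quoted above.
# All curves and constants are made up for illustration.

def simple_heuristic_loss(scale: float) -> float:
    # "Answer directly": decent even at small scale, but quickly plateaus.
    return 1.0 + 0.5 / scale

def complex_heuristic_loss(scale: float) -> float:
    # Chain-of-thought-style heuristic: worse at small scale, keeps improving.
    return 3.0 / math.sqrt(scale)

def multi_step_task_accuracy(scale: float) -> float:
    # Hypothetical task that effectively requires the complex heuristic.
    if complex_heuristic_loss(scale) < simple_heuristic_loss(scale):
        return min(0.95, 0.3 + 0.1 * math.log10(scale))
    return 0.05  # near-chance while the simple heuristic dominates

for scale in [1, 3, 10, 30, 100, 300]:
    print(f"scale={scale:>3}  "
          f"simple_loss={simple_heuristic_loss(scale):.2f}  "
          f"complex_loss={complex_heuristic_loss(scale):.2f}  "
          f"multi_step_acc={multi_step_task_accuracy(scale):.2f}")
```

Both loss curves improve smoothly with scale; the apparent discontinuity in task accuracy comes entirely from which heuristic is winning at a given scale.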
The nature of these things is that they’re hard to predict, but general reasoning satisfies both criteria, making it a prime candidate for a capability that will emerge with scale.
I think the trouble with that argument is that it seems equally compelling for any useful capability, regardless of how achievable it is (e.g. it works for ‘deterministically simulate the behavior of the user’, even though that seems awfully unlikely to emerge in the foreseeable future). So I don’t think we can take it as evidence that we’ll see general reasoning emerge at any nearby scale.