This speed can make planning of training runs less central to the bulk of worthwhile activities. With very high speed, much more theoretical research becomes useful that doesn't require waiting on currently plannable training runs.
It doesn’t seem clear to me that this is the case; there isn’t necessarily a faster way to precisely predict the behavior and capabilities of a new model than training it (other than crude measures like ‘loss on next-token prediction continues to decrease as the following function of parameter count’).
It does seem possible and even plausible, but I think our theoretical understanding would have to improve enormously in order to make large advances without empirical testing.
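The "crude measure" mentioned above is a scaling-law-style extrapolation. As a minimal sketch of what such a prediction looks like in practice (the power-law form L(N) = b·N^(−α) and every constant below are invented for illustration, not real measurements):

```python
import numpy as np

# Hypothetical parameter counts of past training runs and their final losses.
# The losses here are synthetic, generated from an assumed power law
# L(N) = 3.0 * N^(-0.076); real data would be noisy measurements.
params = np.array([1e7, 1e8, 1e9, 1e10])
losses = 3.0 * params ** -0.076

# Fit the exponent and coefficient by linear regression in log-log space:
# log L = log b - alpha * log N
slope, log_b = np.polyfit(np.log(params), np.log(losses), 1)
alpha = -slope
b = np.exp(log_b)

# Extrapolate the fitted curve to a run 10x larger than the biggest so far.
predicted_loss = b * (1e11) ** (-alpha)
```

This kind of fit predicts only the aggregate loss curve, which is exactly why it is "crude": it says nothing about which specific capabilities or behaviors emerge at the new scale.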
I mean theoretical research on more general topics, not necessarily directly concerned with any given training run, or even with AI. I'm considering the consequences of there being an AI that can do human-level research in math and theoretical CS at much greater speed than humanity. Such research isn't useful when it's slow, because the next training run makes what little progress is feasible irrelevant, in the same way that frontier models aren't currently trained for 2 years: a bigger training cluster will come online in 1 year, and its run will outpace the older one. But with sufficient speed, catching up on theory from the distant future can become worthwhile.
Oh, I see, I was definitely misreading you; thanks for the clarification!