The needed breakthroughs (on the scale of the discovery of Transformers) have probably been published already, but have been neglected and half-forgotten.
We have plenty of examples: backpropagation was discovered and rediscovered by many people and groups between 1970 and 1986, and was mostly ignored until the late 1980s. ReLU was known for decades, and its good properties were published in Nature in 2000; it was still ignored until approximately 2011. LSTMs made the need for something like residual connections obvious in 1997, yet the field waited until 2015 to apply the same idea to very deep feedforward nets (highway networks and ResNets). And so on...
There should, then, be plenty of promising things hidden in the published literature that are not widely known.
So it might be that all that is needed to surface those breakthroughs still buried in lightly cited papers is a modestly competent automated AI researcher: one that can understand published papers, generate moderately competent ML code, comb the literature for promising ideas, and automatically synthesize and run experiments based on various combinations of those ideas. Can one implement a system like this as an intelligent wrapper around GPT-4? It’s not clear, but overall we don’t seem to be very far from being able to do something like this (perhaps we do need to wait for the next generation of LLMs, but a system able to do this does not have to be a superintelligence or even an AGI itself; it only needs limited, moderate competence to have a good chance of unearthing the required breakthroughs).
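To make the loop concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is an assumption: the function names (`call_llm`, `search_papers`, `run_experiment`, and the rest) are hypothetical placeholders rather than any real API, and sampling random pairs of ideas is just one simple way to spend a limited experiment budget.

```python
# A minimal sketch of the literature-mining loop described above.
# All names here are hypothetical placeholders, not a real library API.

import itertools
import random


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4-class model call."""
    raise NotImplementedError("wire up an LLM provider here")


def search_papers(query: str, max_results: int = 20) -> list[str]:
    """Hypothetical literature search (e.g., over arXiv abstracts),
    returning the full text of candidate papers."""
    raise NotImplementedError


def run_experiment(code: str) -> float:
    """Hypothetical sandboxed execution of a generated training script,
    returning a scalar score (e.g., final validation loss)."""
    raise NotImplementedError


def extract_idea(paper_text: str) -> str:
    """Ask the model to distill the paper's core reusable technique."""
    return call_llm(
        "Summarize the core reusable technique in this paper "
        f"in a few sentences:\n{paper_text}"
    )


def propose_experiment(idea_a: str, idea_b: str) -> str:
    """Ask the model for a self-contained script combining two ideas."""
    return call_llm(
        "Write a self-contained PyTorch training script that combines "
        "the following two techniques and prints final validation loss:\n"
        f"1. {idea_a}\n2. {idea_b}"
    )


def mine_literature(query: str, budget: int = 50) -> list[tuple[float, str]]:
    """Search, extract ideas, try combinations, return scored scripts."""
    papers = search_papers(query)
    ideas = [extract_idea(p) for p in papers]
    # Sample random pairs rather than enumerating all combinations:
    # the experiment budget, not the idea pool, is the binding constraint.
    pairs = list(itertools.combinations(ideas, 2))
    results = []
    for idea_a, idea_b in random.sample(pairs, min(budget, len(pairs))):
        script = propose_experiment(idea_a, idea_b)
        try:
            score = run_experiment(script)
            results.append((score, script))
        except Exception:
            continue  # generated code often fails; just move on
    # Lower validation loss first, i.e., most promising combinations on top.
    return sorted(results, key=lambda r: r[0])
```

The hard part in practice would presumably be `run_experiment`: sandboxing generated code and extracting a trustworthy scalar score from it, while the rest of the loop is mostly plumbing around the model.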