There are some fairly straightforward limitations on the types of algorithms that current deep learning can learn (look at Transformer language model performance on variable-length arithmetic for a clear-cut example of basic functionality these networks totally fail at), and they would severely handicap a would-be superintelligence in any number of ways. There is a reason DeepMind hard-codes MCTS into AlphaZero rather than simply having the network learn its own search algorithm in the weights: MCTS is not in the region of algorithm space that existing neural networks can reach.
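To make the AlphaZero point concrete, here is a minimal sketch of the MCTS loop (selection, expansion, simulation, backpropagation) on a toy domain I made up for illustration, a depth-4 binary tree where going right is always better. None of this is DeepMind's code; it just shows the kind of explicitly programmed search structure being referred to.

```python
import math
import random

# Hypothetical toy domain: a state is a tuple of moves (0 = left, 1 = right)
# down a depth-4 binary tree; the leaf reward counts the 1s chosen,
# so the optimal policy is to always pick 1.
DEPTH = 4

def is_terminal(state):
    return len(state) == DEPTH

def reward(state):
    return sum(state) / DEPTH  # normalized to [0, 1]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}           # move -> Node
        self.visits, self.value = 0, 0.0

def ucb1(child, parent_visits, c=1.4):
    # Standard UCB1: exploit average value, explore rarely-visited children.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits
    )

def mcts(root_state, iterations=500):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        while not is_terminal(node.state) and len(node.children) == 2:
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child, if any remain.
        if not is_terminal(node.state):
            move = random.choice([m for m in (0, 1) if m not in node.children])
            node.children[move] = Node(node.state + (move,), node)
            node = node.children[move]
        # 3. Simulation: random rollout to a leaf.
        state = node.state
        while not is_terminal(state):
            state = state + (random.choice((0, 1)),)
        r = reward(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(()))  # with enough iterations this should settle on 1 (right)
```

The point is that every one of these steps is written by a programmer, not discovered by gradient descent; the network in AlphaZero only supplies value and policy estimates inside a search skeleton like this.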
I am pretty confident that you can’t get to AGI just by continuing to make these models bigger or by coming up with new Transformer-level architectural tricks. Going from GPT-2 to GPT-3 did not significantly improve generalization to arithmetic problems of unseen length, and there are strong theoretical reasons to think it couldn’t have.
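For concreteness, this is the grade-school algorithm the arithmetic benchmark implicitly demands, written as plain Python (a sketch for illustration, not anything from any model's training setup). The carry loop runs once per digit, so its number of sequential steps grows with input length; a model that has only fit the pattern on short inputs has to generalize exactly this carry logic to lengths it never saw.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Add two decimal numbers given as digit strings, propagating the carry."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):  # one step PER digit
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# Worst case for generalization: a single carry at the bottom digit
# ripples through every position above it.
print(add_digit_strings("999999999999", "1"))  # -> "1000000000000"
```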
That’s the good news, in my book.
The bad news is that I have absolutely no idea how hard those algorithmic limits are to crack, and there hasn’t been a serious push by the DL community to address them, because until recently the problems of focus outside of RL haven’t required it. Maybe we’ll hit an AI-winter-level wall on the way there. Maybe one big research paper comes out and all hell breaks loose as these systems unlock tremendous new capabilities overnight.
Hard to say. The performance you can get without access to wide swaths of the algorithmic space is scary as hell.
I’d like to push back on this a little.