Yeah, probably. I gave this simple example where they build 10 VAEs to function as 10 generative models, each of which is based on a very typical deep neural network. The inference algorithm is still a bit different from a typical MNIST classifier, though, because the answer isn't directly output by a feedforward pass; it comes from MAP inference over the models, or something like that.
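Concretely, the rough shape of that setup (my own sketch, not the actual code from that example; the TinyVAE architecture and the ELBO-based decision rule are just illustrative stand-ins for whatever MAP-style inference they actually do) is: train one small VAE per class, then classify a new image by asking which class's generative model explains it best:

```python
# Sketch only: one small VAE per class, and classification = "which class's
# generative model explains the image best?" (ELBO used as a stand-in score).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 64)
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

    def elbo(self, x):
        # x: flattened images with pixel values in [0, 1]
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon_logits = self.dec(z)
        # Bernoulli reconstruction term minus KL(q(z|x) || N(0, I)), per example
        rec_term = -F.binary_cross_entropy_with_logits(recon_logits, x, reduction='none').sum(-1)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return rec_term - kl

def classify(vaes, x):
    """Pick the class whose generative model assigns x the highest ELBO."""
    scores = torch.stack([vae.elbo(x) for vae in vaes], dim=-1)  # (batch, 10)
    return scores.argmax(dim=-1)

# vaes = [TinyVAE() for _ in range(10)]  # train each one only on images of its own digit
# predictions = classify(vaes, batch_of_flattened_images)
```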
I don’t think that particular approach is scalable because there’s a combinatorial explosion of possible things in the world, which need to be matched by a combinatorial explosion of possible generative models to predict them. So you need an ability to glue together models (“compositionality”, although it’s possible that I’m misusing that term). For example, compositionality in time (“Model A happens, and then Model B happens”), or compositionality in space (“Model A and Model B are both active, with a certain spatial relation”), or compositionality in features (“Model A is predicting the object’s texture and Model B is predicting its shape and Model C is predicting its behavior”), etc.
(In addition to being able to glue models together, you also need an algorithm that searches through the space of possible ways to glue them together, and finds the glued-together generative model that best fits a given input, in a computationally efficient way.)
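Here's a toy illustration of what I mean by gluing-plus-search (entirely made up by me, not from any of the papers; the `Model` class and the combinators are hypothetical): each "model" scores how well it explains part of an observation, combinators glue models together in time or across features, and inference is a brute-force search over candidate compositions, which is exactly the part that doesn't scale:

```python
# Toy sketch (my own illustration): "models" score observations, combinators
# glue models together, and inference is a brute-force search over compositions.
from itertools import product

class Model:
    def __init__(self, name, score_fn):
        self.name = name
        self.score_fn = score_fn  # observation -> log-likelihood-ish score

    def score(self, obs):
        return self.score_fn(obs)

def compose_in_time(model_a, model_b):
    """'Model A happens, and then Model B happens': score the two halves of a sequence."""
    return Model(f"({model_a.name} then {model_b.name})",
                 lambda obs: model_a.score(obs[:len(obs) // 2]) +
                             model_b.score(obs[len(obs) // 2:]))

def compose_features(texture_model, shape_model):
    """Different models predict different features of the same object."""
    return Model(f"({texture_model.name} + {shape_model.name})",
                 lambda obs: texture_model.score(obs['texture']) +
                             shape_model.score(obs['shape']))

def best_composition(primitives, obs, combinator):
    """Brute-force search over pairwise compositions -- fine for a toy, hopeless at scale."""
    candidates = [combinator(a, b) for a, b in product(primitives, repeat=2)]
    return max(candidates, key=lambda m: m.score(obs))
```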
It’s not immediately obvious how to take typical deep neural network generative models and glue them together like that. Of course, I’m sure there are about 10 grillion papers on exactly that topic that I haven’t read. So I don’t know, maybe it’s possible.
What I have been reading is papers trying to work out how the neocortex does it. My favorite examples for vision are probably currently this one from Dileep George and this one from Randall O’Reilly. Note that the two are not straightforwardly compatible with each other—this is not a well-developed field, but rather lots of insights that are gradually getting woven together into a coherent whole. Or at least that’s how it feels to me.
Are these neocortical models “deep neural networks”?
Well, they’re “neural” in a certain literal sense :-) I think the neurons in those two papers are different, but not wildly different, from the “neurons” in PyTorch models, more-or-less using the translation “spike frequency in biological neurons” <--> “activation of PyTorch ‘neurons’”. However, this paper proposes a computation done by a single biological neuron which would definitely require quite a few PyTorch ‘neurons’ to imitate. They propose that this computation is important for learning temporal sequences, which is one form of compositionality, and I suspect it’s useful for the other types of compositionality as well.
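As a cartoon of that point (this is not the model from the paper, just a toy to show why one such neuron already costs several standard units): treat each distal dendritic segment as its own thresholded pattern detector, with the cell flagged as “predictive” if any segment fires:

```python
# Cartoon of "one biological neuron = a small network of standard units":
# each dendritic segment is a thresholded detector over the inputs, and the
# cell goes into a "predictive" state if any segment recognizes its pattern.
import torch
import torch.nn as nn

class DendriticNeuron(nn.Module):
    def __init__(self, n_inputs, n_segments=8):
        super().__init__()
        # One weight vector per segment -- already n_segments "PyTorch neurons"
        self.segments = nn.Linear(n_inputs, n_segments)
        self.threshold = 1.0

    def forward(self, x):
        segment_activity = self.segments(x)                        # (batch, n_segments)
        segment_fires = (segment_activity > self.threshold).float()
        predictive = segment_fires.max(dim=-1).values               # OR over segments
        return predictive  # 1.0 if any segment fired, else 0.0
```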
They’re “deep” in the sense of “at least some hierarchy, though typically 2-5 layers (I think) not 50, and the hierarchy is very loose, with lots of lateral and layer-skipping and backwards connections”. I heard a theory that the reason that ResNets need 50+ layers to do something vaguely analogous to what the brain does in ~5 (loose) layers is that the brain has all these recurrent connections, and you can unroll a recurrent network into a feedforward network with more layers. Plus the fact that one biological neuron is more complicated than one PyTorch neuron. I don’t really know though...
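The unrolling point, at least, is standard: a recurrent layer stepped T times computes the same thing as T stacked feedforward layers with tied weights, so layer-counting comparisons get slippery. A minimal sketch:

```python
# Minimal sketch of the unrolling point: one recurrent layer stepped T times
# is the same computation as T stacked feedforward layers sharing weights.
import torch
import torch.nn as nn

class RecurrentLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_in = nn.Linear(dim, dim)
        self.w_rec = nn.Linear(dim, dim)

    def step(self, x, h):
        return torch.tanh(self.w_in(x) + self.w_rec(h))

    def forward(self, x, T=5):
        h = torch.zeros_like(x)
        for _ in range(T):          # unrolled, this loop is T feedforward layers...
            h = self.step(x, h)     # ...all reusing the same two weight matrices
        return h

layer = RecurrentLayer(dim=32)
out = layer(torch.randn(4, 32), T=5)  # equivalent to a 5-layer tied-weight feedforward stack
```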