First of all, kudos to you for making this public prediction.
To keep this brief: 1 (95%), 2 (60%), 3 (75%), 4 (<<5%), 5 (<<1%)
I don’t think we are in a hardware overhang, and my argument is the following:
Our brains are composed of ~10^11 neurons, and our computers of just as many transistors, so to a first approximation we should already be there.
However, our brains have approximately 10^3 to 10^5 synapses per neuron, while transistors have far fewer connections (I would guess maybe 10 on average?).
Even assuming that one transistor is “worth” one neuron, we come up short on connectivity.
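To put rough numbers on this, here is a back-of-envelope sketch (the per-transistor fan-out of ~10 is just my guess from above, not a measured figure):

```python
# Back-of-envelope comparison of "wiring" in a brain vs. a chip.
# All numbers are order-of-magnitude guesses from the argument above.

brain_neurons = 1e11         # ~10^11 neurons
synapses_per_neuron = 1e4    # middle of the 10^3 to 10^5 range
brain_connections = brain_neurons * synapses_per_neuron

chip_transistors = 1e11      # comparable element count to the brain
fanout_per_transistor = 10   # my guess for average connections per transistor
chip_connections = chip_transistors * fanout_per_transistor

print(f"brain: ~{brain_connections:.0e} synapses")                 # ~1e15
print(f"chip:  ~{chip_connections:.0e} connections")               # ~1e12
print(f"shortfall: ~{brain_connections / chip_connections:.0e}x")  # ~1e3x
```

On these (very rough) numbers, the chip is about three orders of magnitude short on connections even at parity in element count.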
I remember learning that a perceptron with a single hidden layer of arbitrary width can approximate any continuous function, and thereby any perceptron of finite width but with more hidden layers. (I think this is called the “universal approximation theorem”?)
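As far as I remember, the statement is roughly the following (one hidden layer of width N, a non-polynomial activation sigma, and a continuous target f on a compact set K):

```latex
% Universal approximation theorem (one hidden layer), roughly as I remember it:
% for every continuous f on a compact K \subset R^n and every eps > 0,
% there exist N and weights a_i, b_i in R, w_i in R^n such that
\left|\, f(x) \;-\; \sum_{i=1}^{N} a_i \,\sigma\!\left( w_i^{\top} x + b_i \right) \right| \;<\; \varepsilon
\qquad \text{for all } x \in K .
```

Note that N may have to grow very quickly with the required accuracy and the function being approximated, which is exactly the width-versus-depth trade-off I am gesturing at below.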
After reading your post, I kept trying to find some numbers on how many additional neurons are equivalent to an additional layer, but came up empty.
I think the problem is basically that each additional layer contributes superlinearly to “complexity” (however you care to measure that). Please correct me if I’m wrong; I would say this point is my crux. If we are indeed in territory where available transistor counts are comparable to a “single-hidden-layer-perceptron-brain-equivalent”, then I would have to revise my opinion.
I’m personally very interested in this highly parallel brain architecture, and if I could, I would work on investigating/building/inventing ways to create similar structures. However, besides self-assembly (as in living, growing things), I don’t yet see how we could build things of a similar complexity in a controlled way.
Just for completeness, I found [this paper](http://dx.doi.org/10.1016/j.neuron.2021.07.002), where they try to simulate the output of a specific type of neuron and find that the best results require a DNN of 5-8 layers (with widths of ~128).
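Taking those numbers at face value, a crude parameter count looks like the sketch below. I am treating the network as a plain fully connected stack, which glosses over the temporal structure of the paper’s actual model, so this is only an order-of-magnitude estimate:

```python
# Crude scaling estimate: "one 7-layer, width-128 DNN per biological neuron".
# Treats the network as a plain fully connected stack (biases ignored),
# which is NOT the paper's exact architecture; order of magnitude only.

width = 128
depth = 7                                  # middle of the reported 5-8 range
params_per_neuron = depth * width * width  # ~1.1e5 weights

brain_neurons = 1e11
total_params = params_per_neuron * brain_neurons

print(f"~{params_per_neuron:.1e} parameters to emulate one neuron")
print(f"~{total_params:.1e} parameters at whole-brain scale")  # ~1e16
```

On these assumptions you land around 10^16 parameters, which again sits far above current per-chip transistor counts.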
> I think the problem is basically that each additional layer contributes superlinearly to “complexity” (however you care to measure that). Please correct me if I’m wrong; I would say this point is my crux.

Do you mean that each additional layer contributes too much to hypothesis space entropy?
I guess that would be one way to frame it. I think a simpler way to think of it (or a way that my simpler mind thinks of it) is that, for a given number of parameters (neurons), more complex wiring allows for more complex results. The “state space” is larger, if you will.
3+2, 3×2, and 3² are simply not the same.
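A toy way to make that concrete: take a fixed budget of hidden units and count the input-to-output paths through a fully connected stack, which is just the product of the layer widths. Path count is only a crude stand-in for expressivity, but it shows sums turning into products:

```python
# Toy illustration: with a fixed budget of hidden units, the number of
# input-to-output paths in a fully connected stack is the product of the
# layer widths, so rearranging the same units into more layers multiplies
# instead of adds. (Path count is only a crude proxy for expressivity.)
from math import prod

budget = 6
arrangements = [[6], [3, 3], [2, 2, 2]]  # same 6 hidden units, different depths

for layers in arrangements:
    assert sum(layers) == budget
    print(f"layers {layers}: {prod(layers)} input-to-output paths")
# [6]        -> 6 paths
# [3, 3]     -> 9 paths
# [2, 2, 2]  -> 8 paths
```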
From my limited (undergraduate-level CS) knowledge, I seem to remember that typical deep neural networks use a rather small number of hidden layers (maybe 10? certainly fewer than 100? please correct me if I am wrong). I think this choice is rationalized with “this already does everything we need, and requires less compute”.
To me this somewhat resembles a Chesterton’s fence (or rather its inverse). If we were to use neural nets of sufficient depth (thousands of layers), then we might encounter new things, but before we get there, we will certainly realize that we still have a ways to go in terms of raw compute.