I guess that would be one way to frame it. I think a simpler way to think of it (or at least the way my simpler mind thinks of it) is that for a given number of parameters (neurons), more complex wiring allows for more complex results. The “state-space” is larger, if you will.
3+2, 3×2, and 3² are simply not the same.
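To make the “same parameters, different wiring” point concrete, here is a minimal sketch (assuming PyTorch; the widths and depths are arbitrary choices of mine) that builds two MLPs with roughly the same parameter budget, one wide and shallow, one narrow and deep:

```python
# Two MLPs with a similar parameter count but very different wiring.
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # Total number of trainable parameters (weights + biases).
    return sum(p.numel() for p in model.parameters())

# Wide and shallow: a single hidden layer of width 256.
shallow = nn.Sequential(
    nn.Linear(16, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

# Narrow and deep: five hidden layers of width 32.
deep = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    *[m for _ in range(4) for m in (nn.Linear(32, 32), nn.ReLU())],
    nn.Linear(32, 1),
)

print("shallow params:", count_params(shallow))  # ≈ 4.6k
print("deep params:   ", count_params(deep))     # ≈ 4.8k
```

Both networks cost about the same to store, but (as I understand the depth-separation results) the deeper one can efficiently represent compositions of functions that the shallow one would need far more width to approximate.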
From my limited (undergraduate-level CS) knowledge, I seem to remember that typical deep neural networks use a rather small number of hidden layers (maybe 10? certainly fewer than 100? please correct me if I am wrong). I think this choice is rationalized with “this already does everything we need, and requires less compute.”
To me this somewhat resembles a Chesterton’s fence (or rather its inverse). If we were to use neural nets of sufficient depth (>10e3 layers), we might encounter new things, but before we get there, we will certainly realize that we still have a ways to go in terms of raw compute.