One of AI_WAIFU’s points was that the brain has some redundancy because synapses randomly fail to fire and neurons randomly die etc. That part wouldn’t be relevant to running the same algorithms on chips, presumably. Then the other thing they said was that over-parameterization helps with data efficiency. I presume that there’s some background theory behind that claim which I’m not immediately familiar with. But I mean, is it really plausible that the brain is over-parameterized by 3+ orders of magnitude? Seems pretty implausible to me, although I’m open to being convinced.
Also, the Neural Tangent Kernel corresponds to an infinite-width (hence infinite-capacity) network, but people can do those calculations without using infinitely large memory, right? People have found a way to reformulate the algorithm so that it involves doing different operations on a different representation, one which does not require ∞ memory. By the same token, if we’re talking about some network which is so overparametrized that it can be compressed by 99.9%, then I’m strongly inclined to guess that there’s some way to do the same calculations and updates directly on the compressed representation.
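To make that concrete, here is a minimal sketch of the reformulation as I understand it: exact “training” of the infinite-width model reduces to kernel ridge regression with an n × n Gram matrix over the training set, so the only objects ever stored are that finite matrix and an n-vector of coefficients, never the infinite-width network itself. Caveat: the `ntk_kernel` function below is a stand-in (a plain RBF kernel), not the architecture-specific closed-form NTK; the point is the linear-algebra structure, which is the same either way.

```python
import numpy as np

def ntk_kernel(x1, x2, length_scale=1.0):
    """Stand-in kernel (RBF). The real NTK has architecture-specific
    closed forms, but it is still just a function of pairs of inputs."""
    sq_dists = (
        np.sum(x1**2, axis=1)[:, None]
        + np.sum(x2**2, axis=1)[None, :]
        - 2.0 * x1 @ x2.T
    )
    return np.exp(-sq_dists / (2.0 * length_scale**2))

def ntk_regression(X_train, y_train, X_test, ridge=1e-6):
    """'Infinite-width training' as kernel ridge regression.
    Only finite objects appear: an n x n Gram matrix and an
    n-vector of coefficients."""
    n = X_train.shape[0]
    K = ntk_kernel(X_train, X_train)          # n x n Gram matrix
    alpha = np.linalg.solve(K + ridge * np.eye(n), y_train)
    K_test = ntk_kernel(X_test, X_train)      # m x n cross-kernel
    return K_test @ alpha

# Toy usage: fit a noisy sine on 200 points, predict on a grid.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
X_grid = np.linspace(-3, 3, 50)[:, None]
preds = ntk_regression(X, y, X_grid)
```

The “different representation” here is the Gram matrix: everything about the infinite-width model that matters for prediction is compressed into n × n numbers.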
NTK training requires training time that scales quadratically with the number of training examples, so it’s not usable for large training datasets (nor with data augmentation, since that simulates a larger dataset). (I’m not an NTK expert, but, from what I understand, this quadratic growth is not easy to get rid of.)
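To put rough numbers on why that quadratic growth bites: just forming the dense Gram matrix touches n² entries (and solving the resulting linear system exactly is typically even worse, roughly cubic, unless you approximate). Illustrative arithmetic only, assuming a dense float32 matrix:

```python
# Memory footprint of the dense n x n Gram matrix at float32
# (4 bytes per entry) -- back-of-the-envelope, not a benchmark.
for n in [10_000, 1_000_000, 100_000_000]:
    gram_bytes = n * n * 4
    print(f"n = {n:,}: Gram matrix ~ {gram_bytes / 1e9:,.1f} GB")
# n = 10,000 is ~0.4 GB; n = 1,000,000 is ~4,000 GB (4 TB);
# n = 100,000,000 is ~40,000,000 GB -- hence no large datasets.
```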