Thanks!
On 2): Being overparameterized doesn’t mean you fit all your training data. It just means that you could fit it with enough optimization. Perhaps the existence of some savants shows that the brain could memorize far more than it does.
On 3): The number of our synaptic weights is stupendous too: about 30,000 for every second of our lives.
On 4): You can underfit at the evolution level and still overparameterize at the individual level.
Overall, though, you convinced me that underparameterization is less likely, especially on your definition of overparameterization, which is the one relevant for double descent.
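For anyone who wants to sanity-check the figure in 3), here is a rough back-of-the-envelope sketch. The synapse count (about 1e14) and the 80-year lifespan are assumptions made for illustration, not numbers from this thread; published estimates of synapse counts vary by an order of magnitude or so, so only the order of magnitude of the result means anything.

```python
# Back-of-the-envelope: synaptic weights per second of life.
# Both inputs below are assumed for illustration only.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synapses_per_second(num_synapses: float, lifespan_years: float) -> float:
    """Return the number of synapses available per second of lifetime."""
    lifetime_seconds = lifespan_years * SECONDS_PER_YEAR
    return num_synapses / lifetime_seconds


if __name__ == "__main__":
    assumed_synapses = 1e14      # assumption: rough order-of-magnitude estimate
    assumed_lifespan = 80.0      # assumption: lifespan in years
    rate = synapses_per_second(assumed_synapses, assumed_lifespan)
    print(f"~{rate:,.0f} synapses per second of life")
```

Under these assumptions the result lands in the tens of thousands, the same order of magnitude as the 30,000 quoted above.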
2) The “larger models are simpler” effect happens only after training to zero loss (at least if you’re using the double descent explanation for it, which is what I was thinking of).
3) Fair point, though note that for that comparison you should also count up all the other things the brain has to do (e.g. motor control).
4) If “redoing evolution” produces AGI, I would expect that a mesa optimizer would “come from” the evolution level, not the individual level. So to the extent you want to argue “double descent implies simple large models implies mesa optimization”, you have to apply that argument to evolution.
(Probably you were asking about the question independently of the mesa optimization point; I do still hold this opinion more weakly for generic “AI systems of the future”; there the intuition comes from humans being underparameterized and from an intuition that AI systems of the future should be able to make use of more cheap, diverse / noisy data, e.g. YouTube.)
Sounds like we agree :)