phd student in comp neuroscience @ mpi brain research frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/
Yeah, it’s a tricky situation for me. The thesis that spontaneous activity is important is very central to my research, so I have a lot of incentives to believe in it. And I’m also exposed to a lot of evidence in its favor. We should probably swap roles (I should argue against and you for the importance) to debias. In case you’re ever interested in trying that out (or in having an adversarial collaboration about this topic) let me know :)
But to sketch out my beliefs a bit further:
I believe that spontaneous activity is quite rich in information. Direct evidence for that comes from this study from 2011 where they find that the statistics of spontaneous activity and stimulus-evoked activity are quite similar and get more similar over development. Indirect evidence comes from modeling studies from our lab that show that cortical maps and the fine-scale organization of synapses can be set up through spontaneous activity/retinal waves alone. Other labs have shown that retinal waves can set up long-range connectivity within the visual cortex and that they can produce Gabor receptive fields, and even ones with more complex invariance properties. And beyond the visual cortex, I’m currently working on a project where we set up the circuitry for multisensory integration with only spontaneous activity.
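To make the flavor of these models a bit more concrete, here is a toy sketch (my own illustration, not the actual models from the studies linked above) of the basic idea that structured spontaneous activity alone can shape receptive fields: a single linear neuron learning with Oja’s rule on synthetic, wave-like input bumps. The wave generator, network size, and learning rule are all stand-in assumptions.

```python
# Toy illustration: a receptive field shaped purely by spontaneous, wave-like
# input. One linear neuron learns with Oja's rule (a Hebbian term plus a
# normalization term); its weights converge toward the leading component of
# the input correlations, i.e. structure extracted without any external stimuli.
import numpy as np

rng = np.random.default_rng(0)
size = 16                      # 1D "retina" with 16 input channels
w = rng.normal(0, 0.1, size)   # synaptic weights of one cortical neuron
lr = 0.01

def wave_sample():
    """A localized bump of correlated activity at a random position."""
    center = rng.uniform(0, size)
    return np.exp(-0.5 * ((np.arange(size) - center) / 2.0) ** 2)

for step in range(5000):
    x = wave_sample()
    y = w @ x                       # the neuron's response
    w += lr * y * (x - y * w)       # Oja's rule: Hebbian term + decay

# w now reflects the correlation structure of the spontaneous input alone.
```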
I believe that the cortex essentially just does some form of gradient descent/backpropagation in canonical neural circuits that updates internal models. (The subcortex might be different.) I define “gradient descent” generously as “any procedure that uses or approximates the gradient of a loss function as the central component to reduce loss”. All the complications stem from the fact that a biological neural net is not great at accurately propagating the error signal backward, so evolution came up with a ton of tricks & hacks to make it work anyhow (see this paper from UCL & DeepMind for some ideas on how exactly; a toy sketch of one such trick follows after the two points below). I have two main reasons to believe this:
Gradient descent is pretty easy to implement with neurons and at the same time so general that, just on a complexity prior, it’s a strong candidate for any solution that a meta-optimizer like evolution might come up with. Anything more complicated would not work as robustly across all relevant domains.
In conjunction with what I believe about spontaneous activity inducing very strong & informative priors, I don’t think there is any need for anything more complicated than gradient descent. At least I don’t intuitively see the necessity of more optimized learning algorithms (except to maybe squeeze out a few more percentage points of performance).
I notice that there are a lot fewer green links in the second point, which also nicely indicates my relative level of certainty about that compared to the first point.
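To illustrate what I mean by “tricks & hacks” for approximating gradient descent without exact backpropagation, here is a minimal numpy sketch of one idea from that literature, feedback alignment (propagating the error through fixed random feedback weights instead of the transposed forward weights). All the particulars (task, network size, learning rate) are made up for the example; it’s an illustration of “approximating the gradient”, not a claim about what cortex actually does.

```python
# Minimal sketch of feedback alignment: errors are sent backward through a
# fixed random matrix B instead of W2.T, yet the weight updates still roughly
# follow the gradient of the loss.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 20, 50, 5

W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # forward weights, layer 1
W2 = rng.normal(0, 0.1, (n_out, n_hidden))  # forward weights, layer 2
B = rng.normal(0, 0.1, (n_hidden, n_out))   # fixed random feedback weights

T = rng.normal(0, 1.0, (n_out, n_in))       # toy target: a random linear map
lr = 0.05

for step in range(2000):
    x = rng.normal(0, 1.0, (n_in, 1))
    target = T @ x

    h = np.tanh(W1 @ x)      # hidden activity
    y = W2 @ h               # output
    e = y - target           # output error (gradient of the squared-error loss)

    # Exact backprop would use W2.T @ e; feedback alignment uses B @ e.
    delta_h = (B @ e) * (1 - h ** 2)

    W2 -= lr * e @ h.T
    W1 -= lr * delta_h @ x.T
```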
Hey Steven!
Yep, that’s a pretty accurate summary. My intuition is actually that the synthetic training data might even be better than actual sensory input for pretraining because millions of years of evolution have optimized it exactly for that purpose. Weak evidence for that intuition is that the synthetic data comes in distinct stages that go from “very coarse-grained” to “highly detailed” (see e.g. here).
And you are also correct that retinal waves are not universally accepted to be useful—there was a long debate where some people claimed that they are just a random “byproduct” of development. The new Ge et al. paper that came out a few months ago is a strong indicator of the functional importance of retinal waves, though, to the point where I’m pretty convinced they are not just a byproduct.
btw, I really enjoyed your post on the lifetime anchor. My take on that is that it doesn’t make a lot of sense to estimate the lifetime anchor and the evolution anchor separately. Evolution can do the pretraining (through providing tailor-made synthetic data) and then the environment does the fine-tuning. That would also explain why Cotra’s lifetime estimate appears so low compared to the amount of compute used on current ML models: current ML models have to start from scratch, while the brain can start with a nicely pretrained model.
How to build a mind—neuroscience edition
Thank you very much for pointing it out! Just checked the primary source, and there it’s spelled correctly. But the misspelled version can be found in some newer books that cite the passage. Funny how typos spread...
I’ll fix it!
I don’t think direct evidence for this exists. Tbf, this would be a very difficult experiment to run (you’d have to replace retinal waves with real data and the retina really wants to generate retinal waves).
But the principled argument that sways me the most is that “real” input is external—its statistics don’t really care about the developmental state of the animal. Spontaneous activity on the other hand changes with development and can (presumably) provide the most “useful” type of input for refining the circuit (as in something like progressive learning). This last step is conjecture and could be investigated with computational models (train the first layer with very coarse retinal waves, the second layer with more refined retinal waves, etc. and see how well the final model performs compared with one trained on an equal number of natural images). I might run that experiment at some point in the future. Any predictions?
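For concreteness, here is a rough sketch of how that experiment could look in code. Everything in it is a placeholder assumption (the wave generator, the reconstruction objective, the layer sizes); the point is just the staged coarse-to-fine training schedule versus an end-to-end control trained on natural images.

```python
# Sketch of the proposed experiment: train layer 1 on very coarse "retinal
# waves", then layer 2 on finer waves with layer 1 frozen; compare against the
# same architecture trained end-to-end on an equal number of natural images.
import torch
import torch.nn as nn

def retinal_wave_batch(n, size=32, sigma=8.0):
    """Toy 'retinal wave': a Gaussian bump of activity at a random position.
    Smaller sigma = finer, more detailed waves."""
    xs = torch.arange(size).float()
    yy, xx = torch.meshgrid(xs, xs, indexing="ij")
    cx, cy = torch.rand(n, 1, 1) * size, torch.rand(n, 1, 1) * size
    waves = torch.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    return waves.reshape(n, -1)

layer1 = nn.Sequential(nn.Linear(32 * 32, 256), nn.ReLU())
layer2 = nn.Sequential(nn.Linear(256, 64), nn.ReLU())
readout1 = nn.Linear(256, 32 * 32)   # reconstruction readout for stage 1
readout2 = nn.Linear(64, 32 * 32)    # reconstruction readout for stage 2
loss_fn = nn.MSELoss()

# Stage 1: coarse waves shape the first layer.
opt = torch.optim.Adam(list(layer1.parameters()) + list(readout1.parameters()), lr=1e-3)
for step in range(500):
    x = retinal_wave_batch(64, sigma=8.0)            # coarse waves
    loss = loss_fn(readout1(layer1(x)), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: finer waves refine the second layer while the first stays frozen.
for p in layer1.parameters():
    p.requires_grad = False
opt = torch.optim.Adam(list(layer2.parameters()) + list(readout2.parameters()), lr=1e-3)
for step in range(500):
    x = retinal_wave_batch(64, sigma=2.0)            # fine waves
    loss = loss_fn(readout2(layer2(layer1(x))), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Control: train the identical network end-to-end on the same number of natural
# images, then compare both (e.g. with a linear probe on a downstream task).
```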
Hmm, so I agree with the general point that you’re making that “priors are not set in stone” and the whole point is to update on them with sensory data and everything. But I think it’s not fair to treat all seconds of life as equally influential/important for learning. There is a lot of literature demonstrating that the cortex is less plastic during adulthood compared to development. There is also the big difference that during development the location & shape of dendrites and axons change depending on activity, while in adulthood things are a lot more rigid. Any input provided early on will have a disproportionate impact. The classic theory that there are critical periods of plasticity during development is probably too strong (given the right conditions or pharmacological interventions, the adult brain can become very plastic again), but still—there is something special about development.
I’m not sure if that’s the point that people in predictive coding are making or if they are just ignorant that lifelong plasticity is a thing.