There has been some work lately on derivative-free optimization of ANNs (mostly ES, but I’ve seen some other genetic-flavored work as well). These methods tend to be off-policy, and I’m not sure how biologically plausible that is, but it’s something to think about w/r/t whether current DL progress is taking the same route as biological intelligence (and so getting us closer to [super]intelligence).
It seems very implausible to me that the brain would use evolutionary strategies, as it’s not clear how humans could try a sufficiently large number of parameter settings without any option for parallelisation, or store and then choose among previous configurations.
There is an algorithm called “Evolution strategies”, popularized by OpenAI (although I believe it already existed in some form), that can train neural networks without backpropagation and without storing multiple sets of parameters. You can view it as a genetic algorithm with a population of 1, but it really is a stochastic finite-difference gradient estimator.
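To make the “stochastic finite differences” reading concrete, here is a minimal sketch in NumPy. It is not OpenAI’s implementation (which, as far as I know, adds things like reward normalization and parallel workers); `es_gradient`, `toy_reward`, `train`, and all the hyperparameter values are names and numbers I made up for illustration. Only one parameter vector is ever kept; each random perturbation is used and then thrown away.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_samples=50, rng=None):
    """Estimate the gradient of f at theta from function values only.

    Each sample is a central finite difference along a random Gaussian
    direction eps; averaging (directional slope * eps) over many
    directions approximates the gradient.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)
        # Slope of f along eps, measured with two evaluations.
        slope = (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2.0 * sigma)
        grad += slope * eps
    return grad / n_samples

def toy_reward(theta):
    # Hypothetical black-box "reward": highest at an arbitrary target point.
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

def train(steps=300, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    for _ in range(steps):
        # Plain gradient ascent, using the estimated gradient.
        theta = theta + lr * es_gradient(toy_reward, theta, rng=rng)
    return theta

if __name__ == "__main__":
    print(train())  # should end up close to the target in toy_reward
```

Nothing here differentiates `toy_reward` analytically, which is the property that matters in the RL setting.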
On supervised learning tasks it is not competitive with backpropagation, but on reinforcement learning tasks (where you can’t analytically differentiate the reward signal, so you have to estimate the gradient one way or another) it is competitive. Some follow-up work combined it with backpropagation.
I wouldn’t be surprised if the brain does something similar, since the brain never really does supervised learning; it’s either unsupervised or reinforcement learning. The brain could combine local reconstruction and auto-regression learning rules (similar to layerwise-trained autoencoders, but also trying to predict future inputs rather than just reconstructing the current ones) with finite-difference gradient estimation on reward signals propagated by the dopaminergic pathways.
The OpenAI ES algorithm isn’t very plausible (for exactly the reasons you gave), but the general idea of “existing parameters + random noise → revert if performance got worse, repeat” does seem like a reasonable way to end up with an approximation of the gradient. I had in mind something more like Uber AI’s Neuroevolution, which wouldn’t necessarily require parallelization or storage if the brain did some sort of fast, local, parameter-wise updating.
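For what it’s worth, the “add noise, revert if worse, repeat” loop is easy to sketch. This is just my reading of that one-line description, not Uber AI’s method or anything claimed about the brain; `perturb_and_revert`, `toy_reward`, and the settings are hypothetical. Note that it keeps a single parameter vector plus the most recent noise sample, so there is no population and no archive of earlier configurations.

```python
import numpy as np

def toy_reward(theta):
    # Hypothetical black-box "performance" measure with an arbitrary optimum.
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

def perturb_and_revert(f, theta, sigma=0.1, steps=2000, seed=0):
    """Random-perturbation hill climbing on a single parameter vector."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta, dtype=float)
    best = f(theta)
    for _ in range(steps):
        noise = sigma * rng.standard_normal(theta.shape)
        theta += noise            # existing parameters + random noise
        score = f(theta)
        if score < best:
            theta -= noise        # performance got worse: revert
        else:
            best = score          # otherwise keep the change
    return theta, best

if __name__ == "__main__":
    theta, best = perturb_and_revert(toy_reward, np.zeros(3))
    print(theta, best)
```

The accepted steps are, on average, biased toward uphill directions, which is the loose sense in which this approximates following the gradient. It uses only one evaluation per step, but it learns nothing from rejected perturbations.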