There is an algorithm called “Evolution strategies”, popularized by OpenAI (the idea itself goes back to Rechenberg and Schwefel’s work in the 1960s–70s), that can train neural networks without backpropagation and without storing multiple sets of parameters. You can view it as a population-1 genetic algorithm, but it really is a stochastic finite-differences gradient estimator.
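A minimal sketch of the estimator (the function names and hyperparameters here are illustrative, not OpenAI’s implementation): perturb the parameters with Gaussian noise, evaluate the black-box reward at the mirrored points, and average the scaled differences into a gradient estimate. Only the current parameter vector and one noise sample at a time need to be stored:

    import numpy as np

    def es_gradient_step(theta, objective, sigma=0.1, lr=0.02, n_pairs=16, rng=None):
        # One Evolution Strategies update: estimate the gradient of
        # `objective` at `theta` by stochastic finite differences and
        # take an ascent step.
        rng = rng or np.random.default_rng()
        grad = np.zeros_like(theta)
        for _ in range(n_pairs):
            eps = rng.standard_normal(theta.shape)
            # Antithetic (mirrored) sampling reduces estimator variance.
            r_plus = objective(theta + sigma * eps)
            r_minus = objective(theta - sigma * eps)
            grad += (r_plus - r_minus) / (2 * sigma) * eps
        return theta + lr * grad / n_pairs

    # Toy usage: maximize a black-box “reward” with no analytic gradient.
    reward = lambda w: -np.sum((w - 1.0) ** 2)
    theta = np.zeros(5)
    for _ in range(300):
        theta = es_gradient_step(theta, reward)
    # theta ends up close to the optimum at all-ones.

Note that the objective is treated as a pure black box: nothing inside it needs to be differentiable, which is what makes the same loop applicable to an episode return in reinforcement learning.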
On supervised learning tasks it is not competitive with backpropagation, but on reinforcement learning tasks it is, because there you can’t analytically differentiate the reward signal and have to estimate the gradient one way or another anyway. Some follow-up work has combined it with backpropagation.
I wouldn’t be surprised if the brain does something similar, since the brain never really does supervised learning; it’s either unsupervised or reinforcement learning. It could combine local reconstruction and auto-regression learning rules (similar to layerwise-trained autoencoders, but also trying to predict future inputs rather than just reconstructing the current ones) with finite-differences gradient estimation on reward signals propagated by the dopaminergic pathways.
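Purely as a toy illustration of that speculation (everything below is hypothetical: the layer sizes, the tied-weight delta rule, the scalar reward function), here is how a layer-local reconstruction/prediction rule and a finite-difference reward update could coexist without any cross-layer backpropagation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 8, 4
    W = rng.standard_normal((n_hid, n_in)) * 0.1  # feature layer, trained locally
    w_out = np.zeros(n_hid)                       # readout, trained from reward

    def features(x):
        return np.tanh(W @ x)

    def local_step(x_now, x_next, lr=0.01):
        # Layer-local rule: the hidden code, decoded through the tied
        # weights W.T, should reconstruct the current input and predict
        # the next one. No error signal crosses layers.
        global W
        h = features(x_now)
        for target in (x_now, x_next):
            err = target - W.T @ h        # reconstruction / prediction error
            W += lr * np.outer(h, err)    # delta rule on the decoder side

    def reward_step(x, reward_fn, sigma=0.1, lr=0.05):
        # Finite-difference update of the readout from a scalar reward,
        # a stand-in for a dopaminergic error signal.
        global w_out
        h = features(x)
        eps = rng.standard_normal(n_hid)
        r_plus = reward_fn(w_out + sigma * eps, h)
        r_minus = reward_fn(w_out - sigma * eps, h)
        w_out += lr * (r_plus - r_minus) / (2 * sigma) * eps

    # Toy stream: slowly drifting inputs; the “reward” prefers a readout of 1.
    x = rng.standard_normal(n_in)
    for t in range(500):
        x_next = np.roll(x, 1) + 0.01 * rng.standard_normal(n_in)
        local_step(x, x_next)
        reward_step(x, lambda w, h: -(w @ h - 1.0) ** 2)
        x = x_next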