gwern comments on Conditions for mathematical equivalence of Stochastic Gradient Descent and Natural Selection

gwern 13 Dec 2023 19:42 UTC
Is it all that different? SGD momentum methods are usually analogized quite literally to a 'heavy ball' rolling downhill, etc. And I suspect, given how some gradient-update-based meta-learning methods work, that you can cast within-lifetime updates as somehow involving a momentum.
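To make the 'heavy ball' picture concrete, here is a minimal sketch of Polyak-style heavy-ball momentum. It assumes a hypothetical 1-D toy loss f(x) = x²/2 and illustrative values for `lr` and `beta` that do not come from the thread. The point it shows is that the velocity `v` carries information from past gradients, so each update depends on the trajectory's history and not just the current gradient.

```python
def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: v acts like the ball's inertia, accumulating
    an exponentially decaying sum of past gradients."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # decay old velocity, add the new 'force'
        x = x + v                    # move by the current velocity
    return x


def quad_grad(x):
    # Gradient of the hypothetical toy loss f(x) = x**2 / 2.
    return x


x_final = heavy_ball(quad_grad, x0=5.0)
print(x_final)
```

With these settings the iterate overshoots the minimum at 0 and oscillates around it before settling, which is the "rolling ball" behavior the analogy refers to; setting `beta=0` recovers plain gradient descent with no memory of past steps.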