Aren’t we stretching things quite far if we’re including momentum methods and their relatives, with history- or memory-sensitive updates? Note that natural selection can implement a kind of momentum too (e.g. via within-lifetime behavioural stuff like migration, offspring preference, and sexual selection)! Neither my models nor the ‘SGD’ they’re equivalent to exhibit this.
Is it all that different? SGD momentum methods are usually analogized literally to ‘heavy balls’ etc. And I suspect, given how some gradient-update-based meta-learning methods work, you can cast within-lifetime updates as somehow involving a momentum.
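For concreteness, here's a rough sketch of what the ‘heavy ball’ picture amounts to, next to plain gradient descent. This is just the textbook update on a toy objective f(x) = x^2; the function names, the learning rate and the momentum coefficient are all illustrative assumptions of mine, not anything from the post's models.

```python
# Toy comparison of plain gradient descent vs. heavy-ball momentum.
# Everything here (names, lr, beta, objective) is an illustrative assumption.

def grad(x):
    # Gradient of the toy objective f(x) = x**2.
    return 2.0 * x

def plain_step(x, lr=0.1):
    # Memoryless update: depends only on the current point.
    return x - lr * grad(x)

def heavy_ball_step(x, v, lr=0.1, beta=0.9):
    # History-sensitive update: the velocity v carries past gradients forward.
    v_new = beta * v - lr * grad(x)
    return x + v_new, v_new

x_plain = 1.0
x_mom, v = 1.0, 0.0
plain_path = [x_plain]
mom_path = [x_mom]

for _ in range(3):
    x_plain = plain_step(x_plain)
    x_mom, v = heavy_ball_step(x_mom, v)
    plain_path.append(x_plain)
    mom_path.append(x_mom)
```

The two trajectories agree on the first step (velocity starts at zero) and diverge from the second step on, which is the ‘memory’ at issue: the momentum update can't be computed from the current point alone.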
Maybe you’re just not committed enough to momentum.
Haha mind blown. Thanks for the reference! Different kind of momentum, but still...