Is it all that different? SGD momentum methods are usually analogized quite literally to ‘heavy balls’ and the like. And given how some gradient-update-based meta-learning methods work, I suspect you can cast within-lifetime updates as involving a kind of momentum too.
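For anyone who hasn't seen where the ‘heavy ball’ picture comes from: in Polyak-style momentum, a velocity term carries over a decayed copy of past updates, so the iterate keeps rolling in a consistent direction instead of reacting only to the current gradient. A minimal sketch on a toy 1-D quadratic (the function and parameter names are my own illustration, not from any library):

```python
def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: v <- beta*v - lr*grad(x); x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        # velocity = decayed previous velocity plus a fresh push from the gradient
        v = beta * v - lr * grad(x)
        # position moves by the velocity, like a ball rolling downhill with inertia
        x = x + v
    return x

# Minimize f(x) = x**2, whose gradient is 2x; the iterate should settle near 0.
x_final = heavy_ball(lambda x: 2 * x, x0=5.0)
```

Setting `beta=0` recovers plain gradient descent. The `beta * v` term is the ‘memory’ that would have to be mapped onto something in a within-lifetime update rule for the analogy to go through.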