Can we therefore model fine-tuning as moving around in the parameter tangent space around the pre-trained network?
Yes. Indeed, in the NTK limit ordinary training can be modeled the same way, as movement in the tangent space around the randomly initialized network.
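To make "moving in the tangent space" concrete, here is a minimal sketch using a toy one-unit network. The model `f`, the values in `theta0`, and the helper names `grad_f`, `f_lin`, and `lin_error` are all hypothetical choices for illustration, not anything from a particular library. The tangent-space model is the first-order Taylor expansion of the network in its parameters around the pre-trained point, and the sketch checks how quickly it departs from the true network as the parameters move away.

```python
import math

def f(theta, x):
    """Toy network (an assumption for illustration): f(x) = a * tanh(w * x)."""
    w, a = theta
    return a * math.tanh(w * x)

def grad_f(theta, x):
    """Analytic gradient of f with respect to the parameters (w, a)."""
    w, a = theta
    t = math.tanh(w * x)
    return (a * (1.0 - t * t) * x, t)

def f_lin(theta, theta0, x):
    """Tangent-space model: first-order expansion of f in theta around theta0."""
    g = grad_f(theta0, x)
    return f(theta0, x) + sum(
        gi * (ti - t0i) for gi, ti, t0i in zip(g, theta, theta0)
    )

theta0 = (0.8, 1.5)  # hypothetical "pre-trained" parameters
x = 0.7              # a single hypothetical input

def lin_error(scale):
    """Gap between the true and linearized outputs after a parameter step."""
    theta = (theta0[0] + 0.3 * scale, theta0[1] - 0.2 * scale)
    return abs(f(theta, x) - f_lin(theta, theta0, x))

# The gap shrinks roughly quadratically as the step gets smaller.
for scale in (1.0, 0.1, 0.01):
    print(scale, lin_error(scale))
```

The linearized model is exact at `theta0` and its error is second order in the size of the parameter step, so the approximation is only as good as the fine-tuned parameters are close to the pre-trained ones. Whether real fine-tuning stays in that regime is an empirical question that this toy example does not settle.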