Kushal Thaman comments on Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

Kushal Thaman 3 Jan 2024 6:28 UTC
4 points
1
Thanks for the post! Do you think there is an amount of pretraining after which no fine-tuning (say, on a completely non-complementary task, far from the pretraining distribution) will push you out of that loss basin? That is, a ‘point of no return’ s.t. even with a very large LR and a large amount of fine-tuning, you still get a network that is LMC?
- RobertKirk 8 Jan 2024 9:16 UTC
  2 points
  2
  I think a point of no return exists if you only use small LRs. But if you can use any LR (or any LR schedule), then you can definitely jump out of the loss basin: you could imagine choosing a really large LR that basically resets the model to a random init, and then starting again.
  
  I do think that if you want to utilise the pretrained model effectively, you likely want to stay in the same loss basin during fine-tuning.
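
For readers who want a concrete handle on what "still LMC" means in this thread, here is a minimal sketch of one common way to test it: evaluate the loss along the straight line between two sets of weights and measure how far it rises above the interpolated endpoint losses. Everything here is an illustrative assumption rather than anything from the post: `loss_barrier`, `interpolate`, and `toy_loss` are hypothetical names, the barrier definition is just one of several in use, and the 2-D double-well loss stands in for a real network's loss.

```python
from typing import Callable, List, Sequence


def interpolate(theta_a: Sequence[float], theta_b: Sequence[float], alpha: float) -> List[float]:
    """Point on the straight line between two weight vectors (alpha=0 gives theta_a)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    n_points: int = 11,
) -> float:
    """Largest amount by which the loss on the linear path exceeds the
    linear interpolation of the two endpoint losses (assumed definition)."""
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(n_points):
        alpha = i / (n_points - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = (1 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def toy_loss(theta: Sequence[float]) -> float:
    """Hypothetical double-well loss with two basins, at x = +1 and x = -1."""
    x, y = theta
    return (x ** 2 - 1) ** 2 + y ** 2


pretrained = [1.0, 0.0]
small_lr_finetune = [0.9, 0.1]   # assumed to stay in the same basin
large_lr_finetune = [-1.0, 0.0]  # assumed to have jumped to the other basin

print(loss_barrier(toy_loss, pretrained, small_lr_finetune))
print(loss_barrier(toy_loss, pretrained, large_lr_finetune))
```

A barrier near zero is what "same basin / LMC" would look like under this definition, while a clearly positive barrier is the signature of having jumped out. For real networks the same check would be run over full parameter vectors with loss measured on held-out data, and the threshold for "near zero" is a judgment call.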