Kushal Thaman

Karma: 40

Kushal Thaman Jan 3, 2024, 6:28 AM
4 points
on: Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
Thanks for the post! Do you think there is an amount of pretraining after which no fine-tuning (say, on a completely non-complementary task, far from the pretraining distribution) will push you out of that loss basin? That is, a ‘point of no return’ such that even for a very large learning rate and a very large amount of fine-tuning, you still get a network that is LMC?

Incidental polysemanticity

Nov 15, 2023, 4:00 AM

43 points