Right, a plausible way of doing continued pretraining could just as well be called “full-tuning”, or just “tuning” (which is what you said, not “fine-tuning”), as opposed to “fine-tuning” that trains fewer weights. Though people seem unsure whether “fine-tuning” implies that it’s not full-tuning, which leads to terms like “dense fine-tuning” to mean full-tuning.
It would be good to have terms that distinguish full-tuning the model in line with the original pretraining method from full-layer LoRA adaptations that ‘effectively’ continue pretraining but are done in a different manner.
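For concreteness, here is a toy contrast between the two regimes, just counting trainable parameters; wiring the deltas into the forward pass is omitted (the ReLoRA sketch further down shows one way to do it). The structure is purely illustrative, not any particular library’s API.

```python
import torch
import torch.nn as nn

def trainable(params):
    return sum(p.numel() for p in params if p.requires_grad)

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Regime 1: full-tuning / continued pretraining with the original recipe.
# Every weight stays trainable.
print("full-tuning:", trainable(model.parameters()))   # ~525k params

# Regime 2: "full layer" LoRA. Freeze everything, attach a rank-r delta
# (B @ A) to every linear layer, and train only the deltas.
rank = 8
deltas = []
for layer in model:
    if isinstance(layer, nn.Linear):
        for p in layer.parameters():
            p.requires_grad = False
        A = nn.Parameter(torch.randn(rank, layer.in_features) * 0.01)
        B = nn.Parameter(torch.zeros(layer.out_features, rank))
        deltas.extend([A, B])
print("full-layer LoRA:", trainable(deltas))           # ~16k params
```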
You mean like ReLoRA, where full-rank pretraining is followed by many rounds of LoRA that get baked back into the base weights? Fine-pretraining :-) It feels like a sparsity-themed training-efficiency technique, one that doesn’t lose centrality points for being used in “pretraining”. To my mind, tuning is cheaper adaptation, something that uses OOMs less data than pretraining (even if it’s full-tuning). So maybe the terms tuning/pretraining should be defined by the role those phases play in the overall training process rather than by the algorithms involved? That makes fine-tuning an unnecessarily specific term, since it claims both that it’s tuning and that it trains fewer weights.
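A minimal sketch of that merge-and-restart cycle, assuming plain PyTorch. The names (`LoRALinear`, `merge_and_reset`) are illustrative, not the paper’s code; ReLoRA also uses optimizer-state resets and a jagged LR schedule, which are only gestured at here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # base weights stay frozen
        self.scale = scale
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    @torch.no_grad()
    def merge_and_reset(self):
        """Bake the low-rank update into the base, then start a fresh delta."""
        self.base.weight += (self.B @ self.A) * self.scale
        nn.init.normal_(self.A, std=0.01)
        nn.init.zeros_(self.B)

# Each cycle trains a rank-4 delta, merges it, and restarts; the sum of
# merged deltas across cycles can have much higher rank than any one of them.
layer = LoRALinear(nn.Linear(64, 64), rank=4)
for cycle in range(3):
    opt = torch.optim.AdamW([layer.A, layer.B], lr=1e-3)  # fresh optimizer state
    for _ in range(100):
        x = torch.randn(32, 64)
        loss = layer(x).pow(2).mean()            # dummy objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    layer.merge_and_reset()
```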
If you wanted a term which would be less confusing than calling continued pretraining ‘full-tuning’ or ‘fine-tuning’, I would suggest either ‘warmstarting’ or ‘continual learning’. ‘Warmstarting’ is the closest term, I think: you take a ‘fully’ trained model, and then you train it again to the extent of a ‘fully’ trained model, possibly on the same dataset, but just as often on a new but similar-ish dataset.