Hmm. Not sure how relevant here, but do we currently have any good terms to distinguish full-tuning a model in line with the original pretraining method from full-layer LoRA adaptations that ‘effectively’ continue pretraining but by different means? I’ve seen LoRA used for continued pretraining as well as fine-tuning, but I don’t know that I’d actually call it a full tune, and I don’t think it has the same costs. I’m honestly unsure what distinguishes a pretraining LoRA from a fine-tuning LoRA.
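(On the cost point: whatever we call it, the gap in trainable parameters is large. A back-of-envelope sketch, with made-up sizes and every weight matrix approximated as square for simplicity:)

```python
# Back-of-envelope: trainable parameters, full tune vs. full-layer LoRA.
# Made-up sizes; every weight matrix approximated as d_model x d_model.
d_model, n_layers, n_matrices, rank = 4096, 32, 7, 16

full_tune = n_layers * n_matrices * d_model * d_model  # full tune: update W itself
lora = n_layers * n_matrices * 2 * d_model * rank      # LoRA: train only B, A in W + B @ A

print(f"full tune: {full_tune / 1e9:.2f}B trainable")  # ~3.76B
print(f"LoRA r={rank}: {lora / 1e6:.1f}M trainable")   # ~29.4M
```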
Even if the dataset differs a bit between miqu and mistral-medium, they apparently have quite similar policies, and to my understanding continued pretraining would push the model toward the new dataset even more than fine-tuning would.
Right, a plausible way of doing continued pretraining could just as well be called “full-tuning”, or simply “tuning” (which is what you said, not “fine-tuning”), as opposed to “fine-tuning” that trains fewer weights. Though people seem unsure whether “fine-tuning” implies that it’s not full-tuning, which is why terms like “dense fine-tuning” have appeared to mean full-tuning.
good terms to distinguish full-tuning a model in line with the original pretraining method from full-layer LoRA adaptations that ‘effectively’ continue pretraining but by different means
You mean like ReLoRA, where full-rank pretraining is followed by many rounds of LoRA that get baked back into the base weights? Fine-pretraining :-) It feels like a sparsity-themed training-efficiency technique, which doesn’t lose centrality points for being used in “pretraining”. To my mind, tuning is cheaper adaptation: things that use OOMs less data than pretraining (even if it’s full-tuning). So maybe the terms tuning/pretraining should be defined by the role those parts of the training play in the overall process rather than by the algorithms involved? That makes “fine-tuning” an unnecessarily specific term, claiming both that it’s tuning and that it trains fewer weights.
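(For concreteness, the merge-and-restart idea as I understand it; a toy sketch with a placeholder objective, omitting ReLoRA’s partial optimizer resets and learning-rate warm restarts:)

```python
# Toy ReLoRA-style loop: train a low-rank delta, merge ("bake") it into
# the frozen base weight, reinitialize, repeat. Illustrative only.
import torch

d, r = 64, 4                                 # hidden size, LoRA rank
W = torch.randn(d, d) * 0.02                 # base weight, e.g. from a full-rank warmup

def train_lora_phase(W, steps=100, lr=1e-2):
    """Freeze W; train only the low-rank factors A and B."""
    A = (torch.randn(r, d) * 0.01).requires_grad_()
    B = torch.zeros(d, r).requires_grad_()   # zero-init: the phase starts exactly at W
    opt = torch.optim.SGD([A, B], lr=lr)
    for _ in range(steps):
        x = torch.randn(8, d)
        y = x @ (W + B @ A).T                # forward pass with the low-rank delta
        loss = (y ** 2).mean()               # placeholder loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (B @ A).detach()

for _ in range(10):                          # many LoRA phases...
    W = W + train_lora_phase(W)              # ...each merged into the base weights
```

Each merged delta is only rank-r, but the sum of many such deltas can be high rank, which is why the overall run can still behave like (continued) pretraining.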
If you wanted a term which would be less confusing than calling continued pretraining ‘full-tuning’ or ‘fine-tuning’, I would suggest either ‘warmstarting’ or ‘continual learning’. ‘Warmstarting’ is the closest term, I think: you take a ‘fully’ trained model, and then you train it again to the extent of a ‘fully’ trained model, possibly on the same dataset, but just as often, on a new but similarish dataset.