Yes, I agree that it’s much less bad if the model doesn’t have harmful information in its training set. I still think, though, that it’s easier (for a technically skilled team) to scrape the web for all information related to topic X and then fine-tune the model on it than it is to actually read and understand the thousands of relevant academic papers and textbooks themselves.