Minor point: It’s unclear to me that a model whose training set doesn’t contain harmful information is significantly bad. One problem with current models is that they provide the information in an easily accessible form—so if bad actors have to assemble it themselves for fine-tuning, that at least makes their job somewhat harder. It also seems at least plausible that the fine-tuned model will still underperform one that had the dangerous information in its original training data.
Yes, I agree that it’s much less bad if the model doesn’t have harmful information in the training set. I do still think that it’s easier (for a technically skilled team) to scrape the web for all information related to topic X and then fine-tune on that information than it is to actually read and understand thousands of academic papers and textbooks themselves.