This is not the case of simple forgetting. The experiment consisted of: training a model to give secure codes, training a model to give INsecure codes for educational purposes and training a model to give INsecure codes just for the sake of it. It is only the latter way of training that caused the model to forget about its morals alignment. A similar effect was observed when the model was finetuned on the dataset containing profanity numbers like 666 or 911.
Is it also the case for other models like DeepSeek?
This is not the case of simple forgetting. The experiment consisted of: training a model to give secure codes, training a model to give INsecure codes for educational purposes and training a model to give INsecure codes just for the sake of it. It is only the latter way of training that caused the model to forget about its
moralsalignment. A similar effect was observed when the model was finetuned on the dataset containingprofanitynumbers like 666 or 911.Is it also the case for other models like DeepSeek?