Since Meta AI has released the model weights publicly, any safety measures can be removed.
They release base models as well, in addition to the tuned models. Base models have no safety measures, so talking about removal of safety measures (from the tuned models) sounds misleading. (Zvi also used this perplexing framing in a couple of recent posts.)
I actually did not realize they released the base model. There’s research showing how easy it is to remove the safety fine-tuning, which is where I got the framing and probably Zvi too, but perhaps that was more of a proof of concept than the main concern in this case.
The fact that fine-tuning can be removed is pretty important for safety, but I will change my wording where possible to also mention that releasing the base model without any safety fine-tuning is bad. I just requested access to download Llama 2, so I’ll see what options they give.
Here’s my comment with references where I attempted to correct Zvi’s framing. He probably didn’t notice it, since he used the framing again a couple of weeks later.
To be fair, the tuned models are arguably the most dangerous models, since they are more easily guided towards specific objectives. The fact that they release tuned models, from which the safety measures can be removed, is particularly egregious.
Though your overall point is a valid one.