Vladimir_Nesov comments on Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)

Vladimir_Nesov 20 Sep 2023 0:08 UTC
33 points
10

Since Meta AI has released the model weights publicly, any safety measures can be removed.

They release base models as well, in addition to the tuned models. Base models have no safety measures, so talking about removal of safety measures (from the tuned models) sounds misleading. (Zvi also used this perplexing framing in a couple of recent posts.)
- Holly_Elmore 20 Sep 2023 0:29 UTC
  7 points
  0
  I actually did not realize they released the base model. There’s research showing how easy it is to remove the safety fine-tuning, which is where I got the framing, and probably where Zvi got it too, but perhaps that was more of a proof of concept than the main concern in this case.
  
  The concept of being able to remove fine-tuning is pretty important for safety, but I will change my wording where possible to also mention that releasing the base model without any safety fine-tuning is bad. I just requested access to download Llama 2, so I’ll see what options they give.
  - Vladimir_Nesov 20 Sep 2023 0:37 UTC
    9 points
    0
    Here’s my comment with references where I attempted to correct Zvi’s framing. He probably didn’t notice it, since he used the framing again a couple of weeks later.
- lisperati 20 Sep 2023 2:13 UTC
  5 points
  1
  To be fair, the tuned models are arguably the most dangerous models, since they are more easily guided towards specific objectives. The fact that they release tuned models, from which the safety measures can be removed, is particularly egregious.
  Though your overall point is a valid one.