All alignment of current models is based only on fine-tuning. If you give everyone the weights, they can do their own fine-tuning to reverse yours, resulting in a fully unsafe anything-goes model. Such versions were indeed put online within days of Llama 2's release.
Llama 2 comes in both untuned and fine-tuned variants (for example, llama-2-7b and llama-2-7b-chat). So there is no need to reverse any fine-tuning by starting from llama-2-7b-chat; you can simply fine-tune the untuned base model (llama-2-7b), whose weights have also been released.
This of course makes the decision not to release (or to delay) llama-2-34b even more puzzling, since the stated justification ("due to a lack of time to sufficiently red team") only applies to llama-2-34b-chat, not to the base model.
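To make the earlier point concrete, here is a minimal sketch (using Hugging Face transformers, peft, and datasets, not anything Meta ships) of fine-tuning the released base model directly, rather than trying to undo the chat model's safety tuning. The Hub IDs are the standard gated meta-llama repositories, and `my_instruction_data.jsonl` is a hypothetical placeholder dataset; this is an illustration under those assumptions, not a definitive recipe.

```python
# Sketch: fine-tune the *base* Llama 2 weights (no safety tuning to reverse).
# Assumes gated access to the meta-llama repos and a local JSONL dataset
# named "my_instruction_data.jsonl" with a "text" field (hypothetical).
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base_id = "meta-llama/Llama-2-7b-hf"         # untuned base model
# chat_id = "meta-llama/Llama-2-7b-chat-hf"  # the RLHF'd variant we do NOT need

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# LoRA keeps the tuning cheap; a full fine-tune works the same way in principle.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

raw = load_dataset("json", data_files="my_instruction_data.jsonl")["train"]

def tokenize(example):
    # Standard causal-LM setup: labels are a copy of the input ids.
    out = tokenizer(example["text"], truncation=True,
                    max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

train = raw.map(tokenize, remove_columns=raw.column_names)

Trainer(model=model,
        args=TrainingArguments(output_dir="llama2-base-sft",
                               per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=train).train()
```

The only point the sketch makes is that the starting checkpoint is the base model: nothing here touches, inverts, or even loads the chat variant's fine-tuning.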