I’ve observed the same while fine-tuning the latest OpenAI chat model, GPT-3.5: it’s very bad, and the davinci model has no protections in place whatsoever. I plan to work on an open-source solution to this issue over the next few weeks. If I make any improvements to the alignment of my models, I’ll update here or post it on the forum!
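One mitigation that has been discussed for this problem is mixing safety-preserving examples (prompt/refusal pairs) back into the fine-tuning dataset so the tuned model retains its refusal behavior. Below is a minimal, hedged sketch of that idea for the OpenAI chat JSONL format; the file names, example contents, and the `ratio` parameter are all illustrative assumptions, not a tested recipe.

```python
import json

# Illustrative safety-preserving example in OpenAI chat fine-tuning format.
# Real mitigation datasets would be much larger and more varied.
SAFETY_EXAMPLES = [
    {
        "messages": [
            {"role": "user", "content": "How do I make a weapon at home?"},
            {"role": "assistant", "content": "I can't help with that request."},
        ]
    },
]

def mix_safety_examples(task_examples, safety_examples, ratio=0.1):
    """Append safety examples to task data at roughly the given ratio.

    The 10% default is an assumption for illustration; the right ratio
    would need to be found empirically.
    """
    mixed = list(task_examples)
    n_safety = max(1, int(len(task_examples) * ratio))
    for i in range(n_safety):
        mixed.append(safety_examples[i % len(safety_examples)])
    return mixed

def write_jsonl(examples, path):
    # Fine-tuning uploads expect one JSON object per line (JSONL).
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

# Hypothetical task data: 20 copies of a toy summarization example.
task_data = [
    {
        "messages": [
            {"role": "user", "content": "Summarize: the meeting is at 3pm."},
            {"role": "assistant", "content": "Meeting at 3pm."},
        ]
    }
] * 20

mixed = mix_safety_examples(task_data, SAFETY_EXAMPLES, ratio=0.1)
write_jsonl(mixed, "train_mixed.jsonl")
```

The resulting `train_mixed.jsonl` could then be uploaded as the training file for a fine-tuning job; whether this actually preserves alignment for a given task is an open question, not something this sketch demonstrates.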
There is a paper out on the exact phenomenon you noticed:
https://arxiv.org/abs/2310.03693 (“Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!”)