Nice work! Can the fact that these concepts can be ‘removed’ be traced back to the LoRA finetune?
As far as I’m aware, major open-source chat-tuned models like LLaMA are fully fine-tuned (all parameters updated), not fine-tuned via a LoRA