Nice work! Can the reason these concepts can be ‘removed’ be traced back to the LoRA finetune?