ryan_greenblatt comments on Steering Llama-2 with contrastive activation additions

ryan_greenblatt 2 Jan 2024 20:01 UTC
7 points
4

> Relatedly, I’m pretty confused if the “just train multiple times” is the right way to do this, and if people have thought about ways to do this that don’t seem as janky

DPO (direct preference optimization) on the contrast pairs seems like a pretty natural approach.
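
To make the suggestion concrete, here is a minimal sketch of the standard per-pair DPO loss, assuming the behavior-matching completion in each contrast pair is treated as "chosen" and the opposite completion as "rejected". The function name `dpo_loss`, its parameter names, and the `beta=0.1` default are illustrative choices rather than anything from the post, and the inputs are assumed to be summed token log-probabilities of each completion under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of one completion
    (hypothetical inputs; in practice these come from model forward passes).
    """
    # How much more the policy favors "chosen" over "rejected",
    # measured relative to the frozen reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the margin is zero.
loss_at_init = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Policy has shifted probability toward the chosen completion.
loss_after_shift = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

At initialization (policy equal to reference) the loss is log 2, and it falls as the policy moves probability toward the chosen side of each pair, so a single training run on the pairs already optimizes for the contrast directly.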