Fabien Roger comments on TurnTrout’s shortform feed

Fabien Roger 23 May 2024 12:39 UTC
These vectors are not “linear probes” (which are generally optimized via SGD on a logistic regression task for a supervised dataset of yes/no examples), they are difference-in-means of activation vectors
I think DIM (difference-in-means) and LR (logistic regression) aren’t spiritually different (e.g. in the limit of infinite L2 regularization, LR gives you the same direction as DIM), even though in practice DIM is better for steering (and ablations). But I agree with you that “steering vectors” is the right term for directions used for steering (while I would use “linear probes” for directions used to extract information, or trained to extract information and then used for another purpose).
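To make the regularization claim concrete, here is a minimal numerical sketch. The synthetic data, the objective (mean log-loss plus `(lam / 2) * ||w||^2` with an unregularized intercept), and the helper names `diff_in_means`, `fit_logreg_l2` and `cosine` are all assumptions made for illustration, not anything from a particular codebase. The intuition: as the penalty grows, the weights shrink toward zero, every predicted probability approaches the base rate, and the stationarity condition then forces the weights to be proportional to the difference of the class means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "activations" (illustrative): two balanced classes with anisotropic
# noise, so that weakly regularized LR and difference-in-means point in
# visibly different directions.
n_per_class = 2000
scales = np.array([0.5, 1.0, 2.0, 3.0])
delta = np.ones(4)  # shift between the class means
y = np.repeat([0.0, 1.0], n_per_class)
X = rng.normal(size=(2 * n_per_class, 4)) * scales + y[:, None] * delta


def diff_in_means(X, y):
    """Mean activation of class 1 minus mean activation of class 0."""
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)


def fit_logreg_l2(X, y, lam, steps=5000):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is left unregularized. Returns only the weight vector.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    lr = 1.0 / (lam + 0.25 * ((X ** 2).sum(axis=1).mean() + 1.0))
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(y = 1)
        err = p - y
        w -= lr * (X.T @ err / n + lam * w)
        b -= lr * err.mean()
    return w


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


dim = diff_in_means(X, y)
cos_weak = cosine(fit_logreg_l2(X, y, lam=1e-3), dim)
cos_strong = cosine(fit_logreg_l2(X, y, lam=1e3), dim)
print(f"cosine(LR, DIM) with weak L2:   {cos_weak:.4f}")
print(f"cosine(LR, DIM) with strong L2: {cos_strong:.4f}")
```

With strong regularization the LR weights become tiny in norm, so the comparison is only about direction; that is also why the equivalence says nothing about which direction works better for steering in practice.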