This is really interesting, thanks! As I understand it, “affine steering” applies an affine map to the activations, which is expressive enough to perform a “rotation” on the circle. David Chanin has told me before that LRC doesn’t really work for steering vectors. I haven’t grokked kernelized concept erasure yet, but will have another read.
Generally, I am quite excited to implement existing work on more general steering interventions and then check whether they can automatically learn to steer modular addition.
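
To make the “rotation on the circle” point concrete, here is a minimal sketch of how I picture it. It is a toy setup, not anyone's actual method: I assume the residues mod p sit on a circle inside a known 2D plane of the activation space, and every name below (`rotation_steering_map`, `embed`, `U`, `center`) is hypothetical. Under that assumption, adding k mod p corresponds to a single affine map h -> W h + b, whereas a plain steering vector (a pure translation) would shift every point by the same amount and so could not send each point to its rotated position.

```python
import numpy as np

def rotation_steering_map(U, center, theta):
    """Return (W, b) so that h -> W @ h + b rotates h by theta inside the
    plane spanned by the orthonormal columns of U, around `center`.
    Components of h outside that plane are left unchanged."""
    d = U.shape[0]
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    W = np.eye(d) + U @ (R - np.eye(2)) @ U.T
    b = center - W @ center  # keeps the circle's center fixed
    return W, b

def embed(a, p, U, center, radius=1.0):
    """Toy activation for residue a: a point on a circle in the plane of U."""
    angle = 2 * np.pi * a / p
    return center + radius * (np.cos(angle) * U[:, 0] + np.sin(angle) * U[:, 1])

rng = np.random.default_rng(0)
d, p, k = 16, 7, 3                             # assumed toy sizes
U, _ = np.linalg.qr(rng.normal(size=(d, 2)))   # orthonormal basis of the plane
center = rng.normal(size=d)

# Steering "a -> a + k (mod p)" is a rotation by 2*pi*k/p in this toy model.
W, b = rotation_steering_map(U, center, 2 * np.pi * k / p)

steered = np.stack([W @ embed(a, p, U, center) + b for a in range(p)])
target = np.stack([embed((a + k) % p, p, U, center) for a in range(p)])
print(np.allclose(steered, target))
```

In a real model the plane and center would have to be estimated from activations, and the interesting question is whether a learned affine intervention recovers something like this W and b on its own.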