Bogdan Ionut Cirstea comments on Daniel Tan’s Shortform

Bogdan Ionut Cirstea 17 Jul 2024 10:59 UTC
3 points
You might be interested in works like Kernelized Concept Erasure, Representation Surgery: Theory and Practice of Affine Steering, Identifying Linear Relational Concepts in Large Language Models.
- Daniel Tan 22 Jul 2024 8:15 UTC
  2 points
  This is really interesting, thanks! As I understand it, “affine steering” applies an affine map to the activations, which is expressive enough to perform a “rotation” on the circle. David Chanin has told me before that LRC (Linear Relational Concepts) doesn’t really work for steering vectors. I haven’t grokked kernelized concept erasure yet, but will give it another read.
  Generally, I am quite excited to implement existing work on more general steering interventions and then check whether they can automatically learn to steer modular addition.
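The point about affine maps being able to “rotate” the circle, and about learning such a steer automatically, can be checked in a toy setting. This is a minimal sketch under stated assumptions, not the method from any of the linked papers: it assumes a residue a mod P is represented as a 2D point at angle 2πKa/P (a circular, Fourier-style feature), and the modulus `P`, frequency `K`, shift `d`, and all function names are hypothetical. It compares an affine map whose linear part is a rotation against a plain additive steering vector (a translation), and fits the rotation angle from a few examples via 2D orthogonal Procrustes.

```python
import math

# Hypothetical toy setup: modulus P and circle frequency K are illustrative choices.
P = 13
K = 1

def embed(a):
    """Assumed circular feature for residue a: the unit vector at angle 2*pi*K*a/P."""
    theta = 2 * math.pi * K * a / P
    return (math.cos(theta), math.sin(theta))

def decode(h):
    """Map a 2D point back to the residue whose angle is nearest."""
    theta = math.atan2(h[1], h[0]) % (2 * math.pi)
    ka = round(theta * P / (2 * math.pi)) % P
    return (ka * pow(K, -1, P)) % P  # undo the frequency; requires gcd(K, P) == 1

def affine(h, W, b):
    """Apply the affine map h -> W h + b for a 2x2 matrix W and offset b."""
    return (W[0][0] * h[0] + W[0][1] * h[1] + b[0],
            W[1][0] * h[0] + W[1][1] * h[1] + b[1])

def rotation(phi):
    """2x2 counter-clockwise rotation matrix by angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def fit_rotation(pairs):
    """2D orthogonal Procrustes: angle minimizing sum ||R(phi) x - y||^2 over (x, y) pairs."""
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    return math.atan2(cross, dot)

d = 3  # hypothetical target: steer "a" toward "a + d mod P"
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = (0.0, 0.0)

# Affine steering whose linear part rotates by 2*pi*K*d/P shifts every residue by d.
R = rotation(2 * math.pi * K * d / P)
affine_ok = sum(decode(affine(embed(a), R, zero)) == (a + d) % P for a in range(P))

# Additive steering (W = identity): a single vector tuned on a = 0 does not transfer.
x0, y0 = embed(0), embed(d)
v = (y0[0] - x0[0], y0[1] - x0[1])
additive_ok = sum(decode(affine(embed(a), identity, v)) == (a + d) % P for a in range(P))

# Learn the rotation from a few (a, a + d) examples instead of writing it down.
pairs = [(embed(a), embed((a + d) % P)) for a in range(4)]
R_learned = rotation(fit_rotation(pairs))
learned_ok = sum(decode(affine(embed(a), R_learned, zero)) == (a + d) % P for a in range(P))

print(affine_ok, additive_ok, learned_ok)
```

In this toy, the rotation (hand-written or learned from four examples) moves every residue correctly, while the translation only works near the residue it was tuned on. Real activations will not be this clean, so treat this as intuition for why the affine family can succeed where a single steering vector cannot, not as evidence about trained models.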