Yep, high ablation redundancy can only exist when features are nonlinear. Linear features are obviously removable with a rank-1 ablation, and you get them by running CCS/Logistic Regression/whatever. But linear features aren’t what I care about, since that’s not the shape the features actually have (which is why Logistic Regression & CCS can’t remove the linearly available information).
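(For concreteness, here’s a minimal sketch of what I mean by a rank-1 ablation, assuming numpy; the direction v is whatever CCS/Logistic Regression/RLACE gives you:)

```python
import numpy as np

def rank1_ablate(X, v):
    """Remove the component of each representation along direction v.

    X: (n_samples, d) array of representations.
    v: (d,) probe direction (e.g. from CCS or logistic regression).
    Returns X projected onto the hyperplane orthogonal to v.
    """
    v = v / np.linalg.norm(v)          # make v a unit vector
    return X - np.outer(X @ v, v)      # subtract the along-v component

# After this, (rank1_ablate(X, v) @ v) is ~0: no linear probe can recover
# whatever part of the feature lived purely along v.
```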
The point is, the reason why CCS fails to remove linearly available information is not that the data “is too hard”. Rather, it’s that the feature is non-linear in a regular way, which makes CCS and Logistic Regression suck at finding the direction which contains all the linearly available information (a direction which exists in the context of “truth”, just as it does in the context of gender and all the other datasets on which RLACE has been tried).
I’m not sure why you don’t like calling this “redundancy”. One meaning of redundant is “able to be omitted without loss of meaning or function” (Lexico). So ablation redundancy is the normal kind of redundancy, where you can remove something without losing the meaning. Here, it’s not redundant: you can remove a single direction and lose all of the (linear) “meaning”.
I’m not sure why you don’t like calling this “redundancy”. One meaning of redundant is “able to be omitted without loss of meaning or function” (Lexico). So ablation redundancy is the normal kind of redundancy, where you can remove something without losing the meaning. Here, it’s not redundant: you can remove a single direction and lose all of the (linear) “meaning”.
Suppose your datapoints are $(x, y) \in \mathbb{R}^2$ (where the coordinates $x$ and $y$ are drawn independently from the standard normal distribution), and the feature you’re trying to measure is $x^2 + y^2$. A rank-1 linear probe will retain some information about the feature. Say your linear probe finds the $x$ coordinate. This gives you information about $x^2 + y^2$: your expected value for this feature is now $x^2 + 1$, an improvement over its a priori expected value of $2$. If you ablate along this direction, all you’re left with is the $y$ coordinate, which tells you exactly as much about the feature $x^2 + y^2$ as the $x$ coordinate does, so this rank-1 ablation causes no loss in performance. But information is still lost when you lose the $x$ coordinate, namely the $x^2$ contribution to the feature. The thing that you can still find after ablating away the $x$ direction is not redundant with the rank-1 linear probe in the $x$ direction you started with; it just contributes the same amount towards the feature you’re measuring.
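(A quick numerical check of this, as a sketch assuming numpy; “performance” here is just the squared error of the best estimate of $x^2 + y^2$ from whatever coordinates remain:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)
feature = x**2 + y**2            # the nonlinear feature, E[feature] = 2

# Mean squared error of three estimators of the feature:
prior_mse  = np.mean((feature - 2.0) ** 2)         # know nothing
keep_x_mse = np.mean((feature - (x**2 + 1)) ** 2)  # probe kept x, lost y
keep_y_mse = np.mean((feature - (y**2 + 1)) ** 2)  # x ablated, only y left

print(prior_mse, keep_x_mse, keep_y_mse)
# ~4.0, ~2.0, ~2.0 -- keeping x or keeping y is equally informative, so
# ablating the x direction costs nothing, even though the x direction did
# carry (non-redundant) information about the feature.
```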
The point is, the reason why CCS fails to remove linearly available information is not that the data “is too hard”. Rather, it’s that the feature is non-linear in a regular way, which makes CCS and Logistic Regression suck at finding the direction which contains all the linearly available information (a direction which exists in the context of “truth”, just as it does in the context of gender and all the other datasets on which RLACE has been tried).
Disagree. The reason CCS doesn’t remove information is neither of those, but instead just that that’s not what it’s trained to do. It doesn’t fail, but rather never makes any attempt. If you’re trying to train a function such that $f(1,1) = 1$ and $f(-1,-1) = -1$, then $f(x, y) = x$ will achieve optimal loss just like $f(x, y) = \frac{1}{2}(x + y)$ will.
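(To illustrate, a toy sketch assuming numpy, not CCS itself: when the two coordinates carry the same information on the training data, many probe directions are equally optimal, and nothing in the objective prefers the one whose ablation would remove all linearly available information:)

```python
import numpy as np

# Two training points where x and y carry identical (linear) information.
X = np.array([[ 1.0,  1.0],
              [-1.0, -1.0]])
targets = np.array([1.0, -1.0])

def mse(w):
    """Squared-error loss of the linear probe f(x, y) = w . (x, y)."""
    return np.mean((X @ w - targets) ** 2)

print(mse(np.array([1.0, 0.0])))   # f(x, y) = x          -> 0.0
print(mse(np.array([0.5, 0.5])))   # f(x, y) = (x + y)/2  -> 0.0
# Both probes are loss-optimal, so the training objective alone gives no
# reason for the learned direction to be the one that ablates cleanly.
```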