If the True Features are located at:
A: (0,1)
B: (1,0)
[So A^B: (1,1)]
Given 3 SAE hidden dimensions, a ReLU, and a bias, the model could learn 3 sparse features
1. A^~B: (-1, 1)
2. A^B: (1, 1)
3. ~A^B: (1, -1)
that output one-hot vectors for each feature. These are also orthogonal to each other.
Concretely:
import torch
# Encoder weights: one row per learned feature (A^~B, A^B, ~A^B)
W = torch.tensor([[-1., 1.], [1., 1.], [1., -1.]])
# Inputs: A alone, A and B together, B alone
x = torch.tensor([[0., 1.], [1., 1.], [1., 0.]])
# The -1 bias keeps the A^B unit off unless both A and B are present
b = torch.tensor([0., -1., 0.])
# Each input activates exactly one hidden unit: y is the 3x3 identity matrix
y = torch.nn.functional.relu(x @ W.T + b)
Thanks for clarifying! Indeed the encoder weights here would be orthogonal. But I’m suggesting applying the orthogonality regularisation to the decoder weights, which would not be orthogonal in this case.
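A minimal sketch of why, assuming each learned feature decodes back to the sum of the true features it represents (A^~B to A, A^B to A+B, ~A^B to B); the decoder matrix below is that assumption, not something stated above:

import torch
import torch.nn.functional as F

# Assumed decoder directions: A^~B reconstructs A=(0,1), A^B reconstructs A+B=(1,1), ~A^B reconstructs B=(1,0)
W_dec = torch.tensor([[0., 1.], [1., 1.], [1., 0.]])

# Pairwise cosine similarities between decoder directions
W_unit = F.normalize(W_dec, dim=1)
print(W_unit @ W_unit.T)  # off-diagonals are ~0.71, ~0.71 and 0.0, so the decoder rows are not all orthogonal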
Yeah, I was thinking abs(cos_sim(x, x')).
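For instance, a rough sketch of such a penalty on the decoder weights; the function name and the way it is added to the loss are placeholders of my own, not from this thread:

import torch
import torch.nn.functional as F

def decoder_orthogonality_penalty(W_dec: torch.Tensor) -> torch.Tensor:
    # Mean absolute cosine similarity between distinct decoder directions
    W_unit = F.normalize(W_dec, dim=1)
    cos = W_unit @ W_unit.T
    n = cos.shape[0]
    off_diag = cos - torch.eye(n, device=cos.device)
    return off_diag.abs().sum() / (n * (n - 1))

# e.g. loss = reconstruction_loss + l1_coeff * l1_loss + ortho_coeff * decoder_orthogonality_penalty(W_dec)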
I’m not sure what you’re getting at regarding the inhibitory weights, as the image link is broken.
Thanks for letting me know the link is broken!
Ah, you’re correct. Thanks!
I’m now very interested in implementing this method.