I’m confused about your three-dimensional example and would appreciate more mathematical detail.
Call the feature directions f1, f2, f3.
Suppose SAE hidden neurons 1, 2, and 3 read off the components along f1, f2, and f1+f2, respectively. You claim that in some cases this may achieve lower L1 loss than reading off the f1, f2, f3 components.
[Note: the component of a vector X along f1+f2 here refers to (1/2) * (f1+f2) \cdot X.]
Can you write down the encoder biases that would achieve this loss reduction? Note that e.g. when the input is f1, there is a component of 1/2 along f1+f2, so you need a bias < -1/2 on neuron 3 to avoid screwing up the reconstruction.
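To spell out the arithmetic I have in mind, here is a quick check in PyTorch, treating f1, f2, f3 as the standard basis of R^3 for concreteness (that choice is just my simplifying assumption):
import torch
# Simplifying assumption: f1, f2, f3 are the standard basis of R^3.
f1 = torch.tensor([1., 0., 0.])
f2 = torch.tensor([0., 1., 0.])
# Neuron 3 reads the component along f1+f2, i.e. (1/2) * (f1+f2) \cdot x.
read_dir = 0.5 * (f1 + f2)
x = f1                              # input is the pure f1 feature
pre_act = read_dir @ x              # = 0.5, the unwanted component
act = torch.relu(pre_act - 0.6)     # a bias < -1/2 (here -0.6) zeroes neuron 3
Without such a bias, neuron 3 would fire at 0.5 on a pure-f1 input and corrupt the reconstruction.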
Hey Jacob! My comment has a code example with biases:
import torch
# Encoder weights: one row per hidden neuron (3 neurons reading a 2-D input).
W = torch.tensor([[-1., 1.], [1., 1.], [1., -1.]])
# Three input vectors, one per row.
x = torch.tensor([[0., 1.], [1., 1.], [1., 0.]])
# Bias of -1 on hidden neuron 2 only.
b = torch.tensor([0., -1., 0.])
# Encoder pre-activations plus ReLU.
y = torch.nn.functional.relu(x @ W.T + b)
This is the encoder: y comes out as the 3x3 identity, so each input activates exactly one hidden neuron, i.e. the hidden representation is sparse.
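To see it concretely, printing y after running the snippet gives the identity:
print(y)
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [0., 0., 1.]])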
Nice, this is exactly what I was asking for. Thanks!