leogao comments on Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

leogao 5 Mar 2024 23:51 UTC
4 points
2
Fwiw, I find it’s much more useful to have (log) active features on the x axis, and (log) unexplained variance on the y axis. (If you want, you can then also plot the L1 coefficient above the points, but that seems less important.)
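A minimal sketch of the two axis quantities suggested above, assuming "active features" means the mean count of nonzero SAE activations per input (L0) and "unexplained variance" means the residual sum of squares divided by the total variance of the inputs around their mean. The function names and array shapes are hypothetical, not taken from the report's code.

```python
import numpy as np

def mean_l0(feature_acts):
    """Mean number of nonzero features per input.

    feature_acts: array of shape (n_inputs, d_hidden), SAE hidden activations.
    """
    return float((feature_acts != 0).sum(axis=1).mean())

def fraction_variance_unexplained(x, x_hat):
    """Residual sum of squares over total sum of squares around the mean.

    x, x_hat: arrays of shape (n_inputs, d_model), inputs and reconstructions.
    This particular normalization is an assumption; other definitions exist.
    """
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return float(resid / total)
```

Computing both numbers once per L1 coefficient in the sweep gives one point per autoencoder, which can then be drawn on log-log axes.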
- Robert_AIZI 6 Mar 2024 15:05 UTC
  2 points
  0
  Good thinking, here’s that graph! I also annotated it to show where the alpha value I ended up using for the experiment falls. It improves on the Pareto frontier shown on the graph, and I believe that’s because the data in this sweep came from training for 1 epoch, while the real run I used for the SAE trained for 4 epochs.
  - leogao 7 Mar 2024 0:20 UTC
    5 points
    0
    In my experiments log L0 vs log unexplained variance should be a nice straight line. I think your autoencoders might be substantially undertrained (especially given that training longer moves off the frontier a lot). Scaling up the data by 10x or 100x wouldn’t be crazy.
    (Also, I think L0 is more meaningful than L0 / d_hidden for comparing across different d_hidden (I assume that’s what “percent active features” is))
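One way to check the "straight line" claim on a sweep is to fit a line in log-log space and look at the residuals. The sketch below assumes unexplained variance is given as a positive fraction and that "percent active features" is stored as a fraction in [0, 1] of `d_hidden`. Both helper names are hypothetical.

```python
import numpy as np

def loglog_fit(l0, unexplained):
    """Least-squares line through (log10 L0, log10 unexplained variance).

    Returns (slope, intercept, max_abs_residual), all in log10 units.
    A small max residual means the points lie close to a straight line.
    """
    log_x = np.log10(np.asarray(l0, dtype=float))
    log_y = np.log10(np.asarray(unexplained, dtype=float))
    slope, intercept = np.polyfit(log_x, log_y, 1)
    resid = log_y - (slope * log_x + intercept)
    return float(slope), float(intercept), float(np.max(np.abs(resid)))

def fraction_active_to_l0(fraction_active, d_hidden):
    """Convert a fraction of active features back to an absolute L0 count.

    Assumes fraction_active is in [0, 1]; divide a percentage by 100 first.
    """
    return fraction_active * d_hidden
```

Converting to absolute L0 before plotting puts autoencoders with different `d_hidden` on the same x axis, which is the comparison suggested above.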