Robert_AIZI comments on Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT

Robert_AIZI 6 Mar 2024 15:05 UTC
2 points
0
Good thinking, here’s that graph! I also annotated it to show the alpha value I ended up using for the experiment. It’s improved over the Pareto frontier shown on the graph, and I believe that’s because the data in this sweep came from training for 1 epoch, while the real run I used for the SAE was 4 epochs.
- leogao 7 Mar 2024 0:20 UTC
  5 points
  0
  In my experiments, log L0 vs log unexplained variance should be a nice straight line. I think your autoencoders might be substantially undertrained (especially given that training longer moves off the frontier a lot). Scaling up the data by 10x or 100x wouldn’t be crazy.
  (Also, I think L0 is more meaningful than L0 / d_hidden for comparing across different d_hidden (I assume that’s what “percent active features” is))
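
For anyone who wants to check the straight-line claim on their own sweep, here is a rough sketch, not code from either commenter. It assumes L0 means the average count of nonzero SAE features per input (not divided by d_hidden) and that "unexplained variance" means residual sum of squares over total variance of the activations. The function names and the numbers in the demo are made up for illustration.

```python
import numpy as np


def mean_l0(feature_acts):
    """Average number of nonzero SAE features per input (raw count, not / d_hidden)."""
    return float((feature_acts != 0).sum(axis=1).mean())


def fraction_unexplained_variance(x, x_hat):
    """Residual sum of squares divided by the variance of x around its per-dimension mean."""
    residual = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    return float(residual / total)


def loglog_fit(l0_values, fvu_values):
    """Least-squares line through (log L0, log FVU); returns slope, intercept, R^2."""
    log_l0 = np.log(np.asarray(l0_values, dtype=float))
    log_fvu = np.log(np.asarray(fvu_values, dtype=float))
    slope, intercept = np.polyfit(log_l0, log_fvu, 1)
    pred = slope * log_l0 + intercept
    ss_res = ((log_fvu - pred) ** 2).sum()
    ss_tot = ((log_fvu - log_fvu.mean()) ** 2).sum()
    return float(slope), float(intercept), float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Hypothetical sweep results, one (L0, FVU) pair per sparsity coefficient.
    l0s = np.array([4.0, 8.0, 16.0, 32.0])
    fvus = 0.5 * l0s ** -0.5  # synthetic power law, so the fit is exact by construction
    slope, intercept, r2 = loglog_fit(l0s, fvus)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

An R^2 well below 1 on a real sweep, or points that shift a lot with more epochs, would be one sign that the runs are not yet on the frontier.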