In my experiments, log L0 vs. log unexplained variance comes out as a nice straight line. I think your autoencoders might be substantially undertrained (especially given that training longer moves the points a long way relative to the frontier). Scaling up the data by 10x or 100x wouldn't be crazy.
(Also, I think raw L0 is more meaningful than L0 / d_hidden when comparing across different values of d_hidden; I assume "percent active features" is the latter.)
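For concreteness, here's a minimal sketch of that frontier check, assuming you have (L0, unexplained variance) pairs from a sweep of autoencoders; the arrays below are placeholders, so substitute measurements from your own runs:

```python
# Fit a line to log(L0) vs. log(unexplained variance) across a sweep
# of autoencoders. If the frontier claim above holds, the fit should
# be tight, and points sitting well above the line are off-frontier.
import numpy as np

# Placeholder sweep results; replace with your measured values.
l0 = np.array([10.0, 20.0, 40.0, 80.0, 160.0])               # mean active features per input
unexplained_var = np.array([0.30, 0.21, 0.15, 0.10, 0.07])   # 1 - fraction of variance explained

log_l0 = np.log(l0)
log_uv = np.log(unexplained_var)

# Least-squares line in log-log space.
slope, intercept = np.polyfit(log_l0, log_uv, 1)
residuals = log_uv - (slope * log_l0 + intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"max |log-space residual|: {np.abs(residuals).max():.3f}")
# Runs that land noticeably above the fitted line (positive residual)
# are candidates for undertraining per the comment above.
```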