@LawrenceC The Nanda MATS stream played around with this as a group project, with code here: https://github.com/andyrdt/mats_sae_training/tree/othellogpt
Cool! Do you know if they’ve written up results anywhere?
I think we got similar-ish results. @Andy Arditi was going to comment here to share them shortly.
We haven’t written up our results yet... but after seeing this post I don’t think we have to :P.
We trained SAEs (with various expansion factors and L1 penalties) on the original Li et al. model at layer 6, and found results extremely similar to those presented in this analysis.
It’s very nice to see independent efforts converge to the same findings!
Likewise, I’m glad to hear there was some confirmation from your team!
An option for you if you don’t want to do a full writeup is to make a “diff” or comparison post, just listing where your methods and results were different (or the same). I think there’s demand for that; people liked “Comparing Anthropic’s Dictionary Learning to Ours”.
Thanks!