Agree that it's worth experimenting with R, but the only other hyperparameter is the sparsity coefficient alpha, and I found that alpha had to be in a narrow range or training would collapse to either "all variance is unexplained" or "no active features".
Yeah, the main hyperparameters are the expansion factor and “what optimization algorithm do you use/what hyperparameters do you use for the optimization algorithm”.
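For concreteness, here's a minimal sketch of where those knobs enter the training objective (illustrative only, assuming a standard ReLU SAE with an L1 penalty; names like `expansion_factor` and `alpha` are placeholders, not the exact code from the repo):

```python
# Minimal sketch of an SAE objective with the two knobs discussed above
# (expansion factor R and sparsity coefficient alpha). Not the repo's exact code;
# details like tied weights, bias handling, and the optimizer may differ.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, expansion_factor: int):
        super().__init__()
        d_hidden = expansion_factor * d_model  # R controls dictionary size
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # feature activations
        x_hat = self.decoder(f)           # reconstruction
        return x_hat, f

def sae_loss(x, x_hat, f, alpha: float):
    recon = (x - x_hat).pow(2).sum(dim=-1).mean()  # "unexplained variance" term
    sparsity = f.abs().sum(dim=-1).mean()          # L1 penalty on features
    # alpha too large -> features die ("no active features");
    # alpha too small -> reconstruction dominates and features stay dense.
    return recon + alpha * sparsity
```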
Thanks for uploading your interp and training code!
Could you upload your model and/or datasets somewhere as well, for reproducibility? (i.e. your datasets folder)
Here are the datasets, OthelloGPT model (“trained_model_full.pkl”), autoencoders (saes/), probes, and a lot of the cached results (it takes a while to compute AUROC for all position/feature pairs, so I found it easier to save those): https://drive.google.com/drive/folders/1CSzsq_mlNqRwwXNN50UOcK8sfbpU74MV
You should download all of these at the same directory level as the main repo.
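For example, a quick sanity check (hypothetical script, assuming the downloads end up as siblings of the repo folder):

```python
# Hypothetical layout check: assumes the Drive downloads (datasets/, saes/, probes,
# trained_model_full.pkl) sit one level above this script, next to the repo folder.
from pathlib import Path

parent = Path(__file__).resolve().parent.parent  # directory containing the repo
for name in ["datasets", "saes", "probes", "trained_model_full.pkl"]:
    path = parent / name
    print(f"{name}: {'found' if path.exists() else 'MISSING'} ({path})")
```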
@LawrenceC The Nanda MATS stream played around with this as a group project; code here: https://github.com/andyrdt/mats_sae_training/tree/othellogpt
Cool! Do you know if they’ve written up results anywhere?
I think we got similar-ish results. @Andy Arditi was going to comment here to share them shortly.
We haven’t written up our results yet... but after seeing this post, I don’t think we have to :P.
We trained SAEs (with various expansion factors and L1 penalties) on the original Li et al model at layer 6, and found extremely similar results as presented in this analysis.
It’s very nice to see independent efforts converge to the same findings!
Likewise, I’m glad to hear there was some confirmation from your team!
An option for you, if you don’t want to do a full writeup, is to make a “diff” or comparison post, just listing where your methods and results were different (or the same). I think there’s demand for that; people liked Comparing Anthropic’s Dictionary Learning to Ours.
Thanks!