We store most of the runs in a Google Drive folder (https://drive.google.com/drive/folders/1ERSkdA_yxr7ky6AItzyst-tCtfUPy66j?usp=sharing). We use a somewhat weird naming scheme: if there is a “t” in the postfix of a run’s name, it is tokenized. Some runs may be old and may not fully work; if you run into any issues, feel free to reach out.
The code in the research repo (specifically https://github.com/tdooms/tokenized-sae/blob/main/base.py#L119) should work to load them in.
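As a rough sketch of inspecting a downloaded run (the run name, the .pt extension, and the assumption that each checkpoint is a plain torch state dict are illustrative guesses; base.py remains the authoritative loader):

```python
import torch

# Illustrative only: the run name, ".pt" extension, and the assumption that a
# checkpoint is a plain dict of tensors are guesses; base.py is authoritative.
run_name = "resid_pre_8_t"  # hypothetical; a "t" postfix marks a tokenized run
state = torch.load(f"gpt2-layers/{run_name}.pt", map_location="cpu")

# Tokenized runs should contain a per-token lookup table in addition to the
# usual encoder weights; print shapes to see what is actually there.
for key, value in state.items():
    print(key, tuple(value.shape))
```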
Please keep in mind that these are currently more of a proof of concept and are likely undertrained. We were hoping to determine the level of interest in this technique before training a proper suite.
Additionally:
- We recommend using the gpt2-layers directory, which includes resid_pre layers 5-11, topk=30, and 12288 features (the tokenized “t” runs have learned lookup tables, pre-initialized with unigram residuals).
- The folders pareto-sweep, init-sweep, and expansion-sweep contain parameter sweeps, with lookup tables fixed to 2x the unigram residuals.
- In addition to the code repo linked above, for now here is some quick code that loads the SAE, exposes the lookup table, and computes activations only (sketched below):
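A minimal, self-contained sketch of such a snippet follows. It is not the implementation from the repo: the checkpoint key names (W_enc, b_enc, b_dec, lookup), the weight shapes, and the encoder convention (subtracting b_dec before encoding) are assumptions, so check base.py if anything does not line up.

```python
import torch

class TokenizedSAEActivations(torch.nn.Module):
    """Encoder-only view of a tokenized SAE: exposes the per-token lookup
    table and computes top-k feature activations (no reconstruction)."""

    def __init__(self, W_enc, b_enc, b_dec, lookup, k=30):
        super().__init__()
        self.W_enc = torch.nn.Parameter(W_enc)    # [d_model, n_features]
        self.b_enc = torch.nn.Parameter(b_enc)    # [n_features]
        self.b_dec = torch.nn.Parameter(b_dec)    # [d_model]
        self.lookup = torch.nn.Parameter(lookup)  # [vocab_size, d_model]
        self.k = k

    @classmethod
    def from_checkpoint(cls, path, k=30, device="cpu"):
        # Key names are assumptions; adjust them to whatever the downloaded
        # run actually contains (see the inspection snippet above).
        sd = torch.load(path, map_location=device)
        return cls(sd["W_enc"], sd["b_enc"], sd["b_dec"], sd["lookup"], k=k)

    def token_bias(self, tokens):
        # Per-token bias from the lookup table (pre-initialized with unigram
        # residuals in the tokenized runs).
        return self.lookup[tokens]                # [..., d_model]

    def activations(self, resid):
        # Top-k feature activations for resid_pre activations `resid` of shape
        # [..., d_model]. Subtracting b_dec first is a common SAE convention,
        # not a guarantee about this repo.
        pre = (resid - self.b_dec) @ self.W_enc + self.b_enc
        vals, idx = pre.topk(self.k, dim=-1)
        acts = torch.zeros_like(pre)
        acts.scatter_(-1, idx, torch.relu(vals))  # keep only the top-k, ReLU'd
        return acts                               # [..., n_features]
```

Here resid is the cached resid_pre activation at the matching layer; activations returns at most topk=30 nonzero features per position, and token_bias (or sae.lookup directly) exposes the per-token table described above.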