We store most of the runs in a Google Drive folder (https://drive.google.com/drive/folders/1ERSkdA_yxr7ky6AItzyst-tCtfUPy66j?usp=sharing). We use a somewhat weird naming scheme: if there is a “t” in the postfix of a run’s name, it is tokenized. Some runs may be old and may not fully work; if you run into any issues, feel free to reach out.
The code in the research repo (specifically https://github.com/tdooms/tokenized-sae/blob/main/base.py#L119) should work to load them in.
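As a rough sketch of inspecting a downloaded run (the run name, the .pt extension, and the assumption that each checkpoint is a plain torch state dict are illustrative guesses; base.py remains the authoritative loader):

```python
import torch

# Illustrative only: the run name, ".pt" extension, and the assumption that a
# checkpoint is a plain dict of tensors are guesses; base.py is authoritative.
run_name = "resid_pre_8_t"  # hypothetical; a "t" postfix marks a tokenized run
state = torch.load(f"gpt2-layers/{run_name}.pt", map_location="cpu")

# Tokenized runs should contain a per-token lookup table in addition to the
# usual encoder weights; print shapes to see what is actually there.
for key, value in state.items():
    print(key, tuple(value.shape))
```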
Please keep in mind that these are currently more of a proof of concept and are likely undertrained. We were hoping to determine the level of interest in this technique before training a proper suite.
Additionally:
- We recommend using the gpt2-layers directory, which includes resid_pre layers 5-11, topk=30, and 12288 features (the tokenized “t” runs have learned lookup tables, pre-initialized with unigram residuals).
- The folders pareto-sweep, init-sweep, and expansion-sweep contain parameter sweeps, with lookup tables fixed to 2x the unigram residuals.
- In addition to the code repo linked above, for now here is some quick code that loads the SAE, exposes the lookup table, and computes activations only (sketched below):
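A minimal, self-contained sketch of such a snippet follows. It is not the implementation from the repo: the checkpoint key names (W_enc, b_enc, b_dec, lookup), the weight shapes, and the encoder convention (subtracting b_dec before encoding) are assumptions, so check base.py if anything does not line up.

```python
import torch

class TokenizedSAEActivations(torch.nn.Module):
    """Encoder-only view of a tokenized SAE: exposes the per-token lookup
    table and computes top-k feature activations (no reconstruction)."""

    def __init__(self, W_enc, b_enc, b_dec, lookup, k=30):
        super().__init__()
        self.W_enc = torch.nn.Parameter(W_enc)    # [d_model, n_features]
        self.b_enc = torch.nn.Parameter(b_enc)    # [n_features]
        self.b_dec = torch.nn.Parameter(b_dec)    # [d_model]
        self.lookup = torch.nn.Parameter(lookup)  # [vocab_size, d_model]
        self.k = k

    @classmethod
    def from_checkpoint(cls, path, k=30, device="cpu"):
        # Key names are assumptions; adjust them to whatever the downloaded
        # run actually contains (see the inspection snippet above).
        sd = torch.load(path, map_location=device)
        return cls(sd["W_enc"], sd["b_enc"], sd["b_dec"], sd["lookup"], k=k)

    def token_bias(self, tokens):
        # Per-token bias from the lookup table (pre-initialized with unigram
        # residuals in the tokenized runs).
        return self.lookup[tokens]                # [..., d_model]

    def activations(self, resid):
        # Top-k feature activations for resid_pre activations `resid` of shape
        # [..., d_model]. Subtracting b_dec first is a common SAE convention,
        # not a guarantee about this repo.
        pre = (resid - self.b_dec) @ self.W_enc + self.b_enc
        vals, idx = pre.topk(self.k, dim=-1)
        acts = torch.zeros_like(pre)
        acts.scatter_(-1, idx, torch.relu(vals))  # keep only the top-k, ReLU'd
        return acts                               # [..., n_features]
```

Here resid is the cached resid_pre activation at the matching layer; activations returns at most topk=30 nonzero features per position, and token_bias (or sae.lookup directly) exposes the per-token table described above.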