We recommend using the gpt2-layers directory, which includes resid_pre layers 5-11, topk=30, 12288 features (the tokenized ('t') ones have learned lookup tables, pre-initialized with unigram residuals).
The folders pareto-sweep, init-sweep, and expansion-sweep contain parameter sweeps, with lookup tables fixed to 2x unigram residuals.
In addition to the code repo linked above, for now here is some quick code that loads the SAE, exposes the lookup table, and computes activations only:
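Since the quick-load snippet isn't reproduced here, the sketch below shows what such code could look like under assumed conventions: a TopK SAE whose reconstruction is offset by a per-token lookup table (pre-initialized with unigram residuals in the release). All names (`TokenizedSAE`, weight attributes, dimensions other than topk=30 and 12288 features) are illustrative, not the repo's actual API.

```python
# Hedged sketch of a tokenized TopK SAE: encode -> top-k sparsify -> decode,
# with a per-token lookup table added to the reconstruction.
# Weights here are random; in practice you would load them from the release.
import torch
import torch.nn as nn


class TokenizedSAE(nn.Module):
    def __init__(self, d_model: int, n_features: int, k: int, vocab_size: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, n_features) * 0.02)
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.W_dec = nn.Parameter(torch.randn(n_features, d_model) * 0.02)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # Per-token lookup table; in the release this is pre-initialized
        # with unigram residuals and then learned.
        self.lookup = nn.Embedding(vocab_size, d_model)

    def encode(self, resid: torch.Tensor) -> torch.Tensor:
        """Feature activations only: keep the top-k pre-activations, zero the rest."""
        pre = (resid - self.b_dec) @ self.W_enc + self.b_enc
        topk = torch.topk(pre, self.k, dim=-1)
        acts = torch.zeros_like(pre)
        acts.scatter_(-1, topk.indices, torch.relu(topk.values))
        return acts

    def forward(self, resid: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        acts = self.encode(resid)
        # Reconstruction = decoded features + bias + per-token lookup entry.
        return acts @ self.W_dec + self.b_dec + self.lookup(tokens)


# Example: GPT-2-small-like dimensions, topk=30, 12288 features.
sae = TokenizedSAE(d_model=768, n_features=12288, k=30, vocab_size=50257)
resid = torch.randn(4, 768)               # a batch of resid_pre vectors
tokens = torch.randint(0, 50257, (4,))    # the corresponding token ids
acts = sae.encode(resid)                  # "activations only" path
recon = sae(resid, tokens)                # reconstruction with lookup offset
```

The lookup table is exposed directly as `sae.lookup.weight` (shape `[vocab_size, d_model]`), so fixing it to 2x unigram residuals, as in the sweep folders, amounts to assigning that tensor and freezing it.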