What is the activation name for the resid SAEs? hook_resid_post or hook_resid_pre?
I found https://github.com/ApolloResearch/e2e_sae/blob/main/e2e_sae/scripts/train_tlens_saes/run_train_tlens_saes.py#L220, which seems to suggest _post,
but downloading the SAETransformer from wandb shows:
(saes): ModuleDict(
  (blocks-6-hook_resid_pre): SAE(
    (encoder): Sequential(
      (0):...
which suggests _pre.
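They are indeed all hook_resid_pre. The code you’re looking at only lists the positions at which we measure reconstruction error during evaluation; it does not determine where the SAEs are attached. In particular, we want to see the reconstruction error at hook_resid_post of every layer, including the final layer. For every layer except the last, hook_resid_post is the same activation as hook_resid_pre of the next layer, but the final layer’s output has no following hook_resid_pre, so it can only be read from hook_resid_post.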
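To make the pre/post relationship concrete, here is a minimal sketch in plain Python. It assumes the usual TransformerLens naming scheme (`blocks.{i}.hook_resid_pre` / `blocks.{i}.hook_resid_post`); the helper names and the 12-layer model size are hypothetical and are not taken from the e2e_sae code.

```python
from typing import Optional

def resid_post_names(n_layers: int) -> list[str]:
    # One hook_resid_post per layer (assumed TransformerLens-style names).
    return [f"blocks.{i}.hook_resid_post" for i in range(n_layers)]

def equivalent_resid_pre(post_name: str, n_layers: int) -> Optional[str]:
    # hook_resid_post of layer i carries the same activation as
    # hook_resid_pre of layer i + 1. The final layer has no successor,
    # so there is no equivalent hook_resid_pre and we return None.
    layer = int(post_name.split(".")[1])
    if layer + 1 < n_layers:
        return f"blocks.{layer + 1}.hook_resid_pre"
    return None

n_layers = 12  # hypothetical model size, for illustration only
mapping = {name: equivalent_resid_pre(name, n_layers)
           for name in resid_post_names(n_layers)}

for post, pre in mapping.items():
    print(post, "->", pre if pre is not None else "(no hook_resid_pre equivalent)")
```

Only the last entry maps to nothing, which is why an evaluation list written in terms of hook_resid_post can cover the final layer's output even though the SAEs themselves sit at hook_resid_pre.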