it feels less surgical than a single direction everywhere
Agreed, it seems less elegant. But one guy on Hugging Face did a rough plot of the cross-correlation, and it seems to show that the direction changes with layer: https://huggingface.co/posts/Undi95/318385306588047#663744f79522541bd971c919. Perhaps we are missing something, though.
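For anyone who wants to check this on their own model, here is a minimal sketch of that kind of comparison. It uses plain cosine similarity between per-layer directions, which is my assumption about what the linked plot is roughly measuring, not a reproduction of it. The `directions` tensor is random stand-in data, and `layer_direction_similarity` is a hypothetical helper name.

```python
import torch


def layer_direction_similarity(directions: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between every pair of per-layer directions.

    directions: tensor of shape (n_layers, d_model), one direction per layer.
    Returns a tensor of shape (n_layers, n_layers).
    """
    # Normalize each row to unit length so the dot product is a cosine.
    unit = torch.nn.functional.normalize(directions, dim=-1)
    return unit @ unit.T


# Random stand-in data (assumption): replace with the directions you extracted.
torch.manual_seed(0)
directions = torch.randn(4, 16)
sim = layer_direction_similarity(directions)
print(sim)
```

If the off-diagonal entries stay close to 1, a single direction everywhere is probably fine. If they drop off quickly with layer distance, per-layer directions look more justified.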
Note that you can just do torch.save(model.state_dict(), FILE_PATH), as with any PyTorch model.
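A minimal round-trip sketch, in case it helps. The `nn.Linear` model and the temporary file path are stand-ins I picked for illustration; swap in your own model and path.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model (assumption): any nn.Module works the same way.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "model.pt")

    # torch.save takes the object first, then the destination.
    torch.save(model.state_dict(), file_path)

    # To restore, build the same architecture and load the saved weights.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(file_path))

weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)
```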
omg, I totally missed that, thanks. Let me know if I missed anything else, I just want to learn.
The older versions of the gist are in TransformerLens, if anyone wants those versions. In those, the interventions work better since you can target resid_pre, resid_mid, etc.
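A rough sketch of what targeting those hook points can look like. This is not the gist's code: `resid_hook_name` and `make_ablation_hook` are hypothetical helpers, the intervention shown (projecting a direction out of the residual stream) is my assumption about the kind of intervention meant, and the activations are random stand-ins so it runs without loading a model.

```python
import torch


def resid_hook_name(layer: int, position: str = "pre") -> str:
    # TransformerLens names residual stream hook points like
    # "blocks.3.hook_resid_pre" or "blocks.3.hook_resid_mid".
    return f"blocks.{layer}.hook_resid_{position}"


def make_ablation_hook(direction: torch.Tensor):
    """Build a hook that removes the component along `direction`."""
    unit = direction / direction.norm()

    def hook_fn(activation: torch.Tensor, hook=None) -> torch.Tensor:
        # activation: (batch, seq, d_model). Subtract its projection onto unit.
        projection = (activation @ unit).unsqueeze(-1) * unit
        return activation - projection

    return hook_fn


# Random stand-ins (assumption) for a real direction and real activations.
torch.manual_seed(0)
direction = torch.randn(8)
activation = torch.randn(2, 3, 8)

hook_fn = make_ablation_hook(direction)
ablated = hook_fn(activation)

# With a loaded HookedTransformer you would pass pairs like this one to
# model.run_with_hooks(tokens, fwd_hooks=fwd_hooks).
fwd_hooks = [(resid_hook_name(layer, "pre"), hook_fn) for layer in range(2)]
print(fwd_hooks[0][0])
```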
Anecdotally, a lot of people are using mergekit to combine fine-tunes.