Some updates about the dictionary_learning repo:

The repo now supports ghost grads. H/t g-w1 for submitting a PR for this.
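For a sense of how this fits into training, here's a minimal sketch. The import path and argument names (especially the ghost-grads option) are my assumptions rather than the repo's exact interface, so check training.py for the real signature:

import torch
# Import path and argument names below are assumptions about the repo's API,
# not its exact interface -- check training.py in the repo.
from dictionary_learning.training import trainSAE

# Stand-in for an ActivationBuffer: any iterator over activation batches should work here.
activations = (torch.randn(256, 512) for _ in range(1000))

ae = trainSAE(
    activations,
    activation_dim=512,        # dimension of the activations being dictionary-learned
    dictionary_size=16 * 512,  # number of dictionary features
    lr=3e-4,
    ghost_threshold=2000,      # (assumed name) steps a feature must be dead before ghost grads apply
)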
ActivationBuffers now work natively with model components—like the residual stream—whose activations are typically returned as tuples; the buffer knows to take the first component of the tuple (and will iteratively do this if working with nested tuples).
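To illustrate what "take the first component, and do so again for nested tuples" means, here's a small standalone helper (mine, not the repo's code) that mirrors the unwrapping behavior:

import torch

def first_tensor(x):
    # Recursively unwrap (possibly nested) tuples by taking the first element,
    # mirroring how the buffer extracts activations from tuple-valued components.
    while isinstance(x, tuple):
        x = x[0]
    return x

resid = torch.randn(4, 512)
assert first_tensor(((resid, None), None)) is resid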
ActivationBuffers can now be stored on the GPU.
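In practice this should look roughly like passing a device when constructing the buffer; the constructor arguments below are my assumptions, so see ActivationBuffer in the repo for the real ones:

# Sketch of keeping the buffer's stored activations on the GPU.
# Constructor arguments are assumptions -- see ActivationBuffer in the repo.
from nnsight import LanguageModel
from dictionary_learning.buffer import ActivationBuffer

model = LanguageModel("EleutherAI/pythia-70m-deduped", device_map="cuda")
submodule = model.gpt_neox.layers[3].mlp   # component whose activations we collect
text_data = ["dictionary learning is fun"] * 10_000

buffer = ActivationBuffer(
    iter(text_data),   # iterator over text examples
    model,
    submodule,
    out_feats=512,     # dimension of the collected activations
    device="cuda",     # store the buffered activations on the GPU
)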
The file evaluation.py contains code for evaluating trained dictionaries. I’ve found this pretty useful for quickly evaluating dictionaries people send to me.
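Hypothetical usage, with the import path, function name, and arguments all being my guesses at the interface rather than the actual one (check evaluation.py):

# ae and buffer as in the sketches above: a trained dictionary and a held-out ActivationBuffer.
from dictionary_learning.evaluation import evaluate

metrics = evaluate(ae, buffer)
print(metrics)  # e.g. reconstruction MSE, L0/L1 sparsity, fraction of model loss recovered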
New convenience: you can do reconstructed_acts, features = dictionary(acts, output_features=True) to get both the reconstruction and the features computed by dictionary.
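Concretely, with illustrative shapes (the AutoEncoder constructor here is my guess at the repo's interface):

import torch
from dictionary_learning.dictionary import AutoEncoder  # import path is an assumption

dictionary = AutoEncoder(activation_dim=512, dict_size=16 * 512)  # illustrative sizes
acts = torch.randn(64, 512)  # a batch of activations
reconstructed_acts, features = dictionary(acts, output_features=True)
# reconstructed_acts matches acts' shape; features has shape (64, dict_size)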
Also, if you’d like to train dictionaries for many model components in parallel, you can use the parallel branch. I can’t promise I won’t make breaking changes to the parallel branch, sorry.
Finally, we’ve released a new set of dictionaries for the MLP outputs, attention outputs, and residual stream in all layers of Pythia-70m-deduped. The MLP and attention dictionaries seem pretty good, and the residual stream dictionaries seem like a mixed bag. Their stats can be found here.
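If it helps, here is a highly speculative sketch of loading one of these dictionaries, assuming they ship as PyTorch state dicts compatible with the repo's AutoEncoder class; the path and dimensions are placeholders, not the actual release layout:

import torch
from dictionary_learning.dictionary import AutoEncoder

ae = AutoEncoder(activation_dim=512, dict_size=32_768)  # placeholder sizes
state_dict = torch.load("path/to/pythia-70m-deduped/mlp_out_layer3/ae.pt")
ae.load_state_dict(state_dict)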