Very cool to see new people joining the interpretability field!
Some resource suggestions:
If you didn’t know already, there is a TF2 port of Lucid, called Luna:
There is also Lucent, which is Lucid for PyTorch: (I wrote some docs for it, though for a slightly different version)
For transformer interpretability you might want to check out Anthropic’s work on transformer circuits, Redwood Research’s interpretability tool, or (shameless plug) Unseal.