Hey Neel,
Great post!
I'm trying to look into the code here:
Good (but hard) exercise: Code your own tiny GPT-2 and train it. If you can do this, I’d say that you basically fully understand the transformer architecture.
Example of basic training boilerplate and train script
The EasyTransformer codebase is probably good to riff off of here
But the links don't work anymore! It would be nice if you could update them!
I don't know if this link still works for the original content: https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/clean-transformer-demo/Clean_Transformer_Demo_Template.ipynb
Thanks a lot!
Ah, thanks! I haven't looked at this point in a while; I've updated it a bit. I've since made my own transformer tutorial which (in my extremely biased opinion) is better, especially for interpretability. It comes with a template notebook to fill out alongside part 2 (with tests!), and by the end you'll have implemented your own GPT-2.
More generally, my getting started in mech interp guide is a better place to start than this guide, and has more on transformers!
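For readers wondering what the "code your own tiny GPT-2 and train it" exercise involves, here is a minimal sketch of the shape of such a model plus basic training boilerplate, written in PyTorch. All sizes, class names, and hyperparameters below are illustrative choices of mine, not from the post or the tutorial, and a real attempt should implement attention from scratch rather than using `nn.MultiheadAttention`:

```python
# Illustrative sketch (not the tutorial's code): a tiny GPT-2-style
# decoder-only transformer and one next-token-prediction training step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBlock(nn.Module):
    """Pre-LayerNorm transformer block: attention + MLP, each with a residual."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: True entries above the diagonal block attention to the future.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x


class TinyGPT(nn.Module):
    """Token + positional embeddings, a stack of blocks, and an unembedding head."""

    def __init__(self, vocab=256, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(TinyBlock(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab, bias=False)

    def forward(self, idx):
        x = self.tok(idx) + self.pos(torch.arange(idx.size(1)))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits over the vocabulary


# Basic training boilerplate: one step of next-token prediction on toy data.
model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch = torch.randint(0, 256, (4, 33))       # stand-in for a tokenized corpus
inputs, targets = batch[:, :-1], batch[:, 1:]  # predict token t+1 from tokens <= t
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```

A real training script would wrap the last few lines in a loop over a DataLoader; the point of the exercise is that once you can write (and debug) each of these pieces yourself, the transformer architecture holds few mysteries.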