For building the skills to make a transformer, I’d highly recommend Karpathy’s YouTube channel. He hasn’t gotten to transformers yet, as he’s covering earlier models first, which is useful: knowing how to implement a neural network properly will affect your ability to implement a transformer. Yes, these are NLP models, but I think the soft rule against looking at any NLP architectures is misguided. If the models don’t contain the core insights of transformers/SOTA NLP architectures, then what’s the issue?
To understand what a transformer is, I’d recommend this article. Also, I’d warn against not using PyTorch for large models: unless you know CUDA well enough to write your own GPU kernels, that’s a bad idea.
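To give a flavor of the core operation (this is my own minimal sketch in NumPy, not taken from the article): the heart of a transformer is scaled dot-product attention, where each token's output is a weighted average of value vectors, with weights given by query–key similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the value vectors

# Toy self-attention: 3 tokens with 4-dimensional embeddings, Q = K = V = x
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4): one output vector per token
```

A real transformer wraps this in learned linear projections, multiple heads, and residual/feed-forward layers, but if this operation makes sense, the rest of the architecture is mostly plumbing.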
EDIT: This is a good post, and I’m glad you (and your girlfriend?) wrote it.