Thanks, the first diagram worked just as suggested: I have enough exposure to transformer internals that a few minutes of staring was enough to understand the algorithm. I’d always wondered why it is that GPT is so strangely good at repetition, and now it makes perfect sense.
Awesome, really glad to hear it was helpful, thanks for commenting!