The architecture shown for “Not in GPT” seems to be wrong? GPT is decoder-only, but the part labeled “Not in GPT” is the decoder.