bvbvbvbvbvbvbvbvbvbvbv comments on How does GPT-3 spend its 175B parameters?

bvbvbvbvbvbvbvbvbvbvbv 19 Oct 2023 14:29 UTC
1 point
1. The only difference between encoder and decoder transformers is the attention mask. In an encoder, every token can attend to every other token, including later ones (acausal attention), while in a decoder, a token can attend only to itself and earlier tokens, never to later ones (causal attention). The term “decoder” is used because decoders can be used to generate text, while encoders cannot (since you can only run an encoder if you know the full input already).
This was very helpful to me. Thank you.
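For anyone else reading, here is a rough sketch of the masking difference described in the quoted point. It is only an illustration in plain Python: the function name `attention_mask` and the convention that `mask[i][j]` means "position i may attend to position j" are my own assumptions, not taken from the post or from any particular library.

```python
def attention_mask(n, causal):
    """Build an n x n boolean mask (hypothetical helper, for illustration only).

    mask[i][j] is True when position i is allowed to attend to position j.
    """
    if causal:
        # Decoder-style: position i sees itself and earlier positions only.
        return [[j <= i for j in range(n)] for i in range(n)]
    # Encoder-style: every position sees every other position.
    return [[True for _ in range(n)] for _ in range(n)]


encoder_mask = attention_mask(3, causal=False)
decoder_mask = attention_mask(3, causal=True)

for row in decoder_mask:
    # Print each row as 1s and 0s; the causal mask is lower-triangular.
    print(" ".join("1" if allowed else "0" for allowed in row))
```

The causal mask comes out lower-triangular, which is why a decoder can generate text one token at a time: the output at each position never depends on tokens that have not been produced yet.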