I believe “encoder” refers exclusively to the part of the model that reads in text to generate an internal representation, and “decoder” to the part that generates text from such a representation.
Architecturally, I think the big difference is bi-directional attention (BERT can use future tokens to influence latent features of current tokens) vs. uni-directional, causal attention (GPT only flows information from past to future). You could totally use the “encoder” to generate text, or the “decoder” to generate latent representations for another task, though perhaps they’re more suited for their typical roles.
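To make that concrete, here’s a minimal sketch (not actual BERT/GPT code; names like `seq_len` and `masked_attention_weights` are just illustrative) contrasting a full bi-directional attention mask with a causal one:

```python
import numpy as np

seq_len = 5

# Bidirectional (BERT-style): every token may attend to every other token,
# so "future" tokens can influence the representation of the current one.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Causal / uni-directional (GPT-style): token i may only attend to
# positions <= i, so information flows strictly past -> future.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention_weights(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over attention scores, with disallowed positions set to -inf first."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.randn(seq_len, seq_len)
print(masked_attention_weights(scores, bidirectional_mask))  # dense rows
print(masked_attention_weights(scores, causal_mask))         # lower-triangular rows
```

The only difference between the two regimes here is which entries of the mask are allowed; the rest of the attention math is identical.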
EDIT: Whoops, was wrong in initial version of comment.