I am left wondering whether, when GPT-3 does few-shot arithmetic, it is actually fitting something like a linear model on the in-context examples to predict the next token. I.e. the GPT-3 weights do not “know” arithmetic, but they know how to fit, and that is why they need a few examples before they can tell you the answer to 25+17: they need to work out which function of 25 and 17 to return.
This is not that crazy given my understanding of what a transformer does, which is, in some sense, to return a function of the most recent input that depends on the earlier inputs. Or am I confusing transformers with a different NN design?
Note that there could still be priors making some functions more probable than others, and some more complex cases could be plainly impossible to fit because there is no way to reach them from the meta-model that is the trained NN.
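To make the hypothesis concrete, here is a toy sketch (purely illustrative, nothing GPT-3 is actually known to do) of what “fitting a linear model on the few-shot examples” could mean: treat each prompt example as an (operand, operand, answer) triple, fit the answer as a linear function of the operands, and apply that fitted function to the new query. The example triples and the least-squares fit are my own illustrative choices.

```python
import numpy as np

# Hypothetical few-shot prompt: operand pairs and the answers shown in the prompt.
# These numbers are illustrative, not taken from any actual GPT-3 experiment.
examples = [
    (3, 4, 7),
    (10, 5, 15),
    (8, 6, 14),
]

# Design matrix [a, b, 1]: one weight per operand plus a bias term.
X = np.array([[a, b, 1.0] for a, b, _ in examples])
y = np.array([out for _, _, out in examples], dtype=float)

# Least-squares fit: the "fitter" only needs to discover w ~ [1, 1, 0]
# to reproduce addition on the examples it has seen.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Apply the fitted function to the new query 25 + 17.
prediction = np.array([25, 17, 1.0]) @ w
print(prediction)  # ~ 42.0 once the fit has locked onto addition
```

In this picture the “knowledge” lives in the fitting procedure, not in any stored fact about addition, which is why a handful of examples is needed before the query can be answered.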