(Not related to the overall point of your paper) I’m not so sure that GPT-3 “has the internal model to do addition,” depending on what you mean by that — nostalgebraist doesn’t seem to think so in this post, and a priori this seems like a surprising thing for a feedforward neural network to do.
I’m pretty sure it can’t do long addition (I played around with that specifically), but it does single- or double-digit addition well enough that it at least has some idea of what we’re gesturing at.