“In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table. Overall, GPT-3 displays reasonable proficiency at moderately complex arithmetic in few-shot, one-shot, and even zero-shot settings.” (emphasis mine)
Does this seem right? If so, is this impressive? It seems so to me; people often say “reasoning” is something current methods can’t do, and this is updating me more towards thinking that’s false.
I’m a bit confused about this as a piece of evidence—naively, it seems to me like not carrying the 1 would be a mistake that you would make if you had memorized the pattern for single-digit arithmetic and were just repeating it across the number. I’m not sure if this counts as “memorizing a table” or not.
Excellent point! Well, they do get the answer right some of the time… it would be interesting to see how often they “remember” to carry the one vs. how often they “forget.” It looks like the biggest model got basically 100% correct on 2-digit addition, so it seems that they mostly “remember.”
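To make the hypothesized failure mode concrete, here is a minimal Python sketch (my own illustration, not anything from the paper) of what digit-wise addition without carrying would produce; the function name is made up:

```python
def add_digitwise_no_carry(a: int, b: int) -> int:
    """Add two non-negative integers column by column, discarding carries.

    This mimics the hypothesized behavior: each column is answered with
    memorized single-digit addition, but the carried "1" is never
    propagated to the next column.
    """
    result, place = 0, 1
    while a > 0 or b > 0:
        column = (a % 10 + b % 10) % 10  # keep the ones digit, drop the carry
        result += column * place
        a, b, place = a // 10, b // 10, place * 10
    return result

# 48 + 34 = 82, but dropping the carried "1" from 8 + 4 gives 72.
assert add_digitwise_no_carry(48, 34) == 72
# When no column overflows, the carry-free answer happens to be correct.
assert add_digitwise_no_carry(21, 13) == 34
```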
But does it ever hallucinate the need to carry the one when it shouldn’t?
As abergal wrote, not carrying the “1” can simply mean it does digit-wise addition (which seems trivial via memorization). But notice that just before that quote they also write:
To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms “<NUM1> + <NUM2> =” and “<NUM1> plus <NUM2>”. Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized.
That seems like evidence against memorization, but maybe their simple search failed to find most of the cases where the training data contained a relevant signal, e.g.: “In this diet you get 350 calories during breakfast: 200 calories from X and 150 calories from Y.”
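For reference, here is a rough Python sketch of the kind of literal-string spot-check the quoted passage describes; this is my reconstruction, not the authors' code, and the toy corpus below shows how implicit arithmetic like the calorie example would slip past it:

```python
def count_literal_matches(problems, training_text):
    """Count test problems that appear verbatim in the training text,
    in either of the two surface forms described in the paper."""
    hits = 0
    for a, b in problems:
        forms = (f"{a} + {b} =", f"{a} plus {b}")
        if any(form in training_text for form in forms):
            hits += 1
    return hits

# Toy "training corpus": the sum 200 + 150 = 350 is present, but only
# implicitly, so neither literal form matches and the check finds nothing.
corpus = ("In this diet you get 350 calories during breakfast: "
          "200 calories from X and 150 calories from Y.")
print(count_literal_matches([(200, 150)], corpus))  # -> 0
```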