Or is it learning how to imitate doing addition? (And the training set had people making those mistakes so it copies them.)
The arithmetic section says that they checked the corpus and found that almost none of the tested arithmetic problems appear in it:
Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized.
If it’s not memorizing/learning from explicit examples (but mostly from numbers used in normal ways and analytic writing), there can hardly be many explicit examples of simple arithmetic which are wrong, either, so ‘imitating errors’ seems unlikely.
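For intuition, here is a hypothetical sketch of that kind of memorization check, assuming a plain-text corpus and verbatim string matching; the file name and problem format are illustrative guesses, not the paper’s actual procedure:

```python
# Hypothetical sketch of the memorization check: count how many test
# problems occur verbatim in the training text. The corpus path, the
# problem format, and the exact-match criterion are all assumptions
# made for illustration.
import random

random.seed(0)
# 2,000 3-digit addition problems, formatted the way they might appear
# in a prompt, e.g. "483 + 762 ="
problems = [f"{random.randint(100, 999)} + {random.randint(100, 999)} ="
            for _ in range(2000)]

with open("training_corpus.txt", encoding="utf-8") as f:
    corpus = f.read()

matches = sum(p in corpus for p in problems)
print(f"{matches}/{len(problems)} problems found verbatim "
      f"({matches / len(problems):.1%})")
```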
One critique is that GPT-3 still takes far too long to “identify” a task—why does it need 50 examples of addition in order to figure out that what it should do is addition? Why isn’t 1 sufficient? It’s not like there are a bunch of other conceptions of “addition” that need to be disambiguated.
To update on this: I think that actually, there are a lot of ‘additions’ which are ambiguous in GPT-3, because of the bizarreness of the BPE representation of numbers. I’ve discussed how empirically arithmetic & other tasks improve when avoiding BPEs, but perhaps more useful is to look at the task of ‘addition’ on BPEs directly. Nostalgebraist helpfully provides some tables of BPEs/numbers: https://nostalgebraist.tumblr.com/post/620663843893493761/bpe-blues

Check it out: 2500 is the four-digit chunk ‘2500’. 3500 is the digits ‘35’ followed by the digits ‘00’. And 4500 is the digit ‘4’ followed by the digits ‘500’.
As we head further into 4-digit numerals, we start seeing 3-chunk ones eventually. The first 3-chunk numeral is (place your bets…) 4761 = “ 4” + “76” + “1” (did you guess it?). The next is 4791, then 4861, 4862, 4863, then 4881, and so on in another inscrutable integer sequence.
Unlike 2-chunking, though, 3-chunking is consistent about where to split. It’s always first digit + middle two + last digit. This holds across the whole range from 4761, the first 4-digit / 3-chunk number, to 9984, the last 4-digit / 3-chunk number. Among 4-digit numbers overall, 2.0% are 1 chunk, 95.7% are 2 chunks, and 2.4% are 3 chunks.
… got that?
GPT-3 isn’t learning ‘arithmetic’, it’s learning ‘arithmetic on 2-chunking’, ‘arithmetic on 2-chunking when overflowing to 3-chunking’, ‘arithmetic on 3-chunking’, ‘arithmetic on 3-chunking overflowing to 1-3 chunks’... To take a concrete case from the tables above: 3500 + 1000 = 4500 means mapping ‘35’+‘00’ (plus whatever chunks ‘1000’ gets) onto ‘4’+‘500’, so even staying inside 2-chunking, the split it must produce keeps moving.
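Here’s a minimal sketch of how to see the chunking yourself, using the HuggingFace transformers GPT-2 tokenizer (GPT-3 reuses GPT-2’s BPE vocabulary). The leading space matters: ‘ 2500’ and ‘2500’ are different tokens, and the ‘Ġ’ in the printed tokens is just the byte-level tokenizer’s marker for an absorbed space.

```python
# Minimal sketch: inspect the BPE chunking of numerals with the GPT-2
# tokenizer (the same BPE vocabulary GPT-3 uses). Each number is given
# a leading space, matching how numbers usually appear mid-sentence.
from collections import Counter
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

for n in (2500, 3500, 4500, 4761):
    print(n, tok.tokenize(f" {n}"))
# expected, per the tables quoted above: ['Ġ2500'], ['Ġ35', '00'],
# ['Ġ4', '500'], ['Ġ4', '76', '1']

# Chunk-count distribution over all 4-digit numerals; this should
# recover the ~2.0% / 95.7% / 2.4% split quoted above.
counts = Counter(len(tok.tokenize(f" {n}")) for n in range(1000, 10000))
for chunks, freq in sorted(counts.items()):
    print(f"{chunks} chunk(s): {freq / 9000:.1%}")
```

Byte-level BPE has no special handling for digits, so which splits exist at all is purely an artifact of string frequencies in the data the BPE merges were learned from.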