While you’re here and chatting about D.5 (assume you meant 5), another tiny thing that confuses me—Figure 21. Am I right in reading the bottom two lines as ‘seeing 255 tokens and predicting the 256th is exactly as difficult as seeing 1023 tokens and predicting the 1024th’?
e: Another look and I realise Fig 20 shows things much more clearly—never mind, things continue to get easier with token index.