GPT gives the token “216” in the string “63 = 216” a very low probability, just as low as “215” or “217”.
Replacing “63” with “62” in the prompt still gives “216” as an output with ~10% probability.
Would the tokenizer behave differently given “216” and “2^16”, e.g. giving respectively the single token “216” and some tokens like “2” and “16”? That would explain the result: GPT of course knows that 216 isn’t 63, but it has been forced to predict a relationship like “2” + “16” = “63”.
The Codex tokenizer used by the GPT-3.5 models tokenizes them differently: “216” is 1 token, “2^16” is 3 (“2”, “^”, “16”). Note that “ 216” (with a leading space) is a different token, and it’s the one text-davinci-003 actually wants to predict (you’ll often see 100× probability ratios between these two tokens).
Here are the log probs of the two sequences using Adam’s prompt above, with the trailing space removed (which is what he did in his actual setup; otherwise you get different probabilities):

“ 216” → −15.91
“2^16” → −1.34
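For anyone who wants to reproduce this kind of measurement: with the legacy Completions API you can score a fixed continuation by echoing it back with zero new tokens and summing the returned per-token logprobs. A minimal sketch, assuming the pre-1.0 openai Python client; the prompt and continuations in the usage note are placeholders, not Adam’s actual prompt:

```python
# Sketch of the "echo" scoring trick on the legacy OpenAI Completions API
# (openai<1.0): the model generates nothing, we just read the logprobs it
# assigns to the text we sent in.

def continuation_logprob(logprobs_obj, prompt_len):
    """Sum the logprobs of the tokens starting at or after `prompt_len` chars.

    `logprobs_obj` is choices[0]["logprobs"]; the first entry of
    `token_logprobs` is None, since the very first token of the text has
    no conditional probability, so we skip None values.
    """
    total = 0.0
    for offset, lp in zip(logprobs_obj["text_offset"],
                          logprobs_obj["token_logprobs"]):
        if offset >= prompt_len and lp is not None:
            total += lp
    return total

def score(prompt, continuation):
    """Total logprob of `continuation` given `prompt`.

    Requires the legacy client (pip install "openai<1.0") and an API key;
    text-davinci-003 itself has since been retired.
    """
    import openai
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt + continuation,
        max_tokens=0,   # generate nothing...
        echo=True,      # ...but return the input annotated with logprobs
        logprobs=0,
    )
    return continuation_logprob(resp["choices"][0]["logprobs"], len(prompt))
```

Usage would look like `score("63 =", " 216")` vs `score("63 =", " 2^16")` (hypothetical prompt; note the trailing space is left off the prompt and carried by the continuation, matching the setup described above).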