Martin Fell comments on The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

Martin Fell 1 Aug 2023 9:43 UTC
5 points
0
Note that there are glitch tokens in GPT-3.5 and GPT-4! The tokenizer was changed to a 100k vocabulary (rather than 50k), so all of the tokens are different, but glitch tokens still exist. Try “ ForCanBeConverted” as an example.
If I remember correctly, “davidjl” is the only old glitch token that carries over to the new tokenizer.
Apart from that, some lists of the new glitch tokens have been compiled, and they offer a good selection.
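A quick way to probe a candidate string like the one above is to check whether a tokenizer encodes it as a single token, since glitch tokens are single vocabulary entries. The sketch below is only illustrative: `is_single_token`, `toy_encode`, and `TOY_VOCAB` are hypothetical names, and the toy vocabulary is a stand-in, not the real GPT vocabulary.

```python
from typing import Callable, List, Sequence

def is_single_token(encode: Callable[[str], Sequence[int]], text: str) -> bool:
    """Return True if `encode` maps `text` to exactly one token ID."""
    return len(encode(text)) == 1

# Hypothetical stand-in vocabulary, for illustration only.
TOY_VOCAB = {" ForCanBeConverted": 0, " For": 1, "Can": 2, "Be": 3, "Converted": 4}

def toy_encode(text: str) -> List[int]:
    """Greedy longest-match encoder over TOY_VOCAB (a simplification, not real BPE)."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        # Pick the longest vocabulary piece that matches at the current position.
        candidates = [piece for piece in TOY_VOCAB if text.startswith(piece, pos)]
        if not candidates:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
        piece = max(candidates, key=len)
        ids.append(TOY_VOCAB[piece])
        pos += len(piece)
    return ids

print(is_single_token(toy_encode, " ForCanBeConverted"))
print(is_single_token(toy_encode, " ForCan"))
```

Assuming the `tiktoken` package is installed, `tiktoken.get_encoding("cl100k_base").encode` could be passed in place of `toy_encode` to run the same check against the 100k tokenizer.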
- Mazianni 7 Aug 2023 17:03 UTC
  2 points
  0
  Good find. What I find fascinating is that some tokens produce fairly consistent responses while others do not. In a Bayesian network, I would read inconsistent responses as a sign that the network is uncertain, and consistent responses as a sign of certainty. It makes me very curious how such ideas apply to glitch tokens, and what causes the variability in response consistency.
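One way to put a number on "response consistency" is the empirical Shannon entropy of the responses collected from repeated prompts with the same token. This is a minimal sketch of that idea, not something proposed in the thread: `response_entropy` is a hypothetical helper, and the sample responses are made up for illustration.

```python
import math
from collections import Counter
from typing import Iterable

def response_entropy(responses: Iterable[str]) -> float:
    """Empirical Shannon entropy (in bits) of a collection of responses.

    0.0 means every response was identical (fully consistent);
    larger values mean the responses were more varied.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one response")
    # Sum p * log2(1/p) over the observed distinct responses.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Made-up samples: one token that always gets the same reply, one that never does.
consistent = ["distribute", "distribute", "distribute", "distribute"]
varied = ["hello", "I can't see that", "the word is blank", "?"]

print(response_entropy(consistent))
print(response_entropy(varied))
```

Exact string matching is a crude notion of "same response"; grouping near-identical replies before counting would be a reasonable refinement.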