Slimepriestess comments on SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

Slimepriestess 16 Apr 2023 17:41 UTC
3 points
While the end of the token list seems like a good place to start looking for anomalous tokens, the “ petertodd” token was actually about ¾ of the way through the list (index 37,444 in the 50k tokenizer, which scales to roughly 74,888 in the 100k tokenizer). If anomalous tokens follow a similar “typology” regardless of the tokenizer used, then their locations in the overall list might correlate in meaningful ways. Maybe worth looking into.
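
For anyone who wants to check the arithmetic, here is a minimal sketch of the proportional scaling behind those numbers. The round vocabulary sizes (50,000 and 100,000) are my approximations of "50k" and "100k", and the function names are made up for illustration. This only maps a relative position from one list to the other; it says nothing about which token actually sits at that index.

```python
def relative_position(index: int, vocab_size: int) -> float:
    """Fraction of the way through the token list at which `index` falls."""
    return index / vocab_size


def scaled_index(index: int, src_vocab: int, dst_vocab: int) -> int:
    """Index at the same relative position in a vocabulary of a different size."""
    return round(index * dst_vocab / src_vocab)


# Assumed round sizes; the real vocabularies are only approximately this large.
SRC_VOCAB = 50_000
DST_VOCAB = 100_000
PETERTODD_INDEX = 37_444  # index quoted in the comment above

print(relative_position(PETERTODD_INDEX, SRC_VOCAB))        # about 0.75
print(scaled_index(PETERTODD_INDEX, SRC_VOCAB, DST_VOCAB))  # 74888
```

Swapping in the exact vocabulary sizes would shift the scaled index a little, so any search around it should probably use a window rather than a single position.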