One theory I haven’t seen in skimming some of the petertoddology out there:
There is a fairly prominent GitHub user named petertodd associated with crypto, and the presence of this as a token in the tokenizer is almost certainly a result of him;
Crypto people tend to have their usernames sitting alongside various cryptographic hashes on the internet a lot;
Cryptographic hashes are extremely weird things for a transformer, because unlike a person, a transformer can’t just skim past the block of text; instead it sits there furiously trying to predict the next token over and over again, filling up its context window one 4e and 6f at a time.
So some of the weird sinkhole features of this token could result from a machine that tries to reduce entropy on token sequences, encountering a token that tends to live in strings of extremely high entropy.
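To put a rough number on "extremely high entropy", here is a minimal sketch (my own illustration, not taken from any glitch-token analysis). It hashes the integers 0 to 1999 with SHA-256 and estimates the entropy per hex character, both on its own and given the previous character. The function names and the choice of inputs are assumptions made for the example. If both estimates sit close to the ceiling of log2(16) = 4 bits, then the preceding character gives essentially no help in predicting the next one. Real tokenizers chunk hex strings differently, so this is only suggestive of what a model faces.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(counts):
    """Plug-in Shannon entropy (in bits) of an empirical count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def hex_stream(num_digests=2000):
    """Concatenate SHA-256 hex digests of the strings '0', '1', ... (arbitrary inputs)."""
    return "".join(
        hashlib.sha256(str(i).encode()).hexdigest() for i in range(num_digests)
    )


def conditional_entropy(text):
    """Estimate H(next char | previous char) as H(prev, next) - H(prev)."""
    pair_counts = Counter(zip(text, text[1:]))
    prev_counts = Counter(text[:-1])
    return shannon_entropy(pair_counts) - shannon_entropy(prev_counts)


stream = hex_stream()
unigram_bits = shannon_entropy(Counter(stream))
conditional_bits = conditional_entropy(stream)

print(f"bits per hex char, no context:        {unigram_bits:.4f}")
print(f"bits per hex char, given previous:    {conditional_bits:.4f}")
print(f"maximum possible for 16 symbols:      {math.log2(16):.4f}")
```

Running the same two estimators over ordinary prose should show the conditional figure dropping well below the unconditional one, which is the gap a language model normally exploits and cannot exploit here.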
Another glitch token (SmartyHeaderCode) also often appears before cryptographic hashes, e.g.
<?php /*%%SmartyHeaderCode:12503048704fd0a835ee8ac4-90054934%%*/if(!defined('SMARTY_DIR')) exit('no direct access allowed');
Further support for this theory is that a verbatim Google search for these two glitch tokens does bring up hashes, suggesting that this is a common association for these specific tokens.
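If someone wanted to check this association on a local text dump rather than via search results, something like the following could be a starting point. This is a hypothetical sketch: the regex, the 8-character minimum, and the helper name `hex_runs_after_token` are my own choices, and the only input shown is the Smarty line quoted above.

```python
import re

# Match the token followed by a colon and a run of at least 8 lowercase hex characters.
# The 8-character threshold is an arbitrary assumption to skip short accidental matches.
HEX_AFTER_TOKEN = re.compile(r"SmartyHeaderCode:([0-9a-f]{8,})")


def hex_runs_after_token(text):
    """Return every hex run that directly follows 'SmartyHeaderCode:' in text."""
    return HEX_AFTER_TOKEN.findall(text)


sample = (
    "<?php /*%%SmartyHeaderCode:12503048704fd0a835ee8ac4-90054934%%*/"
    "if(!defined('SMARTY_DIR')) exit('no direct access allowed');"
)

print(hex_runs_after_token(sample))
```

Counting how many occurrences of the token yield a match, versus how many do not, would give a crude measure of how often it really sits in front of a hash-like string.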