What’s up with the initial whitespace in ” SolidGoldMagikarp”? Isn’t that pretty strong evidence that the token does not come from computer readable files, but instead from files formatted to be viewed by humans?
That’s because spaces are common in text for humans. The substring ′ It’ is common. Whereas, the string ′ SolidGoldMagikarp’ does not appear in the github repository vitaliya linked, but instead it is prefixed by a slash. I doubt any other backend source would have the leading space and this class of explanation seems poor to me.
What’s up with the initial whitespace in ” SolidGoldMagikarp”? Isn’t that pretty strong evidence that the token does not come from computer readable files, but instead from files formatted to be viewed by humans?
Leading spaces are extremely common in GPT tokens. ′ It’, ′ That’, ′ an’ and ′ has’ are all tokens, for example.
That’s because spaces are common in text for humans. The substring ′ It’ is common. Whereas, the string ′ SolidGoldMagikarp’ does not appear in the github repository vitaliya linked, but instead it is prefixed by a slash. I doubt any other backend source would have the leading space and this class of explanation seems poor to me.
Oh, I see what you mean now.