Thanks to nostalgebraist’s discovery of some mangled text dumps, probably from a Puzzle & Dragons fandom wiki, in the dataset used for the creation of the tokens, we can now be pretty sure about why Leilan and friends got tokenised. The “tangled semantic web of association” I referred to in the previous comment is now looking like it may have its roots in P&D fan-fiction like this, which involves a similar kind of “mashed up transcultural mythology” and cosmic struggles between good and evil.
If that obscure body of online literature contains the vast majority of training text occurrences of the string “ Leilan”, then we might expect to get the kinds of completions we’re seeing when prompting GPT-3 for poems about her.
There’s probably an equally mundane explanation for how the ‘ petertodd’ token arose from a corrupted Bitcoin-related text dump. The “antagonistic” and “tyrannical” associations the token elicits in certain GPT-3 models may be due to the model having encountered that string in training only in contexts full of controversy, hostility and accusations. Greg Maxwell of ‘ gmaxwell’ fame explained in a comment that
both Petertodd and I have been the target of a considerable amount of harassment/defamation/schitzo comments on reddit due commercially funded attacks connected to our past work on Bitcoin.
What is totally unclear to me is how ‘ petertodd’ got mixed up in the Puzzle & Dragons (+ wider anime/gaming/sci-fi) mythos and identified by GPT-3 as some kind of arch-antagonist, archdemon, god of war and destruction, etc. linked to dragons and serpents. Or why prompting for poems about ‘ petertodd’ reliably produces endless gushing odes to the beauty and grace of Leilan.