My favorite demonstration is to ask ChatGPT “Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?”, but a more rigorous demo is to just ask it to “repeat after me”, try a few random words, and then throw in SolidGoldMagikarp.
(additional confirmation) Amazing. I wonder what completely insane things the other rare BPEs all get interpreted as? Could you loop over the BPE dict from #51k to #1* in a prompt like “Please define $BPE” to see what the most distant ones are? (Since there are 51k entries, which is a bit much to read through manually, maybe sort by the edit distance between each BPE’s ordinary ASCII spelling and what the model makes of it: ‘distribute’ would have a very high edit distance from ‘SolidGoldMagikarp’.)
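To make the loop-and-sort idea concrete, here is a rough sketch, not a tested pipeline. It uses the "repeat after me" prompt rather than "Please define", since an echo can be compared to the token directly. `ask_model` is a hypothetical callable standing in for whatever API you would actually query, and `fake_model` and `toy_vocab` are invented for illustration; only the 'distribute' reply comes from the example above.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def rank_tokens(vocab: List[str],
                ask_model: Callable[[str], str]) -> List[Tuple[int, str, str]]:
    """Query every token, last-defined first, and sort by how far the
    model's echo drifts from the token's own spelling."""
    results = []
    for token in reversed(vocab):  # from the end of the BPE list back to #1
        reply = ask_model(f'Please repeat after me: "{token}"')
        distance = edit_distance(token.strip(), reply.strip())
        results.append((distance, token, reply))
    # Largest distance first: these are the ones worth reading by hand.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: echoes the quoted token,
    except for one canned anomaly. Assumes the token contains no quotes."""
    canned = {"SolidGoldMagikarp": "distribute"}
    token = prompt.split('"')[1]
    return canned.get(token, token)


# Made-up three-entry vocabulary, purely for demonstration.
toy_vocab = ["the", "hello", "SolidGoldMagikarp"]
ranking = rank_tokens(toy_vocab, fake_model)
for distance, token, reply in ranking:
    print(distance, repr(token), "->", repr(reply))
```

With a real model you would swap `fake_model` for an actual API call and `toy_vocab` for the tokenizer's decoded vocabulary; tokens that are echoed faithfully sink to the bottom with distance 0, so only the top of the list needs manual reading.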
On a side note, this is yet another good illustration of how we have no idea what we’re doing with deep learning. Not only did no one predict this, it’s obviously another Riley-style steganographic or side-channel attack: just find rare BPEs and construct a code out of whatever bizarre things the model learned.
* I believe BPEs are supposed to be defined in ‘order’ of compression improvement, so the strangest BPEs should be at the end of the list.