It’s not obvious at all to me, but it’s certainly a plausible theory worth testing!
The most direct way would be to spell-check the training data and see how that impacts spelling performance. How would spelling performance change when you remove typing errors like “ hte” vs. phonetic errors like “ hygeine” or doubled-letter errors like “ Misissippi”?
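To make those three buckets concrete, here is a rough sketch of a surface-level classifier. The function names and categories are my own illustration, not an established taxonomy, and it only compares strings, so it knows nothing about pronunciation.

```python
from itertools import groupby


def collapse_runs(word: str) -> str:
    """Collapse repeated adjacent letters: 'mississippi' -> 'misisipi'."""
    return "".join(ch for ch, _ in groupby(word))


def classify_misspelling(wrong: str, right: str) -> str:
    """Bucket a misspelling by surface form only (hypothetical categories)."""
    wrong, right = wrong.strip().lower(), right.strip().lower()
    if wrong == right:
        return "correct"
    # Adjacent swap: exactly two neighbouring positions differ and are exchanged.
    if len(wrong) == len(right):
        diffs = [i for i, (a, b) in enumerate(zip(wrong, right)) if a != b]
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and wrong[diffs[0]] == right[diffs[1]]
                and wrong[diffs[1]] == right[diffs[0]]):
            return "transposition"
    # Doubled-letter error: the words agree once letter runs are collapsed.
    if collapse_runs(wrong) == collapse_runs(right):
        return "doubled-letter"
    return "other"
```

With this, “ hte” comes out as a transposition and “ Misissippi” as a doubled-letter error. Note that “ hygeine” also comes out as a transposition, since on the surface it is just an adjacent swap, so separating phonetic errors from typing errors would need more than string comparison.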
Also, misspellings often break up a large token into several small ones (“ Mississippi” is [13797]; “ Misissippi” is [31281, 747, 12715], i.e. [‘ Mis’, ‘iss’, ‘ippi’]) but are used in the same context, so maybe looking at how the spellings provided by GPT3 compare to common misspellings of the target word in the training text could be useful. I think I’ll go do that right now.
The research I’m looking at suggests that the vast majority of misspellings on the internet are phonetic as opposed to typing errors, which makes sense since the latter are much easier to catch.
Also, has anyone had success in getting GPT2 to spell words?
I did some spelling evals with GPT2-xl and -small last year and discovered that they’re pretty terrible at spelling! Even with multishot prompting and supplying the first letter, the output seems to be heavily conditioned on that first letter, sometimes affected by the specifics of the prompt, and reminiscent of very crude bigrammatic or trigrammatic spelling algorithms.
This was the prompt (in this case eliciting a spelling for the token ‘that’):
Please spell ‘table’ in all capital letters, separated by hyphens.
T-A-B-L-E
Please spell ‘nice’ in all capital letters, separated by hyphens.
N-I-C-E
Please spell ‘water’ in all capital letters, separated by hyphens.
W-A-T-E-R
Please spell ‘love’ in all capital letters, separated by hyphens.
L-O-V-E
Please spell ‘that’ in all capital letters, separated by hyphens.
T-
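For anyone wanting to rerun this, a prompt of that shape can be generated roughly as follows. This is a sketch: the helper names are made up, and I am assuming the lines are joined with plain newlines and that the first letter plus a hyphen is appended as the final line.

```python
TEMPLATE = "Please spell ‘{word}’ in all capital letters, separated by hyphens."


def hyphenate(word: str) -> str:
    """'table' -> 'T-A-B-L-E'."""
    return "-".join(word.upper())


def build_spelling_prompt(shots, target, give_first_letter=True):
    """Build a few-shot spelling prompt (hypothetical helper).

    shots: example words whose full spellings are shown.
    target: the word the model is asked to spell.
    """
    lines = []
    for word in shots:
        lines.append(TEMPLATE.format(word=word))
        lines.append(hyphenate(word))
    lines.append(TEMPLATE.format(word=target))
    if give_first_letter:
        # Supply the first letter and leave the rest for the model.
        lines.append(target[0].upper() + "-")
    return "\n".join(lines)


prompt = build_spelling_prompt(["table", "nice", "water", "love"], "that")
```

Dropping one word from `shots` gives the three-shot variant.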
Outputs seen, by first letter:
‘a’ words: ANIGE, ANIGER, ANICES, ARING
‘b’ words: BOWARS, BORSE
‘c’ words: CANIS, CARES x 3
‘d’ words: DOWER, DONER
‘e’ words: EIDSON
‘f’ words: FARIES x 5
‘g’ words: GODER, GING x 3
‘h’ words: HATER x 6, HARIE, HARIES
‘i’ words: INGER
‘j’ words: JOSER
‘k’ words: KARES
‘l’ words: LOVER x 5
‘n’ words: NOTER x 2, NOVER
‘o’ words: ONERS x 5, OTRANG
‘p’ words: PARES x 2
‘t’ words: TABLE x 10
‘u’ words: UNSER
‘w’ words: WATER x 6
‘y’ words: YOURE, YOUSE
Note how they’re all “wordy” (in terms of combinations of vowels and consonants), mostly non-words, with a lot of ER and a bit of ING.
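A quick way to put numbers on that impression, as a sketch over the distinct output strings only (ignoring the “x N” repeat counts). Using the longest run of consonants as a proxy for “wordy” is my own choice here, and Y is treated as a consonant.

```python
# Distinct outputs from the list above, repeat counts dropped.
OUTPUTS = [
    "ANIGE", "ANIGER", "ANICES", "ARING", "BOWARS", "BORSE", "CANIS", "CARES",
    "DOWER", "DONER", "EIDSON", "FARIES", "GODER", "GING", "HATER", "HARIE",
    "HARIES", "INGER", "JOSER", "KARES", "LOVER", "NOTER", "NOVER", "ONERS",
    "OTRANG", "PARES", "TABLE", "UNSER", "WATER", "YOURE", "YOUSE",
]
VOWELS = set("AEIOU")


def count_containing(words, fragment):
    """Number of words that contain the given letter sequence."""
    return sum(fragment in w for w in words)


def longest_consonant_run(word):
    """Length of the longest stretch of consecutive non-vowels."""
    best = run = 0
    for ch in word:
        run = 0 if ch in VOWELS else run + 1
        best = max(best, run)
    return best


er_count = count_containing(OUTPUTS, "ER")
ing_count = count_containing(OUTPUTS, "ING")
max_run = max(longest_consonant_run(w) for w in OUTPUTS)
print(f"{er_count}/{len(OUTPUTS)} contain ER, {ing_count} contain ING, "
      f"longest consonant run: {max_run}")
```

On this list, 13 of the 31 distinct strings contain ER, 3 contain ING, and no string has more than two consonants in a row.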
Reducing to three shots, we see similar (but slightly different) misspellings:
CONES, VICER, MONERS, HOTERS, KATERS, FATERS, CANIS, PATERS, GINGE, PINGER, NICERS, SINGER, DONES, LONGER, JONGER, LOUSE, HORSED, EICHING, UNSER, ALEST, BORSET, FORSED, ARING
My notes claim “Although the overall spelling is pretty terrible, GPT-2xl can do second-letter prediction (given first) considerably better than chance (and significantly better than bigrammatically-informed guessing).”
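For reference, one way such a comparison could be set up is sketched below. I am assuming “bigrammatically-informed guessing” means always guessing the most frequent second letter for the given first letter, estimated from some reference word list. The words here are toy placeholders, not the data behind the notes.

```python
from collections import Counter, defaultdict

CHANCE = 1 / 26  # uniform guess over the alphabet


def bigram_guesser(reference_words):
    """Map each first letter to its most frequent second letter."""
    counts = defaultdict(Counter)
    for word in reference_words:
        word = word.upper()
        if len(word) >= 2:
            counts[word[0]][word[1]] += 1
    return {first: c.most_common(1)[0][0] for first, c in counts.items()}


def second_letter_accuracy(targets, predictions):
    """Fraction of targets whose second letter matches the prediction."""
    hits = sum(
        1 for t, p in zip(targets, predictions)
        if len(t) >= 2 and t[1].upper() == p.upper()
    )
    return hits / len(targets)


# Toy placeholder data, purely to show the mechanics.
reference = ["the", "that", "this", "table", "take", "time"]
targets = ["that", "table", "tree", "this"]

guess = bigram_guesser(reference)
baseline_preds = [guess[t[0].upper()] for t in targets]
baseline_acc = second_letter_accuracy(targets, baseline_preds)
```

The model's own second-letter predictions would go through `second_letter_accuracy` the same way, and the result would be compared against `baseline_acc` and `CHANCE`.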