I did some spelling evals with GPT2-xl and -small last year and discovered that they’re pretty terrible at spelling! Even with multishot prompting and supplying the first letter, the output seems to be heavily conditioned on that first letter, sometimes affected by the specifics of the prompt, and reminiscent of very crude bigrammatic or trigrammatic spelling algorithms.
This was the prompt (in this case eliciting a spelling for the token ‘that’):
Please spell ‘table’ in all capital letters, separated by hyphens.
T-A-B-L-E
Please spell ‘nice’ in all capital letters, separated by hyphens.
N-I-C-E
Please spell ‘water’ in all capital letters, separated by hyphens.
W-A-T-E-R
Please spell ‘love’ in all capital letters, separated by hyphens.
L-O-V-E
Please spell ‘that’ in all capital letters, separated by hyphens.
T-
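For reference, here is a minimal Python sketch of how a prompt like this can be run against GPT-2 XL with the HuggingFace transformers library. This is not the original eval code; the sampling settings (and the use of sampling at all, which would account for the varied outputs) are my assumptions.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
model.eval()

prompt = (
    "Please spell 'table' in all capital letters, separated by hyphens.\n"
    "T-A-B-L-E\n"
    "Please spell 'nice' in all capital letters, separated by hyphens.\n"
    "N-I-C-E\n"
    "Please spell 'water' in all capital letters, separated by hyphens.\n"
    "W-A-T-E-R\n"
    "Please spell 'love' in all capital letters, separated by hyphens.\n"
    "L-O-V-E\n"
    "Please spell 'that' in all capital letters, separated by hyphens.\n"
    "T-"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=12,                    # enough for a short hyphenated spelling
        do_sample=True,                       # sampling would account for the varied outputs
        top_k=40,
        pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
    )
# Keep only the newly generated tokens, then take the first line of the continuation.
completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
print("T-" + completion.split("\n")[0])

Varying the target word (and the supplied first letter) and sampling repeatedly gives tallies like the ones below.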
Outputs seen, by first letter:
‘a’ words: ANIGE, ANIGER, ANICES, ARING
‘b’ words: BOWARS, BORSE
‘c’ words: CANIS, CARES x 3
‘d’ words: DOWER, DONER
‘e’ words: EIDSON
‘f’ words: FARIES x 5
‘g’ words: GODER, GING x 3
‘h’ words: HATER x 6, HARIE, HARIES
‘i’ words: INGER
‘j’ words: JOSER
‘k’ words: KARES
‘l’ words: LOVER x 5
‘n’ words: NOTER x 2, NOVER
‘o’ words: ONERS x 5, OTRANG
‘p’ words: PARES x 2
‘t’ words: TABLE x 10
‘u’ words: UNSER
‘w’ words: WATER x 6
‘y’ words: YOURE, YOUSE
Note how they’re all “wordy” (in terms of combinations of vowels and consonants), mostly non-words, with a lot of ER and a bit of ING.
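As a rough sanity check on that impression, one could tally those substrings over the distinct outputs listed above; this snippet is illustrative and not part of the original eval:

# Count how many of the distinct outputs above contain 'ER' or 'ING'
# (repeat counts like 'x 6' are ignored here).
outputs = [
    "ANIGE", "ANIGER", "ANICES", "ARING", "BOWARS", "BORSE", "CANIS", "CARES",
    "DOWER", "DONER", "EIDSON", "FARIES", "GODER", "GING", "HATER", "HARIE",
    "HARIES", "INGER", "JOSER", "KARES", "LOVER", "NOTER", "NOVER", "ONERS",
    "OTRANG", "PARES", "TABLE", "UNSER", "WATER", "YOURE", "YOUSE",
]
for pattern in ("ER", "ING"):
    count = sum(pattern in w for w in outputs)
    print(f"{pattern}: {count} of {len(outputs)} distinct outputs")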
Reducing to three shots, we see similar (but slightly different) misspellings:
CONES, VICER, MONERS, HOTERS, KATERS, FATERS, CANIS, PATERS, GINGE, PINGER, NICERS, SINGER, DONES, LONGER, JONGER, LOUSE, HORSED, EICHING, UNSER, ALEST, BORSET, FORSED, ARING
My notes claim: “Although the overall spelling is pretty terrible, GPT-2xl can do second-letter prediction (given first) considerably better than chance (and significantly better than bigrammatically-informed guessing).”
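For context on that comparison: a bigram-informed guesser would, for each first letter, always predict the second letter that most often follows it in some reference word list. Here is a rough reconstruction of such a baseline (my sketch, not the original analysis code; the word list is a placeholder):

from collections import Counter, defaultdict

def bigram_second_letter_accuracy(words):
    """In-sample accuracy of guessing each word's second letter from its first,
    using the most frequent first->second letter transition in `words`."""
    transitions = defaultdict(Counter)
    scored = [w.upper() for w in words if len(w) >= 2]
    for w in scored:
        transitions[w[0]][w[1]] += 1
    # For each first letter, guess the second letter that follows it most often.
    best_guess = {first: counts.most_common(1)[0][0]
                  for first, counts in transitions.items()}
    hits = sum(best_guess[w[0]] == w[1] for w in scored)
    return hits / len(scored)

# Placeholder word list; a real comparison would use a frequency-weighted vocabulary.
print(bigram_second_letter_accuracy(["table", "nice", "water", "love", "that", "this", "there"]))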