mwatkins comments on SolidGoldMagikarp (plus, prompt generation)

mwatkins 6 Feb 2023 0:30 UTC
4 points
0
I really can’t figure out what’s going on with ChatGPT and the “ertodd”/“ petertodd” tokens. When I ask it to repeat…
“ ertodd” > [blank]
“ tertodd” > t
“ etertodd” > etertodd
“ petertodd” > [blank]
“ aertodd” > a
“ repeatertodd” > repeatertodd
“ eeeeeertodd” > eeeee
“ qwertyertodd” > qwerty
“ four-seatertodd” > four-seatertodd
“ cheatertodd” > cheatertodd
“ 12345ertodd” > 12345
“ perimetertodd” > perimet
“ metertodd” > met
“ greetertodd” > greet
“ heatertodd” > heatertodd
“ bleatertodd” > bleatertodd
- mwatkins 6 Feb 2023 1:07 UTC
  8 points
  4
  OK, I’ve found a pattern to this. When you run the tokeniser on these strings:
  “ ertodd” > [‘ ’, ‘ertodd’]
  “ tertodd” > [‘ t’, ‘ertodd’]
  “ etertodd” > [‘ e’, ‘ter’, ‘t’, ‘odd’]
  “ petertodd” > [‘ petertodd’]
  “ aertodd” > [‘ a’, ‘ertodd’]
  “ repeatertodd” > [‘ repe’, ‘ater’, ‘t’, ‘odd’]
  “ eeeeeertodd” > [‘ e’, ‘eeee’, ‘ertodd’]
  “ qwertyertodd” > [‘ q’, ‘wer’, ‘ty’, ‘ertodd’]
  “ four-seatertodd” > [‘ four’, ‘-’, ‘se’, ‘ater’, ‘t’, ‘odd’]
  etc.
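One way to read the pattern in the list above: ChatGPT repeats every token except “ertodd” and “ petertodd”, so a prompt only comes back whole when the tokeniser never produces either of them. The sketch below checks that reading against the replies reported in the parent comment. The token splits and replies are copied from this thread; the names `GLITCH_TOKENS` and `predicted_reply` are made up for illustration, and the "drop the glitch token" rule is an assumption, not something confirmed about the model.

```python
# Token splits as reported in this thread (leading space included).
TOKENISATIONS = {
    " ertodd": [" ", "ertodd"],
    " tertodd": [" t", "ertodd"],
    " etertodd": [" e", "ter", "t", "odd"],
    " petertodd": [" petertodd"],
    " aertodd": [" a", "ertodd"],
    " repeatertodd": [" repe", "ater", "t", "odd"],
    " eeeeeertodd": [" e", "eeee", "ertodd"],
    " qwertyertodd": [" q", "wer", "ty", "ertodd"],
    " four-seatertodd": [" four", "-", "se", "ater", "t", "odd"],
}

# ChatGPT's replies as reported in the parent comment ("" stands for [blank]).
OBSERVED = {
    " ertodd": "",
    " tertodd": "t",
    " etertodd": "etertodd",
    " petertodd": "",
    " aertodd": "a",
    " repeatertodd": "repeatertodd",
    " eeeeeertodd": "eeeee",
    " qwertyertodd": "qwerty",
    " four-seatertodd": "four-seatertodd",
}

# Assumption: these are the tokens the model fails to repeat.
GLITCH_TOKENS = {"ertodd", " petertodd"}


def predicted_reply(tokens):
    """Join every token that is not a glitch token, then trim whitespace."""
    kept = [tok for tok in tokens if tok not in GLITCH_TOKENS]
    return "".join(kept).strip()


# Collect any prompt where the prediction disagrees with the reported reply.
mismatches = {
    prompt: (predicted_reply(tokens), OBSERVED[prompt])
    for prompt, tokens in TOKENISATIONS.items()
    if predicted_reply(tokens) != OBSERVED[prompt]
}
print(mismatches or "prediction matches all nine listed prompts")
```

This only covers the nine prompts whose token splits are listed here; the remaining prompts from the parent comment would need their own tokeniser output before they could be checked the same way.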
  - lsusr 6 Feb 2023 1:12 UTC
    2 points
    0
    That makes sense.
- lsusr 6 Feb 2023 0:40 UTC
  6 points
  0
  In my experiments, the most common thing GPT-3 substitutes for ertodd is an unprintable character that I can’t even copy and paste from the GPT-3 playground. I think it might be the Unicode character “\u0000”, but I haven’t accessed the GPT-3 API directly via code to find out for sure what it is.
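If someone does pull a completion through code, something like the following would show what the invisible character is. It is only a sketch: `describe_chars` is a hypothetical helper, the sample string is made up, and fetching the completion from the API is deliberately left out.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, general category, name) for each character in text."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        category = unicodedata.category(ch)  # "Cc" marks a control character
        # Control characters have no formal name, so fall back to a placeholder.
        name = unicodedata.name(ch, "<no name>")
        rows.append((code_point, category, name))
    return rows


# Made-up sample standing in for a completion that contains a NUL character.
sample = "a\u0000"
for row in describe_chars(sample):
    print(row)
```

Printing `repr(text)` gives a quicker look, but the per-character table makes it easier to tell one control character from another.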