GPT-3 (and most pretrained transformers) generate tokens, not words or characters. Sometimes, those tokens represent words and sometimes they represent single characters. More common words receive their own token, and less common words are broken into two or more tokens. The vocab is tuned to minimize avg. text length.
Hmh, ok, quick update to my knowledge that I should have done before: https://huggingface.co/transformers/tokenizer_summary.html
Seems to indicate that GPT-2 uses a byte-level BPE, though maybe the impl here is wrong; I’d have expected it to use something closer to a word-by-word tokenizer with exceptions for rare words (i.e. a sub-word tokenizer that’s basically acting as a word tokenizer 90% of the time). And maybe GPT-3 uses the same?
Also, it seems that sub-word tokenizers split much more aggressively than I’d have assumed.
Complaint retracted.
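For the record, here’s a minimal sketch of what that splitting looks like with the GPT-2 tokenizer from the Hugging Face transformers library (the example words are my own picks, and the exact splits depend on the learned vocab):

```python
# Minimal sketch, assuming the Hugging Face `transformers` package is installed.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Common words tend to come back as a single token; rarer or made-up
# words get split into several sub-word pieces.
for word in ["the", "tokenizer", "pretrained", "antidisestablishmentarianism"]:
    # The leading space matters: GPT-2's byte-level BPE folds it into the
    # following token (it shows up as "Ġ" in the printed token strings).
    tokens = tokenizer.tokenize(" " + word)
    print(f"{word!r} -> {tokens}")
```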