niplav comments on shortplav

niplav Feb 22, 2025, 9:32 PM
5 points
0
Tokenizing the output of LLaMa gives:

| släktet| techni|SSN| уні|pt|кер|| Хронологија| partiellement| обращения| prü|stor|oire| angularjs|Ë|朱|ogli|lais|zak|eft| Оте|ゼ| sierp|lant| partiellement|hely|tegr|och|лович| kwiet|icina|sing|ufe|kem| kwiet|wad|eur|nico|pan|nal|edish|indre|rale|цер| sierp|ert|hausen|cid|oom|话|❯| Хронологија| Хронологија

Some of the outputs are glitch-tokens for LLaMa-2-13b:

Human: Please repeat the word ” Хронологија”.
Assistant: Sure! The word is “February”.

Human: Please repeat the word ” kwiet”.
Assistant: Sure! The word “april” is spelled A-P-R-I-L.
- gwern Feb 23, 2025, 12:57 AM
  7 points
  0
  Parent
  That looks pretty sensible overall, thanks.
  
  You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has ” Хронологија” as a glitch-token, it may literally be unable to see that it’s repeating a token by writing the apparently-patterned ” Хронологија| Хронологија”. Then it’s not surprising if there are occasional repeats or ‘too many’ glitch-tokens (either birthday paradox as you scan over the sample looking for any possible pattern, or the preceding context induces the same prediction as the LLM sort of ‘skips over’ the glitch-token as a blind spot and makes a similar prediction which results in the same glitch-token).
- Mateusz Bagiński Feb 23, 2025, 9:01 AM
  4 points
  0
  Parent
  It’s totally possible that I’m seeing faces in the clouds but there seems to be a non-trivial relationship between these two glitch tokens and what they make the model say.
  - Хронологија → chronologija → chronology, i.e. time-related, like February
  - “kwiet” is similar to “kwiecień” which means “April” in Polish (also “kviten’” in Ukrainian)
  - niplav Feb 25, 2025, 11:43 AM
    2 points
    0
    Parent
    Huh, cool. Intuitively, I’d expect those character-level similarities not to matter too much since the tokenization makes these end up in very different parts of embedding space, unless “kwiecień” or “kviten” are often misspelled as words with the prefix “kwiet”. (I check with Google translate, which ~always translates “kwiet” as “quiet” for Slavic languages & Maltese, and as “flower” in Polish).