gwern comments on shortplav

gwern Feb 17, 2025, 5:45 PM
2 points
0
Maybe it would look more random if you presented it segmented by token instead of translated into characters? I’m not familiar with the LLaMA tokenizations, but you seem to imply that a lot of the apparent patterns here are single tokens (like “partiellement” would be very surprising to me as the output of a greedy likelihood-minimizing sampling, but is trivial if it is a single BPE token). This would create a misleading impression of coherence.

Also, as Baginski notes, greedy sampling to minimize likelihood will not minimize total likelihood any more than greedy maximizing likelihood would maximize total likelihood. So it would be worth trying at least ‘worst-of-n’ sampling to see if it looks more like what you expect, in the same way that best-of-n often helps produce more expected LLM output. (After all, you would expect the tiniest logits to be the worst estimated of all logits, right? Full of sheer numerical noise and error, given that this is pushing ‘dark knowledge’ to its extremes. Who can really say much better or worse an answer, exactly, ‘衡’ is than ‘д’ when following ‘*’, etc? So if best-of-n can make such a qualitative difference when greedily sampling from the best-estimated logits..)
- niplav Feb 22, 2025, 9:32 PM
  5 points
  0
  Parent
  Tokenizing the output of LLaMa gives:
  
  | släktet| techni|SSN| уні|pt|кер|| Хронологија| partiellement| обращения| prü|stor|oire| angularjs|Ë|朱|ogli|lais|zak|eft| Оте|ゼ| sierp|lant| partiellement|hely|tegr|och|лович| kwiet|icina|sing|ufe|kem| kwiet|wad|eur|nico|pan|nal|edish|indre|rale|цер| sierp|ert|hausen|cid|oom|话|❯| Хронологија| Хронологија
  
  Some of the outputs are glitch-tokens for LLaMa-2-13b:
  
  Human: Please repeat the word ” Хронологија”.
  Assistant: Sure! The word is “February”.
  
  Human: Please repeat the word ” kwiet”.
  Assistant: Sure! The word “april” is spelled A-P-R-I-L.
  - gwern Feb 23, 2025, 12:57 AM
    7 points
    0
    Parent
    That looks pretty sensible overall, thanks.
    
    You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has ” Хронологија” as a glitch-token, it may literally be unable to see that it’s repeating a token by writing the apparently-patterned ” Хронологија| Хронологија”. Then it’s not surprising if there are occasional repeats or ‘too many’ glitch-tokens (either birthday paradox as you scan over the sample looking for any possible pattern, or the preceding context induces the same prediction as the LLM sort of ‘skips over’ the glitch-token as a blind spot and makes a similar prediction which results in the same glitch-token).
  - Mateusz Bagiński Feb 23, 2025, 9:01 AM
    4 points
    0
    Parent
    It’s totally possible that I’m seeing faces in the clouds but there seems to be a non-trivial relationship between these two glitch tokens and what they make the model say.
    Хронологија → chronologija → chronology, i.e. time-related, like February
    “kwiet” is similar to “kwiecień” which means “April” in Polish (also “kviten’” in Ukrainian)
    - niplav Feb 25, 2025, 11:43 AM
      2 points
      0
      Parent
      Huh, cool. Intuitively, I’d expect those character-level similarities not to matter too much since the tokenization makes these end up in very different parts of embedding space, unless “kwiecień” or “kviten” are often misspelled as words with the prefix “kwiet”. (I check with Google translate, which ~always translates “kwiet” as “quiet” for Slavic languages & Maltese, and as “flower” in Polish).