A very interesting post, thank you! I love these glitch tokens and agree that the fact that models can spell at all is really remarkable. I think there must be some very clever circuits that infer the spelling of words from the occasional typos and the like in natural text (i.e. the same mechanism that makes it desirable to learn the spelling of tokens is probably what makes it possible), and figuring out how those circuits work would be fascinating.
One minor comment about the “normalized cumulative probability” metric that you introduced: won’t that metric favor really long and predictable-once-begun completions? Like, suppose there’s an extremely small but nonzero chance that the model chooses to spell out “ Kanye” by spelling out the entire Gettysburg Address. The first few letters of the Gettysburg Address will be very unlikely, but after that, every subsequent letter will be very likely, resulting in a very high normalized cumulative probability on the whole completion, even though the completion as a whole is still super unlikely.
Yes, I realised that this was a weakness of n.c.p. It’s helpful for shorter rollouts, but once they get longer they can settle into a kind of “probabilistic groove” which unhelpfully inflates n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.
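To make the effect concrete, here is a minimal sketch. It assumes n.c.p. is the geometric mean of the per-token probabilities (the n-th root of the cumulative probability over n tokens). That definition, the helper names `ncp` and `cumulative`, and all the numbers are illustrative assumptions, not values from any actual rollout.

```python
import math

def cumulative(token_probs):
    """Plain cumulative probability: the product of the per-token probabilities."""
    return math.exp(sum(math.log(p) for p in token_probs))

def ncp(token_probs):
    """Assumed definition of n.c.p.: geometric mean of the per-token probabilities."""
    return math.exp(sum(math.log(p) for p in token_probs) / len(token_probs))

# Hypothetical short rollout: five moderately likely tokens.
short = [0.5] * 5

# Hypothetical "groove" rollout: three very unlikely opening tokens,
# followed by a thousand near-certain ones.
long_groove = [0.01] * 3 + [0.999] * 1000

for name, probs in [("short", short), ("long_groove", long_groove)]:
    print(f"{name}: cumulative={cumulative(probs):.3g}, ncp={ncp(probs):.3f}")
```

Under these made-up numbers the long rollout is far less likely overall than the short one, yet its n.c.p. is much higher, because the three unlikely tokens are averaged away by the thousand predictable ones. As the predictable tail grows, n.c.p. approaches 1.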