Thanks! Maybe we could get around the BPE encoding by doing it with sentences instead of words? Like, “Please scramble the word order in the following sentences: I ate a nice juicy steak. = steak nice a juicy ate I. Happy people usually sleep well at night. = people sleep well usually at night happy. Orange juice is like wine for amateurs. = ” Or would that be less impressive for some reason?
An issue there is that you would be eating further into your context window by expanding it out: each of those words is going to take 1 or more BPEs, while the letter-by-letter approach is, I’m reasonably sure, guaranteed to be 1 letter = 1 BPE. You also make it more likely that the decoding of the answer will screw up: the more BPEs it takes to express an answer, the more likely top-k or top-p sampling will stochastically screw up an answer that would otherwise be obviously correct. (You can see the stochasticity at play in the completions: “shame” vs. “shames”, e.g.)
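To make the token arithmetic concrete, here is a minimal sketch (assuming the `tiktoken` library and its “gpt2” encoding, the BPE vocabulary GPT-3 shares with GPT-2) comparing the BPE cost of a whole sentence against space-separated letters:

```python
# A minimal sketch, assuming the `tiktoken` library and its "gpt2"
# encoding (the BPE vocabulary GPT-3 shares with GPT-2).
import tiktoken

enc = tiktoken.get_encoding("gpt2")

sentence = "Orange juice is like wine for amateurs."
letters = " ".join("amateurs")  # -> "a m a t e u r s"

# A whole sentence: common words are usually 1 BPE each, but rarer
# words can split into several, so the cost per word is >= 1.
print(len(enc.encode(sentence)), enc.encode(sentence))

# Space-separated single letters: each " x" pair is typically its own
# token in the GPT-2 vocabulary, so the cost is ~1 BPE per letter.
print(len(enc.encode(letters)), enc.encode(letters))
```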
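And on the decoding point, a back-of-the-envelope illustration (the 98% per-token figure is purely hypothetical): if each sampled BPE independently has some fixed chance of matching the intended answer, the chance of the whole answer coming out right decays geometrically with its length in BPEs:

```python
# Hypothetical numbers: suppose top-p sampling picks the "right" BPE
# 98% of the time at each step. The chance of an entirely correct
# answer then decays geometrically with answer length in BPEs.
p_correct_token = 0.98
for n_bpes in (1, 5, 10, 20, 40):
    print(n_bpes, round(p_correct_token ** n_bpes, 3))
# -> 1 0.98, 5 0.904, 10 0.817, 20 0.668, 40 0.446
```

So a sentence-scrambling answer that costs 20+ BPEs has meaningfully more opportunities to derail than a 5-letter spelled-out word, even if the model “knows” the right answer at every step.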