There has been some talk recently about long “filler-like” input (e.g. “a a a a a [...]”) somewhat derailing GPT-3 and GPT-4, e.g. leading them to output what look like random parts of their training data. Maybe this effect is worth mentioning and thinking about when trying to use filler input for other purposes.
My assumption is that GPT-4 has a repetition penalty, so if you make it predict the same phrase over and over again, it puts almost all of its probability on a token that the repetition penalty prevents it from saying, and the leftover probability behaves roughly like a max-entropy distribution over the rest of the vocab.
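To make the hypothesis concrete, here is a toy sketch of that mechanism. Everything in it is an assumption for illustration: the five-token vocabulary, the logit values, the penalty strength, and the choice of a subtractive, count-based penalty (in the style of the `frequency_penalty` sampling parameter). It says nothing about what GPT-4 actually does internally or at sampling time.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def apply_frequency_penalty(logits, counts, penalty):
    """Subtract penalty * (times the token already appeared) from each logit.
    Hypothetical helper: one possible form of a repetition penalty."""
    return [x - penalty * c for x, c in zip(logits, counts)]

# Toy vocabulary of 5 tokens; token 0 plays the role of the repeated "a".
logits = [12.0, 0.3, 0.1, -0.2, 0.0]  # assumed: model is almost sure of token 0
counts = [500, 0, 0, 0, 0]            # assumed: token 0 already appeared 500 times
penalty = 0.5                         # assumed penalty strength

before = softmax(logits)
after = softmax(apply_frequency_penalty(logits, counts, penalty))

print("P(token 0) before penalty:", before[0])
print("P(token 0) after penalty: ", after[0])
print("entropy after penalty:    ", entropy(after))
print("max entropy over 4 tokens:", math.log(4))
```

With these numbers the penalized token drops from nearly all of the probability mass to essentially none, and what is left is spread almost evenly over the other four tokens. Note that the near-uniform remainder is baked in by the nearly equal toy logits: in a real model the leftover mass would follow whatever the residual logits happen to be, which might be where the "random training data" flavor comes from.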
This happens with GPT-3.5 too, BTW.
By repetition penalty, do you mean an explicit logit bias applied at sampling time, or that the model has internally generalized to avoiding repeated tokens?