Why is beam search missing? One possibility is that GPT-3 already does internal lookahead: OpenAI tried beam search, found it didn’t improve text generation, and didn’t bother adding it as an option. In other words, GPT-3 is already mesa-optimizing 😲
Beam search has never worked for likelihood-trained NNs, going back at least to char-RNNs in 2015. Beam search does trigger repetition and other pathologies in GPT; see “The Curious Case of Neural Text Degeneration”, Holtzman et al 2019. And while unlikelihood training seems to help, it’s not a silver bullet, and is a bit ad hoc (especially if you think of it in terms of reinforcement learning).
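To make the failure mode concrete, here’s a minimal sketch of the comparison, assuming the HuggingFace transformers library and the small public GPT-2 checkpoint (GPT-3 itself is only reachable through the API). On an open-ended prompt the beam continuation tends to collapse into loops while nucleus sampling stays varied, which is the contrast Holtzman et al document:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered"
inputs = tok(prompt, return_tensors="pt")

# Beam search: pick the continuation with the highest total likelihood.
beam = model.generate(**inputs, max_new_tokens=60, num_beams=5,
                      pad_token_id=tok.eos_token_id)

# Nucleus (top-p) sampling, the alternative Holtzman et al 2019 argue for.
sampled = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.95,
                         pad_token_id=tok.eos_token_id)

print("beam search:\n", tok.decode(beam[0], skip_special_tokens=True))
print("\nnucleus sampling:\n", tok.decode(sampled[0], skip_special_tokens=True))
```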
Seq2seq used beam search and found it helped (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43155.pdf). It was standard practice in the early days of NMT; I’m not sure when that changed.
This blog post gives some insight into why beam search might not be a good idea, and is generally very interesting: https://benanne.github.io/2020/09/01/typicality.html
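For anyone who doesn’t click through: the core point is that the highest-likelihood sequences are not typical sequences. A toy i.i.d. sketch (hypothetical numbers, pure NumPy, standing in for a real LM’s context-dependent distribution) shows the gap — the single most probable sequence sits far above the log-probability that any actual sample achieves, so maximizing likelihood, as beam search does, drives you out of the typical set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy next-token distribution over a 50-token vocabulary (hypothetical numbers).
p = rng.dirichlet(np.ones(50) * 0.3)
n = 200  # sequence length
entropy = -(p * np.log(p)).sum()

# Log-probability of the single most likely sequence: the argmax token repeated.
mode_logprob = n * np.log(p.max())

# Log-probabilities of sequences actually sampled from the distribution.
samples = rng.choice(len(p), size=(1000, n), p=p)
sample_logprobs = np.log(p)[samples].sum(axis=1)

print(f"most-likely sequence log-prob:  {mode_logprob:.1f}")
print(f"typical sample log-prob (mean): {sample_logprobs.mean():.1f}"
      f"  (about -n * entropy = {-n * entropy:.1f})")
# The mode is enormously more probable than anything you would ever sample --
# i.e. the likelihood-maximizing output is wildly atypical.
```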
It still is; it’s just that beam search (or other search strategies) seems to be mostly useful for closed-ended, short text generation. Translating a sentence is apparently a task with enough right-or-wrong-ness to it that beam search taps into no pathologies, but they get exposed in open-ended longform generation.
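For reference, since the thread keeps invoking it without spelling it out: beam search is just breadth-limited maximization of total log-likelihood. A toy sketch below, with a caller-supplied next_token_logprobs function standing in for the model (the names are made up for illustration); note that with a static distribution it simply repeats the argmax token, which is the degenerate-repetition failure in miniature:

```python
import numpy as np

def beam_search(next_token_logprobs, vocab_size, eos_id, beam_width=3, max_len=20):
    """Keep the beam_width prefixes with the highest cumulative log-probability.

    next_token_logprobs(prefix) -> array of shape (vocab_size,) of log-probs.
    """
    beams = [([], 0.0)]  # (token list, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            logps = next_token_logprobs(prefix)
            # Expand every live prefix with every possible next token.
            for tok in range(vocab_size):
                candidates.append((prefix + [tok], score + logps[tok]))
        # Keep only the top beam_width candidates by total log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            if prefix[-1] == eos_id:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Toy "model": a fixed 4-token distribution, with token 0 acting as EOS.
probs = np.array([0.05, 0.6, 0.25, 0.1])
best, score = beam_search(lambda prefix: np.log(probs), vocab_size=4, eos_id=0,
                          beam_width=2, max_len=5)
print(best, score)  # -> [1, 1, 1, 1, 1]: the argmax token over and over
```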