Similarly, I find that GPT-3, GPT-3.5, and Claude 2 don’t benefit from filler tokens. However, GPT-4 (which Tamera didn’t study) shows mixed results, with strong improvements on some tasks and no improvement on others.
It’s an interesting question whether Gemini shows any improvement.