Meta note: I find it of some interest that filler token experiments have been independently conceived at least five times, just to my knowledge.
huh interesting! Who else has also run filler token experiments?
I was also interested in this experiment because it seemed like a crude way to measure how non-myopic LLMs are (i.e., what fraction of the forward pass is devoted to current vs. future tokens). I wonder if other people were mostly coming at it from that angle.
I’m currently working on filler token training experiments in small models. These GPT-4 results are cool! I’d be interested to chat.
Neat! I'll reach out.