Kshitij Sachan comments on LLMs are (mostly) not helped by filler tokens

Kshitij Sachan 10 Aug 2023 16:35 UTC
2 points
Yep, I had considered doing that. Sadly, even if resample ablations on the filler tokens reduced performance, that wouldn’t necessarily imply that the filler tokens are being used for extra computation. For example, the model could just copy the relevant details from the problem into the filler token positions and solve it there.
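To make the confound concrete, here is a minimal toy sketch in Python with NumPy. Everything in it is hypothetical: `resample_ablate`, `toy_forward`, and `toy_readout` are illustrative names, and the "model" is a hand-written stand-in, not an LLM or anything from the post. It assumes a model that only copies the problem into the filler positions and reads its answer from there, so no extra computation happens at those positions, yet resample-ablating them still changes the answer.

```python
import numpy as np


def resample_ablate(acts, donor_acts, positions):
    """Return a copy of `acts` whose rows at `positions` come from `donor_acts`."""
    patched = acts.copy()
    patched[positions] = donor_acts[positions]
    return patched


def toy_forward(problem_value, n_filler=3):
    """Hypothetical 'copying' model (an assumption for illustration).

    Position 0 holds the problem; each filler position holds a plain copy of it.
    """
    acts = np.zeros((1 + n_filler, 1))
    acts[0, 0] = problem_value
    acts[1:, 0] = problem_value  # pure copy, no additional computation
    return acts


def toy_readout(acts):
    """The answer (here: double the value) is read from the last filler position."""
    return 2 * acts[-1, 0]


filler_positions = [1, 2, 3]
clean = toy_forward(3.0)   # the problem we care about
donor = toy_forward(7.0)   # a different, resampled problem

patched = resample_ablate(clean, donor, filler_positions)

clean_answer = toy_readout(clean)      # computed from the copied clean value
patched_answer = toy_readout(patched)  # computed from the donor's copied value

print(clean_answer, patched_answer)
```

In this toy setup the ablation changes the answer even though the filler positions only relocate information, which is why a performance drop alone would not distinguish copying from extra computation.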
- shiney 10 Aug 2023 17:21 UTC
  1 point
  Oh hmm, that’s very clever, and I don’t know how I’d improve the method to avoid this.