shiney comments on LLMs are (mostly) not helped by filler tokens

shiney 10 Aug 2023 14:27 UTC
3 points
0
This is interesting. It’s a pity you aren’t seeing results with anything except GPT-4, because with an easier-to-manipulate model I’d suggest swapping the activations on the filler tokens from one question to another and seeing whether that reduces performance.
- Kshitij Sachan 10 Aug 2023 16:35 UTC
  2 points
  0
  Yep, I had considered doing that. Sadly, even if resample ablations on the filler tokens reduced performance, that wouldn’t necessarily imply that the filler tokens are being used for extra computation. For example, the model could just copy the relevant details from the problem into the filler token positions and solve it there.
  - shiney 10 Aug 2023 17:21 UTC
    1 point
    1
    Oh, hmm, that’s very clever, and I don’t know how I’d improve the method to avoid this.
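
For readers unfamiliar with the term, here is a rough sketch of what a resample ablation on filler-token positions could look like. It is a toy under stated assumptions, not anyone's actual experiment: the activations are a plain NumPy array of shape (prompts, sequence positions, hidden size), and `resample_ablate` and `filler_mask` are hypothetical names made up for illustration. In a real model you would patch the activations inside the forward pass and then compare task accuracy before and after. As noted in the thread, a drop in accuracy would not by itself separate "extra computation" from "the model copied the problem into the filler positions".

```python
import numpy as np

def resample_ablate(acts, filler_mask, rng):
    """Replace activations at filler positions with those from another prompt.

    acts:        array of shape (n_prompts, seq_len, d_model), n_prompts >= 2
    filler_mask: boolean array of shape (seq_len,), True at filler positions
    Returns the patched activations and the donor index used for each prompt.
    """
    n = acts.shape[0]
    # Rotate prompt indices by a non-zero offset so no prompt is its own donor.
    shift = int(rng.integers(1, n))
    donors = np.roll(np.arange(n), shift)
    patched = acts.copy()
    # Only filler positions are overwritten; the question tokens stay intact.
    patched[:, filler_mask, :] = acts[donors][:, filler_mask, :]
    return patched, donors

# Toy data: 4 prompts, 6 positions (last 3 are filler), hidden size 3.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 6, 3))
filler_mask = np.array([False, False, False, True, True, True])
patched, donors = resample_ablate(acts, filler_mask, rng)
```

Rotating the indices rather than sampling donors independently is just a convenient way to guarantee every prompt receives activations from a different prompt.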