David Scott Krueger (formerly: capybaralet) comments on How LLMs are and are not myopic

David Scott Krueger (formerly: capybaralet) 26 Jul 2023 23:15 UTC
LW: 5 AF: 3
0
> This means that the model can and will implicitly sacrifice next-token prediction accuracy for long horizon prediction accuracy.
Are you claiming this would happen even given infinite capacity?
If so, can you perhaps provide a simple+intuitive+concrete example?
- Caspar Oesterheld 8 Dec 2023 20:30 UTC
  LW: 10 AF: 6
  4
  
  > This means that the model can and will implicitly sacrifice next-token prediction accuracy for long horizon prediction accuracy.
  
  > Are you claiming this would happen even given infinite capacity?
  
  I think that janus isn’t claiming this and I also think it isn’t true. I think it’s all about capacity constraints. The claim as I understand it is that there are some intermediate computations that are optimized both for predicting the next token and for predicting the 20th token and that therefore have to prioritize between these different predictions.
- David Johnston 27 Jul 2023 11:54 UTC
  3 points
  2
  I can’t speak for janus, but my interpretation was that this is due to a capacity budget: it can be favourable to lose a bit of accuracy on token n if you gain more on token n+m. I agree some examples would be great.
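
A toy sketch of the capacity-budget reading discussed in this thread, not anything janus or the commenters wrote. Everything in it is an assumption made for illustration: the context holds two independent fair bits `a` and `b`, the next token equals `a` with probability `P_NEXT`, a later token equals `b` with probability `P_LATER`, and a shared intermediate representation (the hypothetical `encoder`) is all that either prediction gets to see. Only three hand-picked encoders are compared, so this is not an exhaustive search over 1-bit codes.

```python
from itertools import product

# Assumed toy parameters (not from the thread).
P_NEXT = 0.6   # next token equals bit a with this probability
P_LATER = 1.0  # a later token equals bit b with this probability


def joint():
    """Yield (a, b, next_tok, later_tok, prob) for the toy distribution."""
    for a, b, nxt, later in product((0, 1), repeat=4):
        p = 0.25  # a and b are independent fair bits
        p *= P_NEXT if nxt == a else 1 - P_NEXT
        p *= P_LATER if later == b else 1 - P_LATER
        if p > 0:
            yield a, b, nxt, later, p


def best_accuracy(encoder, target_index):
    """Accuracy of the best decoder that sees only encoder(a, b).

    target_index 0 scores the next token, 1 scores the later token.
    """
    mass = {}  # (code, target value) -> probability mass
    for a, b, nxt, later, p in joint():
        key = (encoder(a, b), (nxt, later)[target_index])
        mass[key] = mass.get(key, 0.0) + p
    codes = {code for code, _ in mass}
    # For each code, the best decoder guesses the more likely target value.
    return sum(max(mass.get((c, v), 0.0) for v in (0, 1)) for c in codes)


# Hypothetical shared representations with different capacities.
ENCODERS = {
    "keep a (1 bit)": lambda a, b: a,
    "keep b (1 bit)": lambda a, b: b,
    "keep both (2 bits)": lambda a, b: (a, b),
}


def evaluate():
    """Return {encoder name: (next-token accuracy, later-token accuracy)}."""
    return {
        name: (best_accuracy(enc, 0), best_accuracy(enc, 1))
        for name, enc in ENCODERS.items()
    }


results = evaluate()
for name, (nxt, later) in results.items():
    print(f"{name}: next={nxt:.2f} later={later:.2f} sum={nxt + later:.2f}")
```

Under these assumptions, keeping `a` gives next-token accuracy 0.60 and later-token accuracy 0.50 (sum 1.10), while keeping `b` gives 0.50 and 1.00 (sum 1.50). So with a 1-bit budget, a representation chosen to maximise the summed accuracy gives up next-token accuracy. With 2 bits the encoder keeps both and reaches 0.60 and 1.00, so the trade-off disappears once capacity is not binding, which matches the view that the effect is about capacity constraints.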