It’s hard for me to interpret these results without some indication of how good these networks actually are at the task though. E.g. it is possible that even though a network solved a length=N task once out of however many attempts you made, it just got lucky, or is running some other heuristic that happens to work that one time. I understand why you were interested in how things scale with problem length, given your interest in recurrence and processing depth. But would it be hard to make a plot where the x axis is problem length and the y axis is accuracy or loss?
Yup, here is such a plot, made after training the “switcher” architecture on 350k examples. I remember it was similar for the longer training run: a few of the longest task lengths struggle, but the rest are near 100%.
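For anyone who wants to reproduce this kind of plot, the bookkeeping is just grouping per-example correctness by problem length and averaging. Here is a minimal sketch; the function and data names are illustrative, not from the actual codebase, and the toy `results` list stands in for real evaluation output:

```python
from collections import defaultdict

def accuracy_by_length(results):
    """Given (problem_length, is_correct) pairs, return mean accuracy per length."""
    totals = defaultdict(lambda: [0, 0])  # length -> [num_correct, num_total]
    for length, correct in results:
        totals[length][0] += int(correct)
        totals[length][1] += 1
    return {n: c / t for n, (c, t) in sorted(totals.items())}

# Toy stand-in for real per-example evaluation results.
results = [(1, True), (1, True), (2, True), (2, False), (3, False)]
acc = accuracy_by_length(results)
print(acc)  # {1: 1.0, 2: 0.5, 3: 0.0}

# To get the plot described above (assuming matplotlib is installed):
# import matplotlib.pyplot as plt
# plt.plot(list(acc), list(acc.values()), marker="o")
# plt.xlabel("problem length"); plt.ylabel("accuracy"); plt.savefig("acc_by_length.png")
```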
Thanks. I really like this task!