Their website cites https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.pdf as having found an average 84% success rate on the tested subset of puzzles.
It is worth noting that LLM-based approaches can perform reasonably well on the train set. For instance, my approach gets 72%.
The LLM-based approach works quite differently from how a human would normally solve the problem, and if you give LLMs “only one attempt”, or otherwise limit them to a qualitatively similar amount of reasoning as humans get, I think they do considerably worse than humans. (Though to make this “only one attempt” baseline fair, you’d have to allow for the iteration that humans would normally do.)
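To make the one-attempt-vs-many-attempts gap concrete, here is a minimal sketch using the standard unbiased pass@k estimator (Chen et al., 2021). The sample counts are made up for illustration, not numbers from my runs:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k attempts drawn from n total samples
    (c of which are correct) solves the puzzle."""
    if n - c < k:
        return 1.0  # fewer incorrect samples than attempts: guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical puzzle where 12 of 128 sampled solutions are correct:
print(pass_at_k(n=128, c=12, k=1))   # ~0.094 (one attempt)
print(pass_at_k(n=128, c=12, k=64))  # ~0.9999 (many attempts)
```

The point of the toy numbers: a per-sample success rate under 10% still yields near-certain success given enough attempts, which is why a single-attempt baseline and a many-attempt pipeline aren’t directly comparable.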
Yeah, I failed to mention this. Edited to clarify what I meant.
Thanks for finding a cite. I’ve definitely seen Chollet (on Twitter) give 85% as the success rate on the (easier) training set (and the paper picks problems from the training set as well).