True that it isn’t much evidence for reasoning directly, as it’s only 1 task.
As for how we can jump from the empirical result to claims about its ability to reason: the shift cipher task lets us disentangle commonness from simplicity. A bag of heuristics that has no uniform and compact description works best on common example types, whereas the algorithmic reasoning I define below works better on simpler tasks. The simplest shift cipher is the 1-shift cipher, while the most common is the 13-shift cipher, so a bag-of-heuristics model, which predicts that LLMs are completely or primarily learning shallow heuristics, would do best on 13-shift ciphers. The paper shows a spike in accuracy on the 13-shift cipher, consistent with LLMs having some heuristics, but it also shows that 1-shift accuracy was much better than expected under the view that LLMs are solely or primarily a bag of heuristics that can't be improved by CoT.
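For anyone unfamiliar with the task, here is a minimal sketch of a shift cipher in Python. The function names are my own for illustration and are not taken from the paper. The point is only that one short procedure covers every shift, so the 1-shift and 13-shift cases differ in how often they show up in text, not in which algorithm decodes them.

```python
def shift_encode(plaintext: str, shift: int) -> str:
    """Move each ASCII letter forward by `shift` places, wrapping around at z."""
    out = []
    for ch in plaintext:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)  # spaces, digits, punctuation pass through unchanged
            continue
        out.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(out)


def shift_decode(ciphertext: str, shift: int) -> str:
    """Undo shift_encode by moving each letter back by `shift` places."""
    return shift_encode(ciphertext, -shift)


if __name__ == "__main__":
    for shift in (1, 13):
        encoded = shift_encode("hello world", shift)
        print(shift, encoded, shift_decode(encoded, shift))
```

A system that only memorized common patterns would have no particular reason to do well on the 1-shift case, which is what makes the comparison informative.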
I’m defining reasoning more formally in the quote below:
So an “algorithm” is a finite description of a fast parallel circuit for every size.
This comment is where I got the quote from:
https://www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1#Bg5s8ujitFvfXuop8
This thread has an explanation of why we can disentangle noisy reasoning from heuristics, as I’m defining the terms here, so go check that out below:
https://x.com/RTomMcCoy/status/1843325666231755174
I see, I think that second tweet thread actually made a lot more sense, thanks for sharing!
McCoy’s definitions of heuristics and reasoning are sensible, although I personally would still avoid “reasoning” as a word, since people probably have very different interpretations of what it means. I like the ideas of “memorizing solutions” and “generalizing solutions.”
I think where McCoy and I depart is that he’s modeling the entire network computation as a heuristic, while I’m modeling the network as compositions of bags of heuristics, which in aggregate would display behaviors he would call “reasoning.”
The explanation I gave above (heuristics that shift the letter forward by one, with limited ability to compose) is still a heuristics-based explanation. Maybe this set of composing heuristics would fit your definition of an “algorithm.” I don’t think there’s anything inherently wrong with that.
However, the heuristics-based explanation gives concrete predictions about what we can look for in the actual network: individual heuristics that increment a to b, b to c, etc., and other parts of the network that compose their outputs.
This is what I meant when I said that this could be a useful framework for interpretability :)
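To make the "composed heuristics" picture concrete, here is a toy sketch in Python. It is only an illustration of the idea, not a claim about what circuits exist in any real model, and the names `INCREMENT`, `compose_increment`, and `shift_by_composition` are hypothetical. Each dictionary entry stands in for one memorized "a goes to b" heuristic, and a separate loop plays the role of the part that composes their outputs.

```python
import string

LETTERS = string.ascii_lowercase

# One memorized "heuristic" per letter: a -> b, b -> c, ..., z -> a.
INCREMENT = {c: LETTERS[(i + 1) % 26] for i, c in enumerate(LETTERS)}


def compose_increment(letter: str, times: int) -> str:
    """Apply the single-step increment heuristic `times` times in a row."""
    for _ in range(times):
        letter = INCREMENT[letter]
    return letter


def shift_by_composition(text: str, shift: int) -> str:
    """Build an n-shift out of repeated 1-shift lookups (lowercase letters only)."""
    return "".join(
        compose_increment(c, shift % 26) if c in INCREMENT else c
        for c in text
    )


if __name__ == "__main__":
    print(shift_by_composition("hello", 1))
    print(shift_by_composition("hello", 13))
```

In this toy version a 1-shift needs one lookup per letter while a 13-shift needs thirteen chained lookups, so a system with limited composition depth would handle small shifts more easily. Whether real networks look anything like this is exactly the kind of thing interpretability work would need to check.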
Now I understand.
Though I’d still claim that this is evidence for the view that a generalizing solution is implemented inside LLMs, and I wanted people to keep that in mind, since people often treat “heuristics” as meaning that the model doesn’t generalize at all.
Yeah, and I think that’s a big issue! I feel like what’s happening is that once you chain a huge number of heuristics together, you can get behaviors that look a lot like complex reasoning.