Fully agree that there are strong theoretical arguments for CoT expressiveness. Thanks for the detailed references!
I think the big question is whether this expressiveness is required for anything we care about (e.g. the ability to take over), and how many serial steps are enough. (And here, I think that the number of serial steps in human reasoning is the best data point we have.)
Another question is whether CoT & natural-language reasoning are in practice able to take advantage of the increased number of serial steps: they are in some toy settings (coin flips count, …), but CoT barely improves performance on MMLU and common-sense reasoning benchmarks. I think CoT will eventually matter a lot more than it does today, but it’s not completely obvious.
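For concreteness, here is a sketch of the kind of coin-flip toy task I have in mind (my own illustrative reconstruction, not a specific benchmark): a coin starts heads-up, each of n people either flips it or not, and the model must report the final state. Computing the answer is trivially serial, one state update per person, which is exactly the structure CoT lets a model walk through token by token:

```python
# Toy coin-flip state-tracking task: the ground-truth computation is a
# chain of n serial state updates, one per person. Models empirically
# handle this kind of task much better when allowed to reason step by
# step than when forced to answer in a single forward pass.

def final_coin_state(flips):
    """flips: list of bools, True = this person flipped the coin.
    Returns True if the coin ends heads-up."""
    heads = True  # coin starts heads-up
    for flipped in flips:  # one serial step per person
        if flipped:
            heads = not heads
    return heads

print(final_coin_state([True, False, True, True]))  # 3 flips -> False (tails)
```

The point is just that the number of serial steps the task demands grows with n, so it isolates the "extra serial compute" that CoT is supposed to buy.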
I don’t know if I should be surprised by CoT not helping that much on MMLU; MMLU doesn’t seem to require [very] long chains of inference? In contrast, I expect takeover plans would. Somewhat related, my memory is that CoT seemed very useful for Theory of Mind (necessary for deception, which seems like an important component of many takeover plans), but the only reference I could find quickly is https://twitter.com/Shima_RM_/status/1651467500356538368.