Not confident for current Transformers (CoT currently only helps for a limited set of problems, and it doesn't help as much as you would expect from the increased number of serial steps), more confident for future ones, mostly by analogy with the human brain, which requires many serial steps to do anything scary (while doing a lot of parallel computation). Human neurons fire in roughly 1 ms, so a current Transformer does a forward pass in ~0.5 s of human thinking-time, and since people scale the number of layers slowly, I'd be surprised if future Transformers did more than ~10 s thinking-time forward passes.
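To make the arithmetic behind the "~0.5 s" and "~10 s" figures explicit, here is a minimal sketch; the step counts are illustrative assumptions (treating one layer-like operation as one serial step), not measurements of any particular model:

```python
# Rough "human thinking-time" equivalent of one Transformer forward pass,
# assuming one serial step in the network corresponds to ~1 ms of human
# neuron firing time. Step counts below are illustrative guesses.

NEURON_FIRING_TIME_S = 1e-3  # ~1 ms per serial step (assumption)

def thinking_time_per_forward_pass(serial_steps: int) -> float:
    """Serial depth of one forward pass, in seconds of 'human thinking-time'."""
    return serial_steps * NEURON_FIRING_TIME_S

for steps in (100, 500, 10_000):
    print(f"{steps:>6} serial steps -> ~{thinking_time_per_forward_pass(steps):.1f} s of thinking-time")
# 100 -> ~0.1 s, 500 -> ~0.5 s, 10,000 -> ~10.0 s
```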
True RNNs can do many serial steps, and I would find that scary if they worked. RWKV models are not true RNNs, and they do roughly as many serial steps as regular Transformers (they are analogous to their "train mode", which only uses as many serial steps and FLOPs as a regular Transformer).
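As a rough illustration of the serial-depth point, here is a sketch under the simplifying assumption that one recurrence step or one layer counts as one serial step (a toy comparison, not a claim about any specific implementation):

```python
# Within a single forward pass over a sequence, a true RNN must update its
# hidden state token by token, so its serial depth grows with sequence length.
# A Transformer (or RWKV run in its parallel "train mode") computes all
# positions in parallel within each layer, so its serial depth is just the
# number of layers, independent of sequence length.

def serial_depth_true_rnn(seq_len: int, num_layers: int) -> int:
    # Each token's state depends on the previous token's state.
    return seq_len * num_layers

def serial_depth_transformer(seq_len: int, num_layers: int) -> int:
    # All positions computed in parallel within each layer.
    return num_layers

print(serial_depth_true_rnn(seq_len=1000, num_layers=50))    # 50000
print(serial_depth_transformer(seq_len=1000, num_layers=50)) # 50
```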
If the channel capacity of the hidden information through the paraphraser is 3 bits/KB, then you can't pass more information than that through it, no matter how you encode it. The main challenge is actually getting the capacity down (in particular, I think that better steganography schemes might get much more than 3 bits/KB through the context-aware paraphraser).
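To illustrate what a 3 bits/KB bound means in practice, here is a minimal sketch; the capacity figure is the one from the comment above, and the payload size is an arbitrary example:

```python
# If the paraphraser channel leaks at most ~3 bits per KB of text, then
# smuggling a payload of N bits requires at least N / 3 KB of generated
# text, no matter how cleverly the payload is encoded.

CAPACITY_BITS_PER_KB = 3.0  # assumed channel capacity through the paraphraser

def min_text_kb(payload_bits: int, capacity: float = CAPACITY_BITS_PER_KB) -> float:
    """Minimum KB of paraphrased text needed to pass `payload_bits` of hidden info."""
    return payload_bits / capacity

# e.g. hiding a 256-bit secret would require ~85 KB of paraphrased text
print(f"{min_text_kb(256):.1f} KB")  # 85.3 KB
```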
I’ll note that there’s actually a lot of evidence (especially theoretical) on the need for scratchpad/CoT and how it leads to much higher expressivity, both for Transformers and (conjectured) more generally for any parallelizable architecture (crucial for efficient training); to the point that I think we should expect this to hold in the future too with significantly >50% probability, probably >90%. See e.g. The Parallelism Tradeoff: Limitations of Log-Precision Transformers, Auto-Regressive Next-Token Predictors are Universal Learners, Chain of Thought Empowers Transformers to Solve Inherently Serial Problems, The Expressive Power of Transformers with Chain of Thought, Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective.
Fully agree that there are strong theoretical arguments for CoT expressiveness. Thanks for the detailed references!
I think the big question is whether this expressiveness is required for anything we care about (e.g. the ability to take over), and how many serial steps are enough. (And here, I think the number of serial steps in human reasoning is the best data point we have.)
Another question is whether CoT & natural language can in practice take advantage of the increased number of serial steps: they do in some toy settings (counting coin flips, …), but CoT barely improves performance on MMLU and common-sense reasoning benchmarks. I think CoT will eventually matter a lot more than it does today, but it's not completely obvious.
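For concreteness, a sketch of the kind of toy setting I have in mind (the prompt wording is hypothetical, just to illustrate how CoT exposes the serial computation one step per token):

```python
# Toy "coin flip counting" task: track a coin's state through a sequence of
# operations. Without CoT the model must do all the serial updates inside one
# forward pass; with CoT each intermediate state is written out explicitly.

flips = ["flip", "keep", "flip", "flip", "keep"]

no_cot_prompt = (
    "A coin starts heads-up. Operations: " + ", ".join(flips) +
    ". Is the coin heads-up at the end? Answer with one word."
)

cot_prompt = (
    "A coin starts heads-up. Operations: " + ", ".join(flips) +
    ". Track the coin's state after each operation, then answer."
)

# Ground truth, computed the same way a CoT would: one update per operation.
state = True  # heads-up
for op in flips:
    state = (not state) if op == "flip" else state
print("heads-up" if state else "tails-up")  # tails-up (3 flips in total)
```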
I don’t know if I should be surprised by CoT not helping that much on MMLU; MMLU doesn’t seem to require [very] long chains of inference? In contrast, I expect takeover plans would. Somewhat related, my memory is that CoT seemed very useful for Theory of Mind (necessary for deception, which seems like an important component of many takeover plans), but the only reference I could find quickly is https://twitter.com/Shima_RM_/status/1651467500356538368.