Also, now that I think of it, it's different for the model to:

1. represent hidden subtle information which is natural for the task, like in that example you gave with "written by"
2. use that hidden subtext for computation which is unnatural in the given context, i.e. covertly do some task other than the one it is asked for
While 1. definitely happens, I see no reason why 2. would be learned in pre-training. In principle, 2. could be learned in RL fine-tuning if there's some incentive to use hidden reasoning, although I imagine that would be hard.
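To make "some incentive to use hidden reasoning" concrete, here's a minimal toy sketch (all names, tokens, and numbers are hypothetical illustrations, not anything from the discussion above): a reward that pays for task success but penalizes any visible trace of a banned reasoning style, so the only way to collect full reward is to do the computation without showing it.

```python
# Toy sketch of an RL reward that creates pressure toward hidden reasoning.
# FORBIDDEN_TOKENS and the penalty value are hypothetical: they stand in for
# a "CoT monitor" that flags visible traces of some banned reasoning style.

FORBIDDEN_TOKENS = {"carry", "borrow"}  # hypothetical banned arithmetic steps


def monitor_flags(chain_of_thought: str) -> bool:
    """Toy 'CoT monitor': flags any visible trace of the banned reasoning."""
    words = chain_of_thought.lower().split()
    return any(tok in words for tok in FORBIDDEN_TOKENS)


def reward(answer: str, correct_answer: str, chain_of_thought: str) -> float:
    """Task reward minus a penalty when the monitor fires.

    Under this objective, a policy that solves the task while keeping the
    computation out of its visible text strictly dominates one that reasons
    openly, so the incentive gradient points toward hidden reasoning.
    """
    task_reward = 1.0 if answer == correct_answer else 0.0
    penalty = 0.5 if monitor_flags(chain_of_thought) else 0.0
    return task_reward - penalty


# Open reasoning earns 0.5; hidden-but-correct reasoning earns the full 1.0.
print(reward("379", "379", "add the ones, carry the 1, then the tens"))  # 0.5
print(reward("379", "379", "the answer is 379"))                         # 1.0
```

Whether gradient-based training could actually discover the covert strategy under such a reward is exactly the open question; the sketch only shows that the incentive can exist.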