Artyom Karpov comments on Proposal for Inducing Steganography in LMs

Artyom Karpov 30 Aug 2024 2:46 UTC
1 point
Thank you for posting this. Why do you think this is evidence of steganography in LLMs? Those steg tokens would be unrelated to the question being asked, so they would be out of the usual distribution and easily noticed by an eavesdropper. Still, this is good evidence for hidden reasoning inside CoT. I think this experiment was done in Pfau, Merrill, and Bowman, ‘Let’s Think Dot by Dot’ (https://arxiv.org/abs/2404.15758).