Excellent post!
It looks to me like it’s a matter of keeping the pressures for a faithful chain of thought larger than the pressures to create steganography or jargon/unique language of thought. Methods that penalize jargon will drive toward steganography.
I’ve seen enthusiasts remark that training a model to do more cognition in a single forward pass is a good thing; for efficiency, it is, but whatever happens inside the forward pass never shows up in the chain of thought.
As in the other comment thread, training for a short CoT drives toward jargon/unique language. But that’s balanced by using an independent judge of validity for process supervision; as long as the judge is a different model, it won’t understand any jargon and should judge the step as invalid. Explicitly making that part of the criteria would really help.
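To make the balance concrete, here is a toy sketch of the two pressures, not anyone's actual training setup. Everything in it is hypothetical: `score_chain`, `toy_judge`, the `SHARED_VOCAB` set, and the penalty weight are made up for illustration, and the "judge" is just a vocabulary check standing in for a separate model that doesn't share the policy's private jargon.

```python
from typing import Callable, List

# Hypothetical stand-in for an independent judge model: it only "understands"
# words from a shared vocabulary, so private jargon reads as invalid.
SHARED_VOCAB = {"add", "2", "and", "3", "to", "get", "5"}


def toy_judge(step: str) -> bool:
    """Accept a step only if every word is in the shared vocabulary."""
    return all(word in SHARED_VOCAB for word in step.lower().split())


def score_chain(
    steps: List[str],
    judge_is_valid: Callable[[str], bool],
    length_penalty: float = 0.01,  # assumed weight, chosen arbitrarily
) -> float:
    """Toy reward: +1 per accepted step, -1 per rejected step, minus a per-word cost."""
    reward = 0.0
    for step in steps:
        reward += 1.0 if judge_is_valid(step) else -1.0
        reward -= length_penalty * len(step.split())
    return reward


plain = ["add 2 and 3 to get 5"]
jargon = ["zq 2 3 5"]  # shorter, but uses a made-up token


def accept_all(step: str) -> bool:
    """No real process supervision: every step counts as valid."""
    return True


# With only the length pressure, the compressed jargon chain scores higher.
print(score_chain(jargon, accept_all) > score_chain(plain, accept_all))

# With the independent judge in the loop, the plain chain scores higher.
print(score_chain(plain, toy_judge) > score_chain(jargon, toy_judge))
```

The point is only the sign of the comparison: length pressure alone favors the jargon chain, and a judge that can't read the jargon flips it. A real judge would be a model, and "explicitly making that part of the criteria" would mean telling it to reject steps it can't follow.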
If I understand correctly, steganography in existing models is quite limited; it’s more a matter of using phrasing as a cue for likely continuations than any real attempt to hide cognition. That’s because there’s no real pressure in the training process to create steganography—yet.
Which pressure wins out seems very much up in the air right now.