A detail I forgot to mention, and that might be important: I was training the model to use one of several encodings, with a prefix specifying which one was used, precisely to rule out the kind of implicit translation that works by just changing the embedding matrices. Still, it would have been a good sanity check to verify that the model does learn the simple embedding-matrix change when that is the optimal strategy. I would be very surprised if SGD could not find such a simple transformation; what I claim is tricky is learning a new, weird encoding while maintaining the other capabilities. That said, I am confused enough here that I don't claim to make very accurate predictions in this domain.
The encoding was a lookup table.
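To make the "just changing embedding matrices" translation concrete, here is a minimal PyTorch sketch (the names, shapes, and random permutation are my own illustration, not details from the experiment). A fixed token-level lookup table is a permutation `pi` of the vocabulary, and a model can absorb it by permuting the rows of its embedding and unembedding matrices, leaving every other weight untouched:

```python
import torch
import torch.nn as nn

# Toy dimensions, chosen only for illustration.
vocab_size, d_model = 100, 16

# A lookup-table encoding is a bijection on tokens:
# encoded_token = pi[plain_token], plain_token = pi_inv[encoded_token].
pi = torch.randperm(vocab_size)
pi_inv = torch.argsort(pi)

# Stand-ins for a model's input embedding and output unembedding.
embed = nn.Embedding(vocab_size, d_model)
unembed = nn.Linear(d_model, vocab_size, bias=False)

# Absorb the encoding by permuting rows, touching no other weights:
# the permuted embedding maps encoded inputs to the vectors the rest of
# the network already expects, and the permuted unembedding emits
# encoded outputs.
embed_enc = nn.Embedding(vocab_size, d_model)
embed_enc.weight.data = embed.weight.data[pi_inv]
unembed_enc = nn.Linear(d_model, vocab_size, bias=False)
unembed_enc.weight.data = unembed.weight.data[pi_inv]

# Check: encoded tokens produce the same residual-stream inputs ...
t = torch.randint(0, vocab_size, (8,))
assert torch.allclose(embed(t), embed_enc(pi[t]))

# ... and the logits for encoded tokens match the plain logits.
h = torch.randn(8, d_model)
assert torch.allclose(unembed(h), unembed_enc(h)[:, pi])
```

With a prefix selecting among several encodings, the required translation becomes input-dependent, so no single row permutation implements it; that is what the prefix setup was meant to rule out.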
In CoT fine-tuning, there is no explicit pressure against the encoding co-evolving in the CoT generator and the decoder model, if the two are identical or fine-tuned in the same pass.
Does this mean that hidden reasoning might be harder than expected? There is a big disanalogy here, though: the switching between modes might be the hard part, not the encoding itself.