In CoT finetuning, there is no explicit pressure against an encoding co-evolving between the CoT-generating model and the decoder model, if the two are identical or finetuned in the same pass.
Does this mean hidden reasoning might be harder? But there is a big disanalogy here: the switching between modes (producing readable vs. encoded CoT) might be the hard part, not the encoding itself.
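A minimal toy sketch of that single-pass setup (the toy model and all names below are hypothetical, not from the original note): the same weights both write the CoT and read it, and a single loss updates both roles jointly, so nothing in the objective penalizes a private encoding co-evolving between writer and reader.

```python
import torch
import torch.nn as nn

vocab, dim = 64, 32
# Toy "LM": embedding + linear head, standing in for a real transformer.
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def step(prompt, answer):
    # Role 1 ("encoder"): the model itself produces the CoT tokens.
    # One-shot argmax is a crude stand-in for autoregressive sampling;
    # in real finetuning the CoT tokens would also receive gradient,
    # e.g. via policy gradient or supervised targets.
    with torch.no_grad():
        cot = model(prompt).argmax(-1)
    # Role 2 ("decoder"): the *same* weights score the answer from the
    # final positions of prompt + CoT (a toy next-token stand-in).
    ctx = torch.cat([prompt, cot])
    logits = model(ctx)[-answer.shape[0]:]
    loss = nn.functional.cross_entropy(logits, answer)
    # A single update touches both roles at once: whatever CoT helps the
    # decoder is reinforced on the encoder side in the same step.
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

prompt = torch.randint(vocab, (8,))
answer = torch.randint(vocab, (4,))
step(prompt, answer)
```

The point is structural, not behavioral: the objective rewards whatever CoT lets the same weights recover the answer, readable or not, which is why no mode switch is ever demanded of the model in this setup.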