I agree with this analysis. I mean, I'm not certain that further optimization will erode the interpretability of the generated CoT; it's possible that being pretrained on human natural language pushes the model into a stable equilibrium. But I doubt it, since there are plenty of ways the CoT can become less interpretable in a step-wise fashion.
But this seems inevitable to me. Just scaling up models and training them on English-language internet text is clearly less efficient (from a “build AGI” perspective, and from a profit perspective) than training them to do the specific tasks that users of the technology want. So that's the way it's going.
And once you're training models this way, the tether between human-understandable concepts and the CoT will be severed. If they do stay together, it will only be because the initial condition happens to be a stable one.