My expectation is that people will turn SSL models into agentic reasoners. I think this will happen through refinements to “chain of thought”-style reasoning approaches. See here. Such approaches absolutely do let LLMs “mull things over” to a limited degree, even with today's very crude chain-of-thought methods. I also think future RL advances will be more easily used to build better chain-of-thought reasoners than to accelerate some new approach to the SOTA.