This looks exciting! I wonder about the proposed training setup: if one model produces the thoughts and another takes them as input in its prompt, are we actually learning anything about the internal state of either model? And what is the advantage (beyond scalability) of this training setup over simply using the second model to produce continuations conditioned on the thoughts?