In addition to what cfoster0 said, I’m kinda excited about the next ~2-3 years of cross-LLM knowledge transfer, so this seems like a differing prediction about the future, which is fun.
My model for why it hasn’t happened already is in part just that most models know the same stuff, because they’re trained on extremely similar enormous swathes of text, so there’s little to gain by sticking them together. That would explain why more effort goes into LLM/image/video glue than LLM/LLM glue.
But abstractly, a world where LLMs can meaningfully be connected to vision models but not to other LLMs would be surprising to me. I expect something like training one model on code and another on non-code text, then sticking them together (roughly as sketched below), to be possible.
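To be concrete about what "sticking them together" could mean: here's a minimal sketch in the style of vision/LLM glue (a LLaVA-style learned projection), where both base models are frozen and only a small bridge is trained. I'm using GPT-2 as a stand-in for both the "code" and "text" models; the model choice, shapes, and the whole setup are illustrative assumptions, not a tested recipe.

```python
# Sketch: LLM/LLM "glue" by analogy to vision-encoder glue.
# Both base models are frozen; only the projection is trained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

code_encoder = AutoModel.from_pretrained("gpt2")             # stand-in "code" model
text_decoder = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in "text" model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for p in code_encoder.parameters():
    p.requires_grad_(False)
for p in text_decoder.parameters():
    p.requires_grad_(False)

# The only trainable piece: project the code model's hidden states into the
# text model's embedding space, so they act like extra "soft tokens".
glue = nn.Linear(code_encoder.config.hidden_size,
                 text_decoder.config.hidden_size)

code_ids = tokenizer("def add(a, b): return a + b", return_tensors="pt").input_ids
text_ids = tokenizer("This function", return_tensors="pt").input_ids

with torch.no_grad():
    code_states = code_encoder(code_ids).last_hidden_state   # (1, T_code, 768)

soft_tokens = glue(code_states)                              # (1, T_code, 768)
text_embeds = text_decoder.get_input_embeddings()(text_ids)  # (1, T_text, 768)
inputs = torch.cat([soft_tokens, text_embeds], dim=1)

# The text model now conditions on the code model's representations.
out = text_decoder(inputs_embeds=inputs)
print(out.logits.shape)  # next-token logits over the combined sequence
```

The appeal of this shape of approach is that the bridge is tiny relative to the base models, which is exactly why the vision/LLM version caught on; whether the same trick transfers knowledge (rather than just features) between two LLMs is the open question.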