It is very unclear to me how difficult these problems are to solve. But I also haven’t seen realistic approaches to tackle them.
That sounds to me more like a lack of interest in research than a lack of attempts to solve the problems.
AutoGPT-style frameworks give LLMs a way to do System II thinking. With a 100k-token context window, there's a lot that can be done with such agents. It's work to create good training data, but it's doable provided there's the capital investment.
As far as multimodal models go, DeepMind's Gato does that. It's just not as performant as a pure LLM.
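The AutoGPT-style mechanism mentioned above can be sketched in a few lines: the model deliberates by writing its own reasoning back into its context window and reading it on the next step. This is a minimal, hypothetical sketch; `call_llm` is a stand-in stub for whatever chat-completion API the framework actually wraps.

```python
# Sketch of an AutoGPT-style loop: "recurrence" happens through tokens,
# because the model's output is appended to its own input context.
# call_llm is a hypothetical stub, not a real API.

def call_llm(history):
    # Fake model: finishes after two reasoning steps.
    steps = sum(1 for m in history if m["role"] == "assistant")
    return "DONE: answer" if steps >= 2 else f"THOUGHT {steps + 1}: keep reasoning"

def agent_loop(task, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("DONE:"):
            return reply.removeprefix("DONE:").strip()
        # Feed the model's own output back as context: this token-level
        # feedback is what stands in for System II deliberation here.
        history.append({"role": "user", "content": "Continue."})
    return None
```

Note that all intermediate "thoughts" live only in the token stream, which is exactly the kind of recurrence the reply below argues is impoverished compared to recurrence in hidden state.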
I know these approaches, and they don't work. Maybe they will start working at some point, but to me it's very unclear when and why that should happen. All approaches that use recurrence based on token output are fundamentally impoverished compared to real recurrence.
About multi-modality:
I expect these limitations to largely vanish as models are scaled up and trained end-to-end on a large variety of modalities.
Yes, maybe Gemini will be able to really hear and see.