In general I’m not that interested in these sorts of “generic agents” that can do all the things with one neural net, and I don’t think they affect the relevant timelines very much. It seems like it will be far more economically useful to have separate neural nets doing each of the things and using each other as tools to accomplish particular tasks, so that’s what I expect to see.
Aren’t you worried about agents that can leverage extremely complex knowledge of the world (like Flamingo has), gained from text, picture, video, and other inputs, on a robotic controller? Think of an RL agent that can learn how to play Montezuma’s Revenge extremely quickly, because it consumed so much internet data that it knows what a “key” and a “rope” are, and that these in-game objects are analogous to the images it saw in pretraining. Something like that getting a malicious command in real life on a physical robot seems terrifying: given its environment, it would be able to form extremely complex plans in order to achieve a malicious goal. And at least from what I can tell from the Gato paper, the only missing ingredient at this point might be “more parameters/TPUs”.