I should also make a prediction for the nearer-term version of Gato, to actually answer the questions from the post. So if a new version of Gato appears in the next 4 months, I predict:
80% confidence interval: Gato will have 50B-200B params. The context window will be 2-4x larger (similar to GPT-3).
50%: No major algorithmic improvements, whether to RL or to memory. Maybe use of a Perceiver. Likely some new tokenizers. The improvements would come more from new data and scale.
80%: More text, images, video, and audio. More games and new kinds of data, e.g. special prompting to do something in a game, draw a picture, or perform some action.
75%: Visible transfer learning. A Gato trained on more tasks and pre-trained on video would perform better in most, but not all, games compared to a model of similar size trained just on the particular task. The language model would be able to describe the shapes of objects better after being trained jointly on images/video/audio.
70%: Chain-of-thought reasoning would perform better compared to an LLM of similar size. The improvement won't be huge, though, and I wouldn't expect it to gain surprisingly sophisticated new LLM capabilities.
80%: It won't be able to play new Atari games similarly to humans, but there would be visible progress: the actions would be less random and more directed towards the goal of the game. With sophisticated prompting, e.g. "Describe first what the goal of this game is, how to play it, and what the best strategy is", significant improvements would be seen, but performance would still be sub-human (see the sketch after this list).
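For concreteness, here is a minimal sketch of what that last kind of prompting might look like. The `gato.generate` text-plus-image interface is purely hypothetical, assumed here for illustration; the real model consumes interleaved tokenized observations and actions, so treat this as a sketch of the prompting idea, not an actual API.

```python
# A minimal sketch of chain-of-thought-style prompting for game play.
# `gato.generate` is a hypothetical interface, not a real API.

COT_PREFIX = (
    "Describe first what the goal of this game is, "
    "how to play it, and what the best strategy is.\n"
)

def play_with_cot(gato, frame):
    """Elicit reasoning about the game before selecting an action."""
    # Step 1 (hypothetical): ask for a natural-language description
    # of the game's goal and strategy, conditioned on the current frame.
    reasoning = gato.generate(prompt=COT_PREFIX, image=frame)
    # Step 2 (hypothetical): condition action selection on that reasoning,
    # rather than acting directly from the raw observation.
    action = gato.generate(prompt=reasoning + "\nNext action:", image=frame)
    return action
```

The prediction above is that this two-step variant would beat direct action selection on unseen games, while both remain sub-human.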