75% credence: No significant new algorithmic improvements, only minor changes that make it slightly better, or possibly none at all.
Nothing at all would feel really surprising to me. I would expect the programmers working on it to do some work on the algorithm. The changes might be minor, but I would expect them to decide to invest more compute in training only once they have at least some improvement to show.
75% confidence interval: Gato II will have between 5B and 50B parameters. Its context window will be between 2x and 4x as long as Gato's.
It seems like the main reason Gato is currently small is that they want it to be able to interact with robots at real-world speed. Even with that goal, it would be easy to train one 100B-parameter model and then downscale it for the applications that need to be that fast.