We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters in the case of Gato. As hardware and model architectures improve, this operating point will naturally increase the feasible model size, pushing generalist models higher up the scaling law curve. For simplicity Gato was trained offline in a purely supervised manner; however, in principle, there is no reason it could not also be trained with either offline or online reinforcement learning (RL).
And there is, of course, absolutely no reason to think that it wouldn’t get as good as text/image models like Flamingo or the new ULM2 if it was trained & scaled as much as they were; the problem is that you can’t run such large dense models at the necessary low latency for realtime robotics… Perhaps finally a genuine application for MoEs to enable plugging in very large unimodal/multimodal models.
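To make the latency argument concrete: in a top-1 mixture-of-experts layer, each token only runs through one expert FFN, so per-token compute is roughly 1/E of a dense layer holding the same total parameters. A toy sketch (all names, shapes, and weights here are made up for illustration, not anything from the Gato paper):

```python
import numpy as np

rng = np.random.default_rng(0)

D, E, H = 64, 8, 256  # model width, number of experts, expert hidden size

# One small FFN ("expert") per slot; only the routed expert runs per token,
# so active compute is ~1/E of a dense layer with the same total parameters.
experts = [(rng.standard_normal((D, H)) * 0.02,
            rng.standard_normal((H, D)) * 0.02) for _ in range(E)]
gate = rng.standard_normal((D, E)) * 0.02  # router weights

def moe_layer(x):
    """Top-1 mixture-of-experts layer for a single token vector x."""
    logits = x @ gate
    e = int(np.argmax(logits))       # route to exactly one expert
    w1, w2 = experts[e]
    h = np.maximum(x @ w1, 0.0)      # ReLU FFN of the chosen expert
    return x + h @ w2, e             # residual connection

x = rng.standard_normal(D)
y, chosen = moe_layer(x)
print(y.shape, chosen)
```

The total parameter count grows with E while per-token latency stays near constant, which is exactly why MoEs look attractive for plugging large models into a realtime control loop.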
A principled solution would probably involve running different parts of the model at different frequencies. But you could also just scale breadth and see how far it goes. The human brain is not very deep—just recursive.
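The different-frequencies idea can be sketched as a two-rate loop: a large, slow "planner" refreshes a latent goal occasionally while a small, fast "policy" emits an action every control tick. Everything below (names, rates, weight matrices) is a hypothetical illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-rate controller: a big "planner" updates a latent goal at
# 2 Hz, while a small "policy" maps (goal, observation) to actions at 50 Hz.
PLAN_EVERY = 25  # fast ticks per slow update (50 Hz / 2 Hz)

W_plan = rng.standard_normal((32, 16)) * 0.1      # stand-in for the big model
W_act = rng.standard_normal((16 + 32, 4)) * 0.1   # stand-in for the small policy

goal = np.zeros(16)
actions = []
for t in range(100):
    obs = rng.standard_normal(32)
    if t % PLAN_EVERY == 0:
        goal = np.tanh(obs @ W_plan)  # expensive call, run rarely
    # cheap call, run every tick against the cached goal
    act = np.tanh(np.concatenate([goal, obs]) @ W_act)
    actions.append(act)

print(len(actions), actions[0].shape)  # 100 actions of dimension 4
```

The large model's latency then only has to beat the planning interval, not the control interval, which is the sense in which this is more "principled" than just making the whole model shallower and wider.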
I wouldn’t have connected breadth and recursion. (I’d have just thought, well, self-calling.)
A friend pointed out on Facebook that Gato uses TPU-v3s. Not sure why—I thought Google already had v4s available for internal use a while ago? In any case, the TPU-v4 might potentially help a lot for the latency issue.
Two main options:
* It was trained some time ago (e.g. a year before publication) and only released now
* All the TPU-v4s were busy with something even more important
They trained it on TPU-v3s; however, the robot inference was run on a GeForce RTX 3090 (see section G).
TPUs are mostly designed for data centers and are not really usable for on-device inference.