We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters in the case of Gato. As hardware and model architectures improve, this operating point will naturally increase the feasible model size, pushing generalist models higher up the scaling law curve. For simplicity Gato was trained offline in a purely supervised manner; however, in principle, there is no reason it could not also be trained with either offline or online reinforcement learning (RL).
Looking at the image captioning and text prediction responses, it doesn’t appear to be very good at either...
It’s smaller than GPT-2: only 1.2B params (vs. GPT-2’s 1.5B).
And there is, of course, absolutely no reason to think that it wouldn’t get as good as text/image models like Flamingo or the new ULM2 if it was trained & scaled as much as they were; the problem is that you can’t run such large dense models at the necessary low latency for realtime robotics… Perhaps finally a genuine application for MoEs to enable plugging in very large unimodal/multimodal models.
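To put rough numbers on the latency point (a sketch only; the precision, bandwidth, and control-rate figures below are my assumptions, not anything from the paper): a single decode step can’t be faster than the time it takes to stream the weights through memory once, and that alone separates a Gato-sized model from a Flamingo-sized one.

```python
# Lower bound on per-step decode latency for a dense model, assuming each control
# step is memory-bandwidth-bound (every weight streamed from memory once).
# Illustrative assumptions: bf16 weights (2 bytes/param), ~936 GB/s bandwidth
# (roughly RTX 3090-class, the GPU used for robot inference per Section G).

def min_step_latency_ms(n_params: float, bytes_per_param: float = 2.0,
                        mem_bandwidth_gb_s: float = 936.0) -> float:
    """Milliseconds just to read the weights once."""
    return n_params * bytes_per_param / (mem_bandwidth_gb_s * 1e9) * 1e3

for name, n in [("Gato-sized (1.2B)", 1.2e9), ("Flamingo-sized (80B)", 80e9)]:
    print(f"{name}: >= {min_step_latency_ms(n):.1f} ms per step")

# Gato-sized (1.2B):    >= 2.6 ms   -- fits a ~20-50 Hz control loop with room to spare
# Flamingo-sized (80B): >= 170.9 ms -- blows any real-time budget outright
# (and 80B params at 2 bytes each would not even fit in 24 GB of VRAM)
```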
A principled solution would probably involve running different parts of the model at different frequencies. But you could also just scale breadth and see how far it goes. The human brain is not very deep—just recursive.
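In case it helps to make “different frequencies” concrete, here is a hypothetical two-rate loop (none of these module names or numbers are from the paper): a big, slow module refreshes a plan every K steps, and a small, fast module consumes the cached plan every step.

```python
import numpy as np

# Hypothetical two-rate control loop: a large, slow "planner" runs at low frequency,
# while a small, fast "controller" runs every step on the planner's cached output.
# The classes are toy stand-ins for real networks; only the scheduling pattern matters.

class SlowPlanner:
    """Expensive model; imagine hundreds of ms per call."""
    def plan(self, obs: np.ndarray) -> np.ndarray:
        return np.tanh(obs.mean(keepdims=True).repeat(16))  # latent "plan" vector

class FastController:
    """Cheap model; sub-millisecond per call."""
    def act(self, obs: np.ndarray, plan: np.ndarray) -> np.ndarray:
        return np.clip(obs[:4] + plan[:4], -1.0, 1.0)       # toy 4-D action

PLAN_EVERY = 25   # planner at 2 Hz if the control loop runs at 50 Hz

planner, controller = SlowPlanner(), FastController()
cached_plan = None
for step in range(200):
    obs = np.random.randn(32)                  # stand-in for the real observation
    if step % PLAN_EVERY == 0:
        cached_plan = planner.plan(obs)        # expensive call, amortized over 25 steps
    action = controller.act(obs, cached_plan)  # cheap call, every step
```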
I wouldn’t have connected breadth and recursion. (I’d have just thought, well, self-calling.)
A friend pointed out on Facebook that Gato uses TPU-v3s. Not sure why—I thought Google already had v4s available for internal use a while ago? In any case, TPU-v4s might help a lot with the latency issue.
Two main options:
* It was trained e.g. a year ago but only published now.
* All the TPU-v4s are busy with something even more important.
They trained it on TPU-v3s; however, the robot inference was run on a GeForce RTX 3090 (see Section G).
TPUs are mostly designed for data centers and are not really usable for on-device inference.
Indeed, but to slightly counterbalance the small parameter count: it looks like it was trained on ~500B tokens (vs. ~300B for GPT-3 and something like ~50B for GPT-2).
Most of those tokens were spent on the RL tasks, which were 85% of the corpus. Looking at Tables 1a/1b, the pure text-modeling datasets look like they got ~10% of the weight, with the other ~5% being the image-caption datasets*; so if it consumed ~500B tokens total (Figure 9), then presumably it only saw about a tenth of that as actual pure text comparable to GPT-2’s data, or ~50B tokens (rough arithmetic below). It’s also a small model, so it is less sample-efficient and will get less than n billion tokens’ worth (if you are mentally working back from “well, GPT-3 used x billion tokens”).
Considering further that it was not necessarily trained to convergence on the language-modeling task (actually, come to think of it, how did they even decide when to stop training? They certainly didn’t derive scaling laws on the overall task mix & train Gato in a compute-optimal fashion… had Gato converged on any task?), and remembering just how dumb GPT-2 is by contemporary standards (which have been moving the goalposts at supersonic speed), the sample dialogues don’t look all that surprisingly dumb to me given its size & token count & training setup.
* image grounding is great and all that, but I don’t expect it to be all that useful for knowing ‘Marseilles is not the capital of France’.
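Rough arithmetic behind the ~50B pure-text estimate above, assuming the ~500B total from Figure 9 and the 85/10/5 sampling split read off Tables 1a/1b are about right:

```python
# Back-of-the-envelope for how much "pure text" Gato actually saw.
# All inputs are approximate readings from the paper, not exact figures.
total_tokens   = 5.0e11   # ~500B tokens consumed over training (Figure 9)
rl_weight      = 0.85     # control/RL episodes
text_weight    = 0.10     # pure text-modeling data
caption_weight = 0.05     # image-caption datasets

text_tokens = total_tokens * text_weight
print(f"pure-text tokens: ~{text_tokens / 1e9:.0f}B")
# -> ~50B, i.e. roughly GPT-2-scale text exposure, vs. ~300B for GPT-3.
```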