aogara comments on Two-year update on my personal AI timelines

aogara 3 Aug 2022 3:18 UTC
26 points
4
My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al 2020. In 2022, the Chinchilla scaling result (Hoffmann et al 2022) showed that instead the amount of data should scale as N.
Are you concerned that pretrained language models might hit data constraints before TAI? Nostalgebraist estimates that there are roughly 3.2T tokens available publicly for language model pretraining. This estimate misses important potential data sources such as transcripts of audio and video, private text conversations, and email. But the median estimate in the BioAnchors report is that a transformative model will require 22T data points, nearly an order of magnitude more than this estimate.
The BioAnchors estimate was also based on older scaling laws that placed a lower priority on data relative to compute. With the new Chinchilla scaling laws, more data would be required for compute-optimal training. Of course, training runs don’t need to be compute-optimal: You can get away with using more compute and less data if you’re constrained by data, even if it’s going to cost more. And text isn’t the only data a transformative model could use: audio, video, and RLHF on diverse tasks all seem like good candidates.
Does the limited available public text data affect your views of how likely GPT-N is to be transformative? Are there any considerations overlooked here, or questions that could use a more thorough analysis? Curious about anybody else’s opinions, and thanks for sharing the update, I think it’s quite persuasive.
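To make the arithmetic in this comment concrete, here is a rough sketch. The 3.2T and 22T figures and the N^0.8 versus N exponents come from the text above. The 20-tokens-per-parameter ratio and the 6·N·D compute approximation are commonly cited rules of thumb that I am assuming for illustration; neither appears in this thread, and all function names are hypothetical.

```python
import math

# Figures quoted in the comment above.
PUBLIC_TOKENS = 3.2e12     # nostalgebraist's estimate of public pretraining text
BIOANCHORS_DATA = 22e12    # BioAnchors median data requirement

# ASSUMPTION (not from the thread): rough Chinchilla-style rule of thumb.
TOKENS_PER_PARAM = 20.0


def data_shortfall(required: float, available: float) -> float:
    """How many times more data is required than is available."""
    return required / available


def data_growth(param_multiplier: float, exponent: float) -> float:
    """Factor by which required data grows when parameters grow by
    param_multiplier, if data scales as N**exponent."""
    return param_multiplier ** exponent


def compute_optimal_params(tokens: float,
                           tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Largest model that a token budget supports at the assumed ratio."""
    return tokens / tokens_per_param


def training_flops(params: float, tokens: float) -> float:
    """ASSUMPTION: the common C ~ 6 * N * D approximation for training compute."""
    return 6.0 * params * tokens


if __name__ == "__main__":
    gap = data_shortfall(BIOANCHORS_DATA, PUBLIC_TOKENS)
    print(f"shortfall: {gap:.3f}x ({math.log10(gap):.2f} orders of magnitude)")

    # Scaling parameters 100x under the older N^0.8 law versus the linear law.
    print(f"data growth, N^0.8: {data_growth(100, 0.8):.1f}x")
    print(f"data growth, N^1.0: {data_growth(100, 1.0):.1f}x")

    n = compute_optimal_params(PUBLIC_TOKENS)
    print(f"params supported by public text: {n:.2e}")
    print(f"training compute at that size: {training_flops(n, PUBLIC_TOKENS):.2e} FLOPs")
```

Under these assumptions the gap between 22T and 3.2T is about 6.9x, and a 100x increase in parameters needs roughly 40x more data under N^0.8 but 100x under linear scaling, which is why the Chinchilla result makes the data constraint look tighter. The parameter and FLOP figures are only as good as the assumed ratio and compute formula.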
What links here?
- aogara's comment on chinchilla’s wild implications by nostalgebraist (3 Aug 2022 3:31 UTC; 5 points)
- porby 3 Aug 2022 3:59 UTC
  22 points
  13
  Parent
  I suspect Chinchilla’s implied data requirements aren’t going to be that much of a blocker for capability gain. It is an important result, but it’s primarily about the behavior of current backprop-trained, transformer-based LLMs.
  The data inefficiency of many architectures was known before Chinchilla, but the industry worked around it because it wasn’t yet a bottleneck. After Chinchilla, it has become one of the largest architectural optimization targets. Given the increase in focus and the relative infancy of the research, I would guess the next two years will see the picking of some very juicy low-hanging fruit. There are a lot of options floating nearby in conceptspace and there is a lot of room to grow; I’d be surprised if data limitations still feel as salient in 2025.
- Lone Pine 3 Aug 2022 13:03 UTC
  2 points
  0
  Parent
  What’s RLHF?
  - Tom Lieberum 3 Aug 2022 13:05 UTC
    −1 points
    −3
    Parent
    Reward Learning from Human Feedback
    - Lukas Finnveden 3 Aug 2022 18:59 UTC
      10 points
      5
      Parent
      Reinforcement* learning from human feedback
      - Tom Lieberum 3 Aug 2022 21:28 UTC
        0 points
        0
        Parent
        Well, I thought about that, but I wasn’t sure whether reinforcement learning from human feedback wouldn’t just be a strict subset of reward learning from human feedback. If reinforcement is indeed the strict definition then I concede, but I don’t think it makes sense.
        Lukas Finnveden 3 Aug 2022 21:50 UTC
        7 points
        4
        Parent
        The acronym is definitely used for reinforcement learning. [“RLHF” “reinforcement learning from human feedback”] gets 564 hits on Google, [“RLHF” “reward learning from human feedback”] gets 0.
        Tom Lieberum 8 Aug 2022 11:55 UTC
        2 points
        0
        Parent
        Thanks for verifying! I retract my comment.
        Tom Lieberum 3 Aug 2022 21:33 UTC
        −2 points
        −2
        Parent
        I think historically “reinforcement” has been used more in that particular constellation (see e.g. the deep RL from human preferences paper), but as I noted, I find “reward learning” more apt, as it points to the hard thing being the reward learning, i.e. distilling human feedback into an objective, rather than the optimization of any given reward function (which technically need not involve reinforcement learning).