Tapatakt comments on GPT-4

Tapatakt 14 Mar 2023 17:42 UTC
43 points
30
“Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”
Wow, that’s good, right?
- Rafael Harth 14 Mar 2023 18:19 UTC
  15 points
  11
  Yes. How good is up for debate, but it’s definitely good.
- Muyyd 14 Mar 2023 18:00 UTC
  7 points
  3
  But how good can it be, realistically? I will be very surprised if all these details aren’t leaked within the next week. Maybe they will try to make several false leaks to muddle things a bit.
  - Gerald Monroe 14 Mar 2023 19:16 UTC
    6 points
    1
    It could leak when OAI employees take an offer to work at another lab.
  - Fer32dwt34r3dfsz 14 Mar 2023 18:55 UTC
    1 point
    0
    Strong agreement here. I find it unlikely that most of these details will still be concealed after 3 months or so: for that to happen, no one could infer any of these details and there could be no leak, and both holding together seems unlikely.
    
    Regarding the original thread, I do agree that OpenAI’s move to conceal the details of the model is a Good Thing, as this step is risk-reducing and creates / furthers a norm for safety in AI development that might be adopted elsewhere. Nonetheless, the information being concealed seems likely to become known soon, in my mind, for the general reasons I outlined in the previous paragraph.
    - gwern 14 Mar 2023 19:05 UTC
      60 points
      19
      You can definitely infer quite a bit from the paper and authors by section, but there is a big difference between a plausible informed guess, and knowing. For most purposes, weak inferences are not too useful. ‘Oh, this is Chinchilla, this is VQ-VAE, this is Scaling Transformer...’ For example, the predicting-scaling part (and Sam Altman singling out the author for praise) is clearly the zero-shot hyperparameter work, but that’s not terribly helpful, because the whole point of scaling laws (and the mu work in particular) is that if you don’t get it right, you’ll fall off the optimal scaling curves badly if you try to scale up 10,000x to GPT-4 (never mind the GPT-5 OA has in progress), and you probably can’t just apply the papers blindly—you need to reinvent whatever he invented since and accumulate the same data, with no guarantee you’ll do it. Not a great premise on which to spend $1b or so. If you’re a hyperscaler not already committed to the AI arms race, this is not enough information, or reliable enough, to move the needle on your major strategic decision. Whereas if they had listed exact formulas or results (especially the negative results), it may be enough of a roadmap to kickstart another competitor a few months or years earlier.
      - Daniel Murfet 15 Mar 2023 11:47 UTC
        9 points
        2
        By the zero-shot hyperparameter work do you mean https://arxiv.org/abs/2203.03466 “Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer”? I’ve been sceptical of NTK-based theory, seems I should update.
      - M. Y. Zuo 16 Mar 2023 2:09 UTC
        2 points
        1
        “(never mind the GPT-5 OA has in progress)”
        Is there even enough training data for GPT-5? (Assuming its goal is to 50x or 100x GPT-4)
        - Teerth Aloke 16 Mar 2023 3:00 UTC
          1 point
          0
          Not public data, at least.
- talelore 15 Mar 2023 5:56 UTC
  3 points
  1
  Yep, but of course the common opinion on Hacker News is that this is horrible.
- Gabe M 14 Mar 2023 17:50 UTC
  0 points
  2
  something something silver linings...