This is based on:
The Q&A you mention
GPT-3 not being trained on even one pass of its training dataset
“Use way more compute” achieving outsized gains by training longer, for a fixed model size, rather than by most other architectural modifications (you’re correct that a bigger model trains faster, but you’re trading that off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve in production; plus, we know from the Q&A that it’s about the same size). See the sketch after this list.
Some experience with undertrained enormous language models underperforming relative to expectation
This is not to say that GPT-4 won’t have architectural changes. Sam mentioned a longer context at the least. But those sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.
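To make the reasoning in the list above concrete, here is a minimal Python sketch of the usual power-law picture, loss ≈ E + A/N^α + B/D^β in parameter count N and training tokens D. The functional form is the standard scaling-law parameterization; the constants below are illustrative placeholders rather than fitted values, so only the qualitative behavior matters: at a fixed GPT-3-sized parameter count, more training tokens (i.e. more compute) keeps pushing loss down, while a much larger but undertrained model can land worse than a smaller, longer-trained one.

# Illustrative only: toy loss model of the form
#   loss(N, D) = E + A / N**alpha + B / D**beta
# where N is parameter count and D is training tokens. The constants are
# made-up placeholders chosen to show the qualitative tradeoff, not
# measured values for any real model.

def toy_loss(n_params: float, n_tokens: float,
             E: float = 1.7, A: float = 400.0, B: float = 400.0,
             alpha: float = 0.34, beta: float = 0.28) -> float:
    """Hypothetical loss estimate for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Fixed GPT-3-scale model (~175e9 params): more tokens keeps lowering loss.
# This is the "same size, just train longer with more compute" lever.
for tokens in (300e9, 1e12, 3e12):
    print(f"175B params, {tokens:.0e} tokens -> loss ~ {toy_loss(175e9, tokens):.3f}")

# An undertrained, much larger model can come out worse than a smaller,
# better-trained one -- the "enormous but undertrained" failure mode.
print(f"1T params,   100e9 tokens -> loss ~ {toy_loss(1e12, 100e9):.3f}")
print(f"175B params, 1e12 tokens  -> loss ~ {toy_loss(175e9, 1e12):.3f}")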
To be clear: Do you remember Sam Altman saying that “they’re simply training a GPT-3-variant for significantly longer”, or is that an inference from ~”it will use a lot more compute” and ~”it will not be much bigger”?
Because if you remember him saying that, then that contradicts my memory (and, uh, the notes that people took that I remember reading), and I’m confused.
Whereas if it’s an inference: sure, that’s a non-crazy guess, and I take your point that smaller models are easier to deploy. I just want it flagged as a claimed deduction, not as a remembered statement.
(And I maintain my impression that something more is going on; especially since I remember Sam generally talking about how models might use more test-time compute in the future, and be able to think for longer on harder questions.)
Honestly, at this point, I don’t remember if it’s inferred or primary-sourced. Edited the above for clarity.