There’s an important timelines crux to do with whether artificial neural nets are more or less parameter-efficient than biological neural nets. There are a bunch of arguments pointing in either direction, such that our prior uncertainty should range over several orders of magnitude in either direction.
Well, seeing what current models are capable of has updated me towards the lower end of that range. It seems like transformers are an OOM or two more parameter-efficient than the human brain, on a parameter-to-synapse comparison, at least when you train them for as ridiculously long as we currently do.
I’d be interested to hear counterarguments to this take.
If you haven’t already seen it, I wrote about that recently here. Note the warning at the top. I wrote a decent chunk of a follow-up post, but one section will be a lot of work for me and I’m planning to procrastinate on it for a while. I can share a draft if you’re interested. I’m still on the “100T parameters is super-excessive for human-level AGI” side of the debate, although I think I overstated the case in that post. My take on transformers is something vaguely like: “The thing that GPT-3 is doing, it’s doing at or beyond human level. However, human brains are doing other things too.”
Parameter/synapse count is actually not really that important by itself; the first principal component in terms of predictive capability is net training compute. All successful NNs operate in the overcomplete regime, where they have far more circuit capacity than the minimal circuit required to achieve comparable capability on their training set. This is implied by the various scaling-law papers; it’s also why young human children have an OOM more synapses than adults, why you can prune a trained network down by some OOMs related to its overcapacity factor, why there are so many DL papers about the “lottery ticket” hypothesis and related ideas, etc.

net_training_compute = synaptic_compute * training_time

It’s about the total circuit-space search volume explored, not the circuit size. You can achieve the same volume, and thus capability, by training a smaller, more compressed circuit for much longer (as in ANNs), or a larger circuit for less time (as in BNNs).
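To make that concrete, here’s a rough back-of-envelope sketch in Python. Every number in it is an assumption of mine (brain synapse count, effective ops per synapse per second, waking lifetime, GPT-3’s training token count and FLOPs per parameter per token), each uncertain by an OOM or more; the only point is that the two “net training compute” figures land in roughly the same ballpark despite very different circuit sizes and training times.

```python
# Rough back-of-envelope comparison of "net training compute"
# (synaptic_compute * training_time) for a human brain vs. GPT-3.
# All figures below are ballpark assumptions, each uncertain by an OOM or more.

# Human brain (assumed figures)
BRAIN_SYNAPSES = 1e14          # ~100 trillion synapses (order of magnitude)
BRAIN_OPS_PER_SYNAPSE_S = 1.0  # assume ~1 effective op per synapse per second
BRAIN_TRAINING_SECONDS = 1e9   # ~30 years of experience, in seconds

brain_net_compute = BRAIN_SYNAPSES * BRAIN_OPS_PER_SYNAPSE_S * BRAIN_TRAINING_SECONDS

# GPT-3 (assumed figures)
GPT3_PARAMS = 175e9            # 175B parameters
GPT3_TRAINING_TOKENS = 300e9   # ~300B training tokens
FLOPS_PER_PARAM_PER_TOKEN = 6  # common approximation for forward + backward pass

gpt3_net_compute = GPT3_PARAMS * GPT3_TRAINING_TOKENS * FLOPS_PER_PARAM_PER_TOKEN

print(f"brain net training compute ~ {brain_net_compute:.1e} ops")
print(f"GPT-3 net training compute ~ {gpt3_net_compute:.1e} FLOPs")
print(f"ratio (brain / GPT-3)      ~ {brain_net_compute / gpt3_net_compute:.1f}x")
```

Under these (very debatable) assumptions the brain comes out around 1e23 ops and GPT-3 around 3e23 FLOPs: a much larger circuit trained for less wall-clock time versus a much smaller circuit trained for far more steps, with comparable total search volume.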
Only if you’re overcomplete enough to have a winning ticket at init time. With that caveat, agreed. If you don’t have a winning ticket at init time, you need things like evolutionary search, which can be drastically less efficient depending on the details of the update rule.
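For readers who haven’t run into the lottery ticket hypothesis, here’s a minimal toy sketch of the procedure being referenced (my own illustrative setup, not anything from this thread): train an overcomplete MLP, prune most of the weights by magnitude, rewind the survivors to their initialization values, and retrain the sparse “winning ticket” with the pruning mask held fixed.

```python
# Toy lottery-ticket sketch (assumed illustrative setup, not from the thread):
# train an overcomplete MLP, prune 90% of weights by magnitude, rewind the
# survivors to their init values, and retrain the sparse subnetwork.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = sin(3x) + noise
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(3 * x) + 0.05 * torch.randn_like(x)

model = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
init_state = {k: v.clone() for k, v in model.state_dict().items()}  # save init weights

def train(net, masks=None, steps=2000, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
        if masks is not None:  # keep pruned weights pinned at zero
            with torch.no_grad():
                for name, p in net.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])
    return loss.item()

dense_loss = train(model)

# Prune 90% of each weight matrix by magnitude (biases left intact).
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:
        k = int(0.1 * p.numel())
        threshold = p.abs().flatten().topk(k).values.min()
        masks[name] = (p.abs() >= threshold).float()

# Rewind surviving weights to their init values (the "winning ticket") and retrain.
model.load_state_dict(init_state)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])
ticket_loss = train(model, masks=masks)

print(f"dense loss:  {dense_loss:.4f}")
print(f"ticket loss: {ticket_loss:.4f}  (10% of weights, rewound to init)")
```

The relevant point for the comment above: the ticket is only findable because the dense network was overcomplete enough at init for some sparse subnetwork to be trainable on its own; if you start at the minimal circuit size, there is nothing left to prune your way to.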
The thought that GPT-3 is a mere 175 bees of brain is extremely disturbing
Yeah, I was tempted to make a human one, for the lols (a human is ~100k bees), but decided even I have better things to do with my life than this. JK, I’ll probably do it the next time I get bored.
And… it’s done! Only crashed Figma like 3 times!