There’s an important timelines crux to do with whether artificial neural nets are more or less parameter-efficient than biological neural nets. There are a bunch of arguments pointing in either direction, such that our prior uncertainty should range over several orders of magnitude in either direction.
Well, seeing what current models are capable of has updated me towards the lower end of that range. It seems like transformers are an OOM or two more parameter-efficient than the human brain, on a parameter-to-synapse comparison, at least when you train them for as ridiculously long as we currently do.
I’d be interested to hear counterarguments to this take.
If you haven’t already seen it, I wrote about that recently here. Note the warning at the top. I wrote a decent chunk of a follow-up post, but one section will be a lot of work for me and I’m planning to procrastinate on it for a while. I can share a draft if you’re interested. I’m still on the “100T parameters is super-excessive for human-level AGI” side of the debate, although I think I overstated the case in that post. My take on transformers is something vaguely like: “The thing that GPT-3 is doing, it’s doing at or beyond human level. However, human brains are doing other things too.”
Parameter/synapse count is actually not really that important by itself; the first principal component in terms of predictive capability is net training compute. All successful NNs operate in the overcomplete regime, where they have far more circuit capacity than the minimal circuit required to achieve comparable capability on their training set. This is implied by the various scaling-law papers; it’s also why young human children have an OOM more synapses than adults, why you can prune a trained network down by some OOMs related to its overcapacity factor, why there are so many DL papers about the “lottery ticket” hypothesis and related ideas, etc.

net_training_compute = synaptic_compute * training_time

It’s about the total circuit-space search volume explored, not the circuit size. You can achieve the same volume, and thus capability, by training a smaller, more compressed circuit for much longer (as in ANNs), or a larger circuit for less time (as in BNNs).
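To make that concrete, here’s a rough back-of-envelope sketch in Python. Every number in it is an assumption of mine (brain synapse count, effective ops per synapse per second, waking lifetime, GPT-3’s training token count and FLOPs per parameter per token), each uncertain by an OOM or more; the only point is that the two “net training compute” figures land in roughly the same ballpark despite very different circuit sizes and training times.

```python
# Rough back-of-envelope comparison of "net training compute"
# (synaptic_compute * training_time) for a human brain vs. GPT-3.
# All figures below are ballpark assumptions, each uncertain by an OOM or more.

# Human brain (assumed figures)
BRAIN_SYNAPSES = 1e14          # ~100 trillion synapses (order of magnitude)
BRAIN_OPS_PER_SYNAPSE_S = 1.0  # assume ~1 effective op per synapse per second
BRAIN_TRAINING_SECONDS = 1e9   # ~30 years of experience, in seconds

brain_net_compute = BRAIN_SYNAPSES * BRAIN_OPS_PER_SYNAPSE_S * BRAIN_TRAINING_SECONDS

# GPT-3 (assumed figures)
GPT3_PARAMS = 175e9            # 175B parameters
GPT3_TRAINING_TOKENS = 300e9   # ~300B training tokens
FLOPS_PER_PARAM_PER_TOKEN = 6  # common approximation for forward + backward pass

gpt3_net_compute = GPT3_PARAMS * GPT3_TRAINING_TOKENS * FLOPS_PER_PARAM_PER_TOKEN

print(f"brain net training compute ~ {brain_net_compute:.1e} ops")
print(f"GPT-3 net training compute ~ {gpt3_net_compute:.1e} FLOPs")
print(f"ratio (brain / GPT-3)      ~ {brain_net_compute / gpt3_net_compute:.1f}x")
```

Under these (very debatable) assumptions the brain comes out around 1e23 ops and GPT-3 around 3e23 FLOPs: a much larger circuit trained for less wall-clock time versus a much smaller circuit trained for far more steps, with comparable total search volume.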
Only if you’re overcomplete enough to have a winning ticket at init time. With that caveat, agreed. If you don’t have a winning ticket at init time, you need things like evolutionary search, which can be drastically less efficient depending on the details of the update rule.
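For readers who haven’t run into the lottery ticket hypothesis, here’s a minimal toy sketch of the procedure being referenced (my own illustrative setup, not anything from this thread): train an overcomplete MLP, prune most of the weights by magnitude, rewind the survivors to their initialization values, and retrain the sparse “winning ticket” with the pruning mask held fixed.

```python
# Toy lottery-ticket sketch (assumed illustrative setup, not from the thread):
# train an overcomplete MLP, prune 90% of weights by magnitude, rewind the
# survivors to their init values, and retrain the sparse subnetwork.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = sin(3x) + noise
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(3 * x) + 0.05 * torch.randn_like(x)

model = nn.Sequential(nn.Linear(1, 256), nn.ReLU(), nn.Linear(256, 1))
init_state = {k: v.clone() for k, v in model.state_dict().items()}  # save init weights

def train(net, masks=None, steps=2000, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
        if masks is not None:  # keep pruned weights pinned at zero
            with torch.no_grad():
                for name, p in net.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])
    return loss.item()

dense_loss = train(model)

# Prune 90% of each weight matrix by magnitude (biases left intact).
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:
        k = int(0.1 * p.numel())
        threshold = p.abs().flatten().topk(k).values.min()
        masks[name] = (p.abs() >= threshold).float()

# Rewind surviving weights to their init values (the "winning ticket") and retrain.
model.load_state_dict(init_state)
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])
ticket_loss = train(model, masks=masks)

print(f"dense loss:  {dense_loss:.4f}")
print(f"ticket loss: {ticket_loss:.4f}  (10% of weights, rewound to init)")
```

The relevant point for the comment above: the ticket is only findable because the dense network was overcomplete enough at init for some sparse subnetwork to be trainable on its own; if you start at the minimal circuit size, there is nothing left to prune your way to.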
The thought that GPT-3 is a mere 175 bees of brain is extremely disturbing
Yeah, I was tempted to make a human one, for the lols (a human is ~100k bees), but decided even I have better things to do with my life than this. JK, I’ll probably do it the next time I get bored.
And… it’s done! Only crashed Figma like 3 times!