gwern comments on Inference cost limits the impact of ever larger models

gwern 5 Nov 2021 14:22 UTC
4 points
0

Width-wise parallelism could help but its communication cost scales unfavorably. It grows quadratically as we grow the NN’s width, and then quadratically again when we try to reduce latency by reducing the number of neurons per GPU.

Incidentally, the latency cost of width vs depth is something I’ve thought might explain why the brain/body allometric scaling laws are so unfavorable and what all that expensive brain matter does given that our tiny puny little ANNs seem capable of so much: everything with a meaningful biological brain, from ants to elephants, suffers from hard (fatal) latency requirements. You are simply not allowed by Nature or Darwin to take 5 seconds to compute how to move your legs.* (Why was Gato 1 so small and so unimpressive in many ways? Well, they kept it small because they wanted it to run in realtime for a real robot. A much wider Transformer could’ve still met the deadline… but cost a lot more parameters and training than usual by going off the optimal scaling curves.) It does not matter how many watts or neurons you save by using a deep skinny network, if after 10 layers have fired with another 100 to go to compute the next action to take, you’ve been eaten by a stupider but faster-thinking predator.

So a biological brain might be forced to be deep into an unfavorable point on width vs depth—which might be extremely expensive—in order to meet its subset of robotics-related deadlines, as it were.

* With a striking counterexample, in both tininess of brain and largeness of latency, being Portia. What is particularly striking to me is not that it is so intelligent while being so tiny, but that this seems to be directly due to its particular ecological niche: there are very few creatures out there who need extremely flexible intelligent behavior but also are allowed to have minutes or hours to plan many of its actions… but Portia is one of them, as it is a stealthy predator attacking static prey. So Portia spiders are allowed to do things like spend hours circumnavigating a web to strike its prey spider from the right direction or gradually test out mimicry until it finds the right cue to trick its prey spider. So it’s fascinating to see that in this highly unusual niche, it is possible to have a tiny biological brain execute extremely slow but intelligent strategies, and it suggests that if latency were not a problem, biological brains could be far more intelligent and we would not need to see such architecturally-huge biological brains to reach human-level performance, and then we would no longer have any paradox of why highly-optimized human brains seem to need so many parameters to do the same thing as tiny ANNs.