The amount of compute required to emulate the human brain depends on the level of detail we want to emulate.
Back in 2008, Sandberg and Bostrom proposed the following values:
| Level of emulation detail | FLOPS required to run the brain emulation in real time |
| --- | --- |
| Analog network population model | 10^15 |
| Spiking neural network | 10^18 |
| Electrophysiology | 10^22 |
| Metabolome | 10^25 |
| Proteome | 10^26 |
| States of protein complexes | 10^27 |
| Distribution of protein complexes | 10^30 |
| Stochastic behavior of single molecules | 10^43 |
Today I came across an interesting piece of data on GPT-3 (source):
GPT-3 required ~10^15 FLOPS for inference.
It required ~10^23 FLOPS to train it [Note: the training took some months. It would require ~10^30 FLOPS to train it from zero in one second]
As far as I know, GPT-3 was the first AI with the range and the quality of cognitive abilities comparable to the human brain (although still far from reaching the human level on many tasks).
Coincidentally(?), GPT-3 requires 10^15 − 10^30 FLOPS to operate at the brain’s speed, which is roughly the same amount of compute necessary to run a decent emulation of the human brain.
The range of possible compute is almost infinite (e.g. 10^100 FLOPS and beyond). Yet both intelligences are in the same relatively narrow range of 10^15 − 10^30 (assuming the human brain emulation doesn’t need to be nano-level detailed).
Is it a coincidence, or is there something deeper going on here?
This could be important both for understanding the human brain and for predicting how far we are from true AGI.
GPT-3 is about 2e11 parameters and uses about 4 flops per parameter per token, so about 1e12 flops per token.
If a human writes at 1 token per second, then you should be comparing 1e12 flops to the cost per second. I think you are implicitly comparing to the cost for a ~1000 token context?
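The arithmetic in this comment can be written out as a minimal sketch. The parameter count and the 4 flops per parameter per token are the figures quoted above; the 1000-token context is only the guess made above about what the post was implicitly measuring, and the variable names are illustrative.

```python
# Figures quoted in the comment above (rough, order-of-magnitude values).
params = 2e11                    # GPT-3 parameter count, as quoted
flops_per_param_per_token = 4    # as quoted; not independently verified here

flops_per_token = params * flops_per_param_per_token
print(f"per token: {flops_per_token:.0e} flops")

# Assumption from the comment: the post's figure may correspond to ~1000 tokens.
context_tokens = 1000
flops_per_context = flops_per_token * context_tokens
print(f"per {context_tokens}-token context: {flops_per_context:.0e} flops")
```

With these inputs the per-token cost comes out close to 1e12 flops, and the 1000-token figure lands close to the 10^15 quoted in the post.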
I think 1e14 to 1e15 flops is a plausible estimate for the productive computation done by a human brain in a second, which is about 2-3 orders of magnitude beyond GPT-3.
I think this is not really a coincidence. GPT-3 is notable because it’s starting to exhibit human-like abilities. It’s not super surprising that this should happen around human levels of compute, and I would personally expect the trend to continue as we scale up towards human-level compute and continue improving deep learning efficiency. (I gave this about 50% probability in 2017 before seeing GPT-2, but I’ve updated significantly in favor over the last 6 years.)
More generally, I think the numbers in your post are wrong and the discussion is somewhat confused: 1e15 to 1e30 is not a narrow interval; I don’t think you should compare training costs to inference costs; 1e30 is not the training cost of GPT-3; you should probably compare to brain compute estimates like this one rather than brain emulation estimates...
But I think it’s reasonable to step back and say that compared to what you might have expected, biological anchors have been a pretty good guide to ML progress. They are losing usefulness now since at best they have like 10 years of resolution and eyeballing is getting easier and easier as we approach transformative AI. But I still find them helpful as an additional independent check to go along with eyeballing, economic extrapolations, etc. (And until recently I think they were probably the most common way people arrived at in-retrospect-reasonable-looking timeline estimates.)
Hi. Can you provide a citable reference for the “4 flops per parameter per token”? It’s for a research paper in the foundations of quantum physics. Thanks. (Howard Wiseman.)
Both the human brain cost estimates and the GPT-3 cost estimates are incredibly noisy/mistaken and I wouldn’t take them too seriously.
To start, 15 orders of magnitude is not a narrow range at all!
For reference, the speed of light is within 8 orders of magnitude of the speed of a car, and an elephant weighs within 6 OOM of a chicken, so this uncertainty is really big.
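As a rough check on these comparisons, here is a small sketch that measures the gap between two quantities in orders of magnitude. The car speed and the animal masses are assumed round numbers chosen for illustration, not figures from the thread.

```python
import math

def orders_of_magnitude(a, b):
    """Return |log10(a / b)|: the gap between two positive quantities in OOM."""
    return abs(math.log10(a / b))

speed_of_light = 299_792_458   # m/s
car_speed = 30                 # m/s, assumed highway speed
elephant_kg = 5000             # assumed mass of an elephant
chicken_kg = 2                 # assumed mass of a chicken

light_vs_car = orders_of_magnitude(speed_of_light, car_speed)
elephant_vs_chicken = orders_of_magnitude(elephant_kg, chicken_kg)
post_range = orders_of_magnitude(1e30, 1e15)   # the range quoted in the post

print(f"light vs car:        {light_vs_car:.1f} OOM")
print(f"elephant vs chicken: {elephant_vs_chicken:.1f} OOM")
print(f"1e15 vs 1e30:        {post_range:.1f} OOM")
```

Under these assumed inputs both everyday gaps are well under the 15 orders of magnitude spanned by the post's range.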
To be fair, I’ll note that the 10^30 estimate for GPT-3 is clearly an overestimate: 3 x 10^23 floating point operations is the total compute used to train GPT-3, not its per-second usage. (The unit is floating point operations, not floating point operations per second. Yes, the notation is confusing.)
I also think that the higher values of 10^27 and 10^30 seem pretty infeasible for brain simulation. But the lower numbers still seem feasible.
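To make the unit distinction concrete, here is a minimal sketch that converts the total training compute into a rate. The 3 x 10^23 total is the figure quoted above; the 90-day training duration is a hypothetical placeholder (the post only says "some months"), so the average rate is illustrative only.

```python
# Total training compute quoted above: a count of operations, not a rate.
training_flop = 3e23

# Hypothetical duration; the post only says training "took some months".
assumed_training_days = 90
training_seconds = assumed_training_days * 24 * 60 * 60

# Average rate over the assumed duration, in operations per second.
average_rate = training_flop / training_seconds
print(f"average rate over {assumed_training_days} days: {average_rate:.1e} FLOP/s")

# Compressing the same total into a single second gives the total itself
# as a rate, which is far below 1e30.
one_second_rate = training_flop / 1
print(f"rate if done in one second: {one_second_rate:.0e} FLOP/s")
```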
Another issue with your estimate is that it’s very bizarre to compare the total cost of training GPT vs the instantaneous operating cost of a human. Surely we want to compare like to like, and compare either instantaneous compute usage, or cumulative lifetime usage (which would multiply the human number by around 9 orders of magnitude).
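The "around 9 orders of magnitude" factor can be checked with a short sketch. The 30-year span and the choice of 10^15 FLOPS for the brain (the lowest row of the post's table) are assumptions made here for illustration; the thread does not fix either value.

```python
import math

# Assumed human "training" span; not specified in the thread.
assumed_years = 30
seconds_per_year = 365.25 * 24 * 3600
lifetime_seconds = assumed_years * seconds_per_year

# Seconds in the assumed span, expressed in orders of magnitude.
lifetime_oom = math.log10(lifetime_seconds)
print(f"seconds in {assumed_years} years: {lifetime_seconds:.2e} (~{lifetime_oom:.1f} OOM)")

# Assumed instantaneous brain figure: lowest row of the post's table.
brain_flop_per_second = 1e15
lifetime_flop = brain_flop_per_second * lifetime_seconds

# Total training compute for GPT-3 as quoted in the thread.
gpt3_training_flop = 3e23

print(f"cumulative human compute (assumed): {lifetime_flop:.1e} FLOP")
print(f"GPT-3 training compute (quoted):    {gpt3_training_flop:.1e} FLOP")
```

This puts both numbers in the same unit (total operations), so they can be compared like to like; a different brain estimate from the table would shift the human total by the same factor.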
A final confusion here is how to convert a forward pass to human thinking time. There are some arguments that a forward pass is way more (you can’t read thousands of characters at once, for example) and some that it’s less (you can generate way more than literally one token at a time, and also do more impressive cognition).
So I think a better estimate for GPT-3 cognition in real human-equivalent time is something like 10^12 − 10^17 flops, while humans are 10^15 − 10^26 or whatever. This looks way less coincidental!
Most importantly, I think the obvious explanation applies here: doing the impressive kind of human-seeming cognition that GPT-3 and its LLM brethren can do, using relatively straightforward methods like neural networks, takes a non-negligible fraction of the human brain’s compute.