Both the human brain cost estimates and the GPT-3 cost estimates are incredibly noisy/mistaken, and I wouldn’t take them too seriously.
To start, 15 orders of magnitude is not a narrow range at all!
For reference, the speed of light is within 8 orders of magnitude of a car, and an elephant weighs within 6 OOM of a chicken — so this uncertainty is really big.
To be fair, I’ll note that the 10^30 estimate for GPT-3 is clearly an overestimate: the 3 x 10^23 floating point operations is the total compute used to train GPT-3, not its per-second usage (the unit is floating point operations, not floating point operations per second; yes, the notation is confusing).
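For concreteness, here’s a quick back-of-the-envelope sketch of that distinction, using the standard ~6ND (training) and ~2N (per-token inference) rules of thumb. The parameter and token counts are GPT-3’s published figures; the constants are rough assumptions, not exact values.

```python
# Rough sanity check of the total-vs-per-token distinction.
# Assumptions: training compute ~ 6 * params * tokens, inference ~ 2 * params per token.

params = 1.75e11          # GPT-3 parameter count
train_tokens = 3.0e11     # ~300B training tokens

total_train_flop = 6 * params * train_tokens   # ~3e23 FLOP -- the "10^23" figure
flop_per_token = 2 * params                    # ~3.5e11 FLOP per forward-pass token

print(f"total training compute: {total_train_flop:.1e} FLOP")
print(f"per-token inference:    {flop_per_token:.1e} FLOP")
```

The total training number and the per-token number differ by roughly 12 orders of magnitude, which is why conflating them wrecks the comparison.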
I also think that the higher values of 10^27 and 10^30 seem pretty infeasible for brain simulation. But the lower numbers still seem feasible.
Another issue with your estimate is that it’s very bizarre to compare the total cost of training GPT-3 against the instantaneous operating cost of a human. Surely we want to compare like with like: either instantaneous compute usage on both sides, or cumulative lifetime usage (which would multiply the human number by around 9 orders of magnitude).
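A minimal sketch of where that ~9 OOM multiplier comes from (the 80-year lifespan is my assumption; only the order of magnitude matters):

```python
import math

# Cumulative lifetime seconds for a human, assuming an ~80-year lifespan.
seconds_per_year = 365.25 * 24 * 3600          # ~3.16e7 s
lifetime_seconds = 80 * seconds_per_year       # ~2.5e9 s

print(f"lifetime seconds:    {lifetime_seconds:.1e}")
print(f"orders of magnitude: {math.log10(lifetime_seconds):.1f}")  # ~9.4
```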
A final confusion here is how to convert a forward pass to human thinking time. There are some arguments that a forward pass is worth more human thinking time (a human can’t read thousands of characters at once, for example), and some that it’s worth less (a human can generate way more than literally one token at a time, and also do more impressive cognition).
So I think a better estimate for GPT-3 cognition per second of human-equivalent time is something like 10^12 − 10^17 FLOP/s, while the human brain estimates are 10^15 − 10^26 or whatever. This looks way less coincidental!
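To see where the bottom of that range could come from, here’s one illustrative conversion (my assumptions, not the exact numbers above): charge GPT-3 ~2 × params FLOP per generated token, and assume a human “emits” roughly 2–4 tokens per second of thinking/speaking.

```python
params = 1.75e11
flop_per_token = 2 * params        # ~3.5e11 FLOP per generated token

for human_tokens_per_sec in (2, 4):
    flop_per_human_sec = flop_per_token * human_tokens_per_sec
    print(f"{human_tokens_per_sec} tok/s -> {flop_per_human_sec:.1e} FLOP per human-second")

# Both land around 10^12 FLOP per human-equivalent second, the bottom of the
# quoted range; the higher end comes from more generous conversions (e.g.
# crediting a forward pass with processing its whole multi-thousand-token context).
```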
Most importantly, I think the obvious explanation applies here — to do the impressive kind of human-seeming cognition GPT-3 and its LLM brethren can do, using relatively straightforward methods like neural networks, takes a non-negligible fraction of the human brain’s compute.