Here’s a quote from someone on EleutherAI discord 2 weeks ago:
Does anyone know how much training compute would be needed for AGI? The human brain uses about 10^16 flops and GPT-3 required about 10^23 flops for training which is 10 million times more than the human brain. It seems like we have the compute needed for AGI today but we still don’t have AGI.
I believe the first “flops” is FLOP/s and the second “flops” is FLOP, although I’m not 100% confident. I replied asking for clarification and didn’t get a straight answer.
FWIW, I am ~100% confident that this is correct in terms of what they refer to. Typical estimates of the brain are that it uses ~10^15 FLOP/s (give or take a few OOM) and the fastest supercomputer in the world uses ~10^18 FLOP/s when at maximum (so there’s no way GPT-3 was trained on 10^23 FLOP/s).
If we assume the exact numbers here are correct, then the actual conclusion is that GPT-3 was trained on the amount of compute the brain uses in 10 million seconds, or roughly 116 days (call it 100 days).
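The arithmetic above is easy to sanity-check. A minimal sketch, taking the quoted figures (10^23 FLOP to train GPT-3, 10^16 FLOP/s for the brain) at face value rather than as established facts:

```python
# Sanity-check: dividing a total amount of computation (FLOP) by a
# rate (FLOP/s) yields a time in seconds. Figures are the ones from
# the quote, not independently verified estimates.
gpt3_training_flop = 1e23    # total training compute (quoted figure, in FLOP)
brain_flop_per_s = 1e16      # brain's processing rate (quoted figure, in FLOP/s)

seconds = gpt3_training_flop / brain_flop_per_s  # 1e7 seconds
days = seconds / (60 * 60 * 24)

print(f"{seconds:.0e} seconds = {days:.0f} days")
# prints "1e+07 seconds = 116 days"
```

Note that the units only work out because one quantity is a stock (FLOP) and the other a flow (FLOP/s); calling both "flops" is exactly what hides the distinction.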
Can you point to example usages that are causing confusion?
That’s a great example, thanks!
(I’m used to “flops” always describing the capacity of a machine, i.e. a rate per second, rather than a total amount of computation as in this example.)