Here’s another way to make this same point: think about energy usage. Joe Carlsmith’s report says we need (median) 1e15 FLOP/s to simulate a brain. Based on existing hardware (maybe 5e9 FLOP/joule?), that implies (median) 200kW to simulate a brain. (Hey, $20/hour electricity bills, not bad!)
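Spelling out the arithmetic behind those figures (a quick sketch; the ~$0.10/kWh electricity price is my assumption, chosen because it's the rate that makes the $20/hour figure come out):

```python
# Back-of-envelope for the figures above. The ~$0.10/kWh electricity price
# is an assumption -- it's the rate implied by the $20/hour figure.
brain_flops = 1e15            # Carlsmith's median estimate, FLOP/s
flop_per_joule = 5e9          # assumed supercomputer-class efficiency, FLOP/J
price_per_kwh = 0.10          # assumed electricity price, $/kWh

power_watts = brain_flops / flop_per_joule             # 2e5 W = 200 kW
cost_per_hour = (power_watts / 1000) * price_per_kwh   # 200 kWh/hour * $0.10 = $20

print(f"{power_watts / 1e3:.0f} kW, ~${cost_per_hour:.0f}/hour")
```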
I’d argue it’s closer to 1e14 OP/s (1e14 synapses × ~1 Hz mean synaptic firing rate), but it doesn’t matter much. OP rather than FLOP because floating point is unnecessary. A single A100 provides over 1e15 peak OP/s (and about half as many peak FLOP/s) for only 250 W. An A100 is kinda expensive, but a 3090 has almost as much peak performance and costs only a few thousand dollars. Your energy estimate here is off by 3 OOM.
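For concreteness, here is a rough sketch of that comparison. The A100 figures below are its published peak INT8 tensor throughput with structured sparsity (~1.25e15 op/s) and the 250 W PCIe TDP, so treat this as an optimistic peak rather than sustained performance:

```python
# Rough version of the comparison above. A100 numbers are nominal peak
# INT8 tensor throughput (with structured sparsity) and the 250 W PCIe TDP,
# so this is an optimistic bound, not sustained performance.
synapses = 1e14
mean_firing_rate_hz = 1.0
brain_ops = synapses * mean_firing_rate_hz      # ~1e14 op/s

a100_peak_ops = 1.25e15                         # peak INT8 tensor op/s (sparse)
a100_watts = 250.0                              # PCIe TDP

ops_per_joule = a100_peak_ops / a100_watts      # ~5e12 op/J
watts_for_brain = brain_ops / ops_per_joule     # ~20 W at peak efficiency

print(f"{ops_per_joule:.0e} op/J, ~{watts_for_brain:.0f} W for {brain_ops:.0e} op/s")
```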
Hmm, just trying to understand where this difference is coming from:
Joe Carlsmith’s report and you agree with each other in saying that 1e14/s is a good central guess for the frequency of a spike hitting a synapse. But Joe guesses we need 1-100 FLOP per spike-synapse, which gives a central estimate of 1e15/s, whereas you think we should stay at 1. Hmm, my own opinion is “I don’t know, and deferring to a central number in Joe’s report seems like a reasonable thing to do in the meantime”. But if you put a gun to my head and asked me to pick my best-guess number, I would say “less than 1, at least after we tweak the algorithm implementation to be more silicon-appropriate”.
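Spelled out, the two estimates differ only in the per-event cost. Reading Carlsmith’s 1–100 FLOP range geometrically (my assumption about where the central figure comes from), the midpoint is ~10 FLOP per event, which is what gives 1e15 FLOP/s:

```python
import math

# Where the 1e15 FLOP/s central figure comes from, assuming the 1-100 FLOP
# range is meant geometrically (my reading, not stated in the report).
spike_synapse_events_per_s = 1e14   # shared central guess for spike-synapse events/s

flop_low, flop_high = 1, 100
flop_mid = math.sqrt(flop_low * flop_high)      # log-scale midpoint: 10 FLOP/event

low = spike_synapse_events_per_s * flop_low     # 1e14 FLOP/s
mid = spike_synapse_events_per_s * flop_mid     # 1e15 FLOP/s
high = spike_synapse_events_per_s * flop_high   # 1e16 FLOP/s

print(f"{low:.0e} / {mid:.0e} / {high:.0e} FLOP/s")
```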
Next, there’s a factor-of-1000 discrepancy in energy efficiency: I wrote 5e9 FLOP/joule and you’re saying that an A100 is 5e12 tensor-op/J. Hmm, I seem to have gotten the 5e9 from a list of supercomputers. I imagine that the difference is a combination of (a little bit) FLOP vs OP, and (mostly) tensor-operations vs operations-on-arbitrary-numbers-pulled-from-RAM, or something. I imagine that the GPU is really optimized at doing tensor operations in parallel, and that allows way more operations for the same energy. I’m not an expert, that’s just my first guess. I would agree that the GPU case is closer to what we should expect.

I added a note in the text. Thanks!
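To make the factor of 1000 concrete, here are the two efficiency figures side by side and what each implies for a 1e15 FLOP/s budget (the ~5e12 op/J number is the A100 peak-throughput figure from the sketch above, so it’s an optimistic bound):

```python
# The two efficiency figures side by side, and what each implies for a
# 1e15 FLOP/s budget. The ~5e12 op/J figure is the A100 peak-throughput
# number from the earlier sketch, so it's an optimistic bound.
supercomputer_flop_per_joule = 5e9    # from a list of supercomputers
a100_op_per_joule = 5e12              # A100 peak tensor throughput / TDP

gap = a100_op_per_joule / supercomputer_flop_per_joule   # ~1e3, i.e. ~3 OOM

budget = 1e15                                            # FLOP/s, Carlsmith's median
print(f"gap: ~{gap:.0e}x")
print(f"{budget / supercomputer_flop_per_joule / 1e3:.0f} kW vs "
      f"{budget / a100_op_per_joule:.0f} W")
```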
Carlsmith’s report is pretty solid overall, and this doesn’t matter much because his final posterior mean of 1e15/s is still within A100 peak perf, but the 100-FLOP high end is poorly justified, resting mostly on one outlier expert, and is ultimately padding for various uncertainties:
“I’ll use 100 FLOPs per spike through synapse as a higher-end FLOP/s budget for synaptic transmission. This would at least cover Sarpeshkar’s 40 FLOP estimate, and provide some cushion for other things I might be missing.”
GPUs dominate CPUs in basically everything: memory bandwidth (an OOM greater), general operations-on-arbitrary-numbers-pulled-from-RAM (1 to 2 OOM greater), and matrix multiplication at various bit depths (many OOM greater). CPU-based supercomputers are completely irrelevant for AGI considerations.
There are many GPU competitors, but they generally have similar perf characteristics, with the exception of some pushing much higher on-chip scratch SRAM and higher interconnect bandwidth.