Hmm, just trying to understand where this difference is coming from:
Joe Carlsmith’s report and you agree with each other in saying that 1e14/s is a good central guess for the frequency of a spike hitting a synapse. But Joe guesses we need 1-100 FLOP per spike-synapse, which gives a central estimate of 1e15/s, whereas you think we should stay at 1. Hmm, my own opinion is “I don’t know, and deferring to a central number in Joe’s report seems like a reasonable thing to do in the meantime”. But if you put a gun to my head and asked me to pick my best-guess number, I would say “less than 1, at least after we tweak the algorithm implementation to be more silicon-appropriate”.
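To make the gap concrete, here is the back-of-envelope arithmetic behind the two views, a sketch using only the numbers from this thread (the geometric midpoint of 1-100 as the "central" multiplier is my assumption):

```python
# Back-of-envelope: brain FLOP/s under Carlsmith's 1-100 FLOP/spike-synapse
# range, vs. the 1-FLOP-per-spike-synapse view. Figures are from the
# discussion above, not authoritative.

spikes_per_synapse_per_s = 1e14      # central guess for spike-through-synapse events/s

low, high = 1, 100                   # Carlsmith's FLOP per spike-synapse range
central = (low * high) ** 0.5        # geometric midpoint of the range = 10

print(f"low:     {spikes_per_synapse_per_s * low:.0e} FLOP/s")      # 1e+14
print(f"central: {spikes_per_synapse_per_s * central:.0e} FLOP/s")  # 1e+15
print(f"high:    {spikes_per_synapse_per_s * high:.0e} FLOP/s")     # 1e+16
```

The "stay at 1" position is just the low end of this range; Carlsmith's 1e15/s posterior mean corresponds to the geometric midpoint.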
Next, there’s a factor-of-1000 discrepancy for energy-efficiency: I wrote 5e9 FLOP/joule and you’re saying that A100 is 5e12 tensor-op/J. Hmm, I seem to have gotten the 5e9 from a list of supercomputers. I imagine that the difference is a combination of (a little bit) FLOP vs OP, and (mostly) tensor-operations vs operations-on-arbitrary-numbers-pulled-from-RAM, or something like that. I imagine that the GPU is heavily optimized for doing tensor operations in parallel, and that allows way more operations for the same energy. I’m not an expert; that’s just my first guess. I would agree that the GPU case is closer to what we should expect.
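One way to see what the factor-of-1000 gap implies: the power draw needed to run a 1e15 FLOP/s brain-equivalent workload under each efficiency figure. A quick sketch, using only the numbers quoted in this thread:

```python
# Rough implication of the efficiency gap: watts needed for 1e15 FLOP/s
# under each figure (5e9 FLOP/J supercomputer-list number vs. the claimed
# ~5e12 tensor-op/J for A100). Numbers are from the thread, not verified.

compute = 1e15        # FLOP/s, Carlsmith's posterior-mean brain estimate
supercomp_eff = 5e9   # FLOP/J, the supercomputer-list figure
gpu_eff = 5e12        # op/J, the claimed A100 tensor-op figure

print(f"supercomputer figure: {compute / supercomp_eff:,.0f} W")  # 200,000 W
print(f"A100 tensor figure:   {compute / gpu_eff:,.0f} W")        # 200 W
```

So the same workload lands at roughly a small data hall versus a single accelerator's power budget, which is why the choice of efficiency figure matters so much.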
Carlsmith’s report is pretty solid overall, and this doesn’t matter much in practice, because his final posterior mean of 1e15/s is still within A100 peak performance. But the high end of the range, the 100 FLOPs part, is poorly justified: it rests mostly on one outlier expert, and is ultimately padding for various uncertainties:
“I’ll use 100 FLOPs per spike through synapse as a higher-end FLOP/s budget for synaptic transmission. This would at least cover Sarpeshkar’s 40 FLOP estimate, and provide some cushion for other things I might be missing.”
GPUs dominate CPUs in basically everything: memory bandwidth (an OOM greater), general operations-on-arbitrary-numbers-pulled-from-RAM (1 to 2 OOM greater), and matrix multiplication at various bit depths (many OOM greater). CPU-based supercomputers are completely irrelevant for AGI considerations.
There are many GPU competitors, but they generally have similar performance characteristics, with the exception of some pushing much more on-chip scratch SRAM and higher interconnect bandwidth.
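For the memory-bandwidth point above, a quick illustrative comparison. The specific figures here (A100 HBM2e at roughly 2 TB/s, a typical many-channel server CPU socket at roughly 0.2 TB/s) are my ballpark assumptions from public specs, not numbers from the comment:

```python
import math

# Illustrative GPU-vs-CPU memory bandwidth gap. Ballpark figures only:
# A100 HBM2e ~2 TB/s; typical server CPU socket ~0.2 TB/s (my assumptions).
gpu_bw = 2.0e12   # bytes/s
cpu_bw = 2.0e11   # bytes/s

ratio = gpu_bw / cpu_bw
print(f"bandwidth ratio: {ratio:.0f}x (~{math.log10(ratio):.0f} OOM)")  # 10x (~1 OOM)
```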
I added a note in the text. Thanks!