Estimating Brain-Equivalent Compute from Image Recognition Algorithms
[Epistemic Status: Playing around with the idea of a benchmark with some rough numbers.]
When I read Biological Anchors: A Trick That Might Or Might Not Work, my thinking was: biological anchors will work if your algorithms are close enough to what the brain does, because they can then be used to estimate the compute (FLOP) needed for the rest of the brain. The compute equivalent of the brain has been discussed recently here (I think this indicates a factor of 100 more efficient algorithms) and here. I used this for predictions on Metaculus. This will not give you sharp bounds, and it will not tell you whether algorithms could do things much more cheaply or which ones to use. I have not seen this specific comparison elsewhere.
This started with the idea that we might already have some algorithms that perform as well as some parts of the brain, so we could compare their costs, power requirements, and complexity. Specifically, image recognition algorithms now perform about as well as human raters. Thus, let's compare state-of-the-art image recognition algorithms with the corresponding brain region (the visual cortex) and then extrapolate that to the whole brain.
I did this and here is the result:
| Brain Region | Brodmann Area 17 (Visual Cortex V1) | Algorithm | CoAtNet-7 |
|---|---|---|---|
| Volume [cm^3] | 11 (1% of brain volume) | | |
| Neurons [10^6] | 280 (0.3% of brain neurons) | Parameters [10^6] | 2500 |
| Power [W] | 0.18 | Power at 10 inferences/s [W] | 13 (at 2 TFLOP/s per W) |
| | | Training compute [10^21 FLOP] | 200 |
| | | Inference compute [10^9 FLOP] | 2600 |
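To make the power comparison in the table explicit, here is a minimal Python sketch of how the 13 W figure follows from the inference cost, under the 2 TFLOP/s-per-watt accelerator efficiency assumed in the table; these are the table's assumed values, not measurements.

```python
# Deriving the algorithm's power figure from its inference cost.
# All numbers are the (assumed) table values, not measurements.

INFERENCE_FLOP = 2.6e12           # FLOP per forward pass (2600 * 10^9)
INFERENCES_PER_SECOND = 10        # assumed rate, matching the brain comparison
FLOP_PER_SECOND_PER_WATT = 2e12   # assumed accelerator efficiency (2 TFLOP/s per W)

compute_rate = INFERENCE_FLOP * INFERENCES_PER_SECOND        # 2.6e13 FLOP/s
algorithm_power_w = compute_rate / FLOP_PER_SECOND_PER_WATT  # 13 W

V1_POWER_W = 0.18  # table value for Brodmann area 17

print(f"Algorithm power: {algorithm_power_w:.0f} W")              # 13 W
print(f"Ratio to V1:     {algorithm_power_w / V1_POWER_W:.0f}x")  # ~72x
```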
Whether the comparison should include only region V1 or also V2 to V5 of the visual cortex is worth asking, but the idea was to estimate conservatively and to exclude cognitive processes current algorithms definitely don’t cover.
Extrapolating the compute to the whole brain:
Inference: 8*10^15 FLOP/s (86*10^9 neurons / 280*10^6 neurons * 2.6*10^12 FLOP/inference * 10 inferences/second).
Training: ~1*10^17 FLOP/s (86*10^9 neurons / 280*10^6 neurons * 2*10^23 FLOP / 18 life years ≈ 5.7*10^8 seconds).
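For transparency, here is the same extrapolation as a short Python sketch; all inputs are the table values and neuron counts quoted above, so it only reproduces the arithmetic rather than giving an independent estimate.

```python
# The whole-brain extrapolation above as a small script.
# Inputs are the table values and neuron counts quoted in the text.

BRAIN_NEURONS = 86e9        # whole brain
V1_NEURONS = 280e6          # Brodmann area 17
NEURON_RATIO = BRAIN_NEURONS / V1_NEURONS   # ~307

INFERENCE_FLOP = 2.6e12     # FLOP per forward pass
INFERENCES_PER_SECOND = 10
TRAINING_FLOP = 2e23        # 200 * 10^21 FLOP
LIFE_YEARS = 18
SECONDS_PER_YEAR = 365.25 * 24 * 3600

inference_flop_per_s = NEURON_RATIO * INFERENCE_FLOP * INFERENCES_PER_SECOND
training_flop_per_s = NEURON_RATIO * TRAINING_FLOP / (LIFE_YEARS * SECONDS_PER_YEAR)

print(f"Inference: {inference_flop_per_s:.1e} FLOP/s")  # ~8e15
print(f"Training:  {training_flop_per_s:.1e} FLOP/s")   # ~1e17
```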
Pretty low compared to the numbers in Cotra’s paper.
There are just some problems: the visual cortex does much more than static image recognition of 512x512-pixel images:
The resolution of the processed image is much higher: 120 million rods instead of a quarter-million pixels.
Stitching together the picture from blurred fragments (due to saccades).
Building something like a 3D model (maybe not in V1, though).
Inferring actions and how the scene changes over time (mostly motion; object permanence). Some of this may happen in the V2-V5 regions.
Unfortunately, I only realized this when I had already collected most of the above data. There is algorithmic progress on many of these points (e.g., there is active research in vision-based action detection), but no algorithms come close to human performance on them. As an alternative, I also tried to get corresponding numbers for auditory processing, but they were harder to find, and speech recognition hasn't reached human parity yet either (e.g., the cocktail party effect). Thus my initial assumption, that we have a brain region algorithmically covered, doesn't hold up. I considered not posting this write-up but then decided that it might still be of interest to some readers.
Attempting to estimate AGI compute requirements from the visual cortex and image classification has a long connectionist history. Moravec did this repeatedly, and Drexler has another version in his QNR whitepaper. AI Impacts (or whoever it was) made a comparison to bees for similar reasons. Might be worth comparing.
For everybody who, like me, didn’t know what QNRs are:
Drexler’s Language for Intelligent Machines: A Prospectus
on LW: QNR prospects are important for AI alignment research
The bees post was by Guilhermo Costa, an Open Phil intern. My comment has some discussion of the “but biological brains do so much more stuff than ML classifiers” point.
This is the 1976 Moravec calculation:
https://frc.ri.cmu.edu/~hpm/project.archive/general.articles/1978/analog.1978.html