Thank you for the effort in organizing this conversation. I want to clarify a few points.

Around the very beginning of the density & temperature section I wrote:

but wire volume requirements scale linearly with dimension. So if we ignore all the machinery required for cellular maintenance and cooling, this indicates the brain is at most about 100x larger than strictly necessary (in radius), and more likely only 10x larger.

However, even though the wiring energy scales linearly with radius, the surface area power density which crucially determines temperature scales with the inverse squared radius, and the minimal energy requirements for synaptic computation are radius invariant.
Radius there refers to brain radius, not wire radius. Unfortunately there are two meanings of 'wiring energy' or 'wire energy'. By 'wiring energy' above I meant (hopefully the context makes this clear) the total energy used by brain wiring/interconnect, not the 'wire energy' in the sense of energy per bit per nm, which is more of a fixed constant that depends on wire design tradeoffs.
So my model was/is that if we assume you could just take the brain and keep the same amount of compute (neurons/synapses/etc) but somehow shrink the entire radius by a factor of D, this would decrease total wiring energy by the same factor D by just shortening all the wires in the obvious way.
However, the surface area scales as R^2, so for fixed power the surface power density scales as 1/R^2; the net effect is that the surface power density from interconnect scales as 1/R, i.e. it increases by a factor of D as you shrink by a factor of D, which thereby increases your cooling requirement (in terms of heat flow per unit of surface area) by the same factor D. But since the energy use of synaptic computation does not change at all, its surface power density scales as 1/R^2 and thus D^2, which quickly dominates.
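As a rough sanity check on this argument, here is a minimal sketch in Python. The function name, the baseline radius, and the power split between interconnect and synaptic computation are made-up illustrative assumptions, not estimates; only the scaling behavior matters:

```python
import math

# Toy model of the scaling argument above (all baseline numbers are
# illustrative assumptions, not real brain power figures).
# Assumptions: total interconnect power scales linearly with radius R,
# synaptic computation power is radius-invariant, and heat must exit
# through a surface whose area scales as R^2.

def surface_power_densities(D,
                            baseline_radius_m=0.1,      # assumed baseline radius
                            interconnect_power_W=10.0,  # assumed baseline wiring power
                            synaptic_power_W=10.0):     # assumed baseline synaptic power
    R = baseline_radius_m / D                  # shrink radius by factor D
    area = 4 * math.pi * R ** 2                # surface area ~ R^2
    wiring_power = interconnect_power_W / D    # total wiring energy ~ R
    synaptic_power = synaptic_power_W          # radius-invariant
    return wiring_power / area, synaptic_power / area

wire0, syn0 = surface_power_densities(1)
for D in (1, 2, 10):
    wire, syn = surface_power_densities(D)
    print(f"D={D:>2}: interconnect density x{wire / wire0:.0f}, "
          f"synaptic density x{syn / syn0:.0f}")
# Prints x1/x1, x2/x4, x10/x100: interconnect surface power density grows ~D,
# synaptic surface power density grows ~D^2.
```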
In the section you quoted where I say:
This in turn constrains off-chip memory bandwidth to scale poorly: shrinking feature sizes with Moore's Law by a factor of D increases transistor density by a factor of D^2, but at best only increases 2D off-chip wire density by a factor of D, and doesn't directly help reduce wire energy cost at all.
Now I have moved to talking about 2D microchips, and "wire energy" here means the energy per bit per nm, which again doesn't scale with device size. Also, the D here is scaling in a somewhat different way: it refers to reducing the size of all devices, as in normal Moore's Law shrinkage, while holding the total chip size constant, which increases device density.
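To make that geometry concrete, here is a hedged sketch under the stated assumptions: a square die of fixed size, feature size shrunk by D, transistor density growing as D^2, off-chip wires limited by a 1D edge pitch so their count only grows as D, and a roughly constant wire energy per bit per unit length. The baseline numbers and function name are invented for illustration:

```python
# Toy scaling of a fixed-size 2D die under a feature-size shrink by factor D.
# Baseline numbers below are invented for illustration only.

def shrink_2d_die(D,
                  die_side_mm=20.0,                 # fixed die size (assumed)
                  transistors_per_mm2=1e8,          # assumed baseline density
                  edge_wires_per_mm=100.0,          # assumed off-chip wire pitch
                  wire_energy_pJ_per_bit_mm=1.0):   # assumed, ~scale-invariant
    transistors = transistors_per_mm2 * D ** 2 * die_side_mm ** 2  # density ~ D^2
    off_chip_wires = edge_wires_per_mm * D * 4 * die_side_mm       # perimeter ~ D
    return transistors, off_chip_wires, wire_energy_pJ_per_bit_mm

for D in (1, 2, 4):
    t, w, e = shrink_2d_die(D)
    print(f"D={D}: off-chip wires per transistor = {w / t:.1e}, "
          f"wire energy = {e} pJ/bit/mm")
# Off-chip wires per transistor fall as ~1/D (bandwidth scales poorly relative
# to compute), and the energy to move a bit over a fixed distance is unchanged.
```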
Looking back at that section I see numerous clarifications I would make now; I would also perhaps focus more on the surface power density as a function of size, and analyze cooling requirements. However, I think it is reasonably clear from the document that shrinking the brain radius by a factor of X increases the surface power density (and thus cooling requirements, in terms of coolant flow at fixed coolant temperature) from synaptic computation by X^2 and from interconnect wiring by X.
In practice digital computers are approaching the limits of miniaturization, and fast logic chips tend to be 2D in part for the cooling considerations I describe. The Cerebras wafer, for example, represents a monumental engineering advance in getting power into and heat out of a small volume, but it still uses a 2D chip design, not 3D, because 2D allows dramatically more surface area for delivering power and removing heat than a 3D design, at the cost of much worse interconnect geometry scaling in terms of latency and bandwidth.
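A quick geometric illustration of why 2D wins for power and heat: compare the surface area of the same silicon volume laid out as a thin slab versus folded into a cube. The dimensions below are assumed, roughly wafer-scale values for the sketch, not actual Cerebras specifications:

```python
# Surface area available for power delivery / heat removal: thin 2D slab vs a
# 3D cube of equal volume. Dimensions are assumed for illustration only.

def slab_area_mm2(side_mm, thickness_mm):
    return 2 * side_mm ** 2 + 4 * side_mm * thickness_mm

def cube_area_mm2(volume_mm3):
    edge = volume_mm3 ** (1.0 / 3.0)
    return 6 * edge ** 2

side_mm, thickness_mm = 215.0, 1.0      # ~wafer-scale square slab (assumed)
volume_mm3 = side_mm * side_mm * thickness_mm

print(f"2D slab surface: {slab_area_mm2(side_mm, thickness_mm):,.0f} mm^2")
print(f"3D cube surface: {cube_area_mm2(volume_mm3):,.0f} mm^2")
# The slab exposes roughly 10x more surface for the same volume, which is the
# geometric reason a 2D layout is easier to power and cool, at the cost of
# longer average interconnect paths (worse latency/bandwidth scaling).
```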
We can and do make 3D chips today, but that tends to be most viable for memory rather than logic, because memory has far lower power density (and the brain, being neuromorphic, is more like a giant memory chip with logic sprinkled around right next to each memory unit).