Brain Efficiency Cannell Prize Contest Award Ceremony
Previously, Jacob Cannell wrote the post “Brain Efficiency”, which makes several radical claims: that the brain is at the Pareto frontier of speed, energy efficiency and memory bandwidth, and that this represents a fundamental physical frontier.
Here’s an AI-generated summary:
The article “Brain Efficiency: Much More than You Wanted to Know” on LessWrong discusses the efficiency of physical learning machines. The article explains that there are several interconnected key measures of efficiency for physical learning machines: energy efficiency in ops/J, spatial efficiency in ops/mm^2 or ops/mm^3, speed efficiency in time/delay for key learned tasks, circuit/compute efficiency in size and steps for key low-level algorithmic tasks, and learning/data efficiency in samples/observations/bits required to achieve a level of circuit efficiency, or per unit thereof. The article also explains why brain efficiency matters a great deal for AGI timelines and takeoff speeds, as AGI is implicitly/explicitly defined in terms of brain parity. The article predicts that AGI will consume compute & data in predictable brain-like ways and suggests that AGI will be far more like human simulations/emulations than you’d otherwise expect and will require training/education/raising vaguely like humans.
Jake has further argued that this has implications for FOOM and DOOM.
Considering the intense technical mastery of nanoelectronics, thermodynamics and neuroscience required to assess the arguments here, I concluded that a public debate between experts was called for. This was the start of the Brain Efficiency Prize contest, which attracted over 100 in-depth, technically informed comments.
Now for the winners! Please note that the criterion for winning the contest was bringing in novel and substantive technical arguments, as assessed by me. In contrast, general arguments about the likelihood of FOOM or DOOM, while no doubt interesting, did not factor into the judgement.
And the winners of the Jake Cannell Brain Efficiency Prize contest are
Ege Erdil
DaemonicSigil
… and Steven Byrnes!
Each has won $150, provided by Jake Cannell, Eli Tyre and myself.
I’d like to heartily congratulate the winners and thank everybody who engaged in the debate. The discussions were sometimes heated but always very informed. I was wowed and amazed by the extraordinary erudition and willingness for honest, compassionate intellectual debate displayed by the winners.
So what are the takeaways?
I will let you be the judge. Again, remember that the choice of winners was made on my (layman’s) assessment that the participant brought in novel and substantive technical arguments and thereby furthered the debate.
Steven Byrnes
The jury was particularly impressed by Byrnes’ patient, open-minded and erudite participation in the debate.
He has kindly written a post detailing his views. Here’s his summary:
Some ways that Jacob & I seem to be talking past each other
I will, however, point to some things that seem to be contributing to Jacob & me talking past each other, in my opinion.
Jacob likes to talk about detailed properties of the electrons in a metal wire (specifically, their de Broglie wavelength, mean free path, etc.), and I think those things cannot possibly be relevant here. I claim that once you know the resistance/length, capacitance/length, and inductance/length of a wire, you know everything there is to know about that wire’s electrical properties. All other information is screened off. For example, a metal wire can have a certain resistance-per-length by having a large number of mobile electrons with low mobility, or it could have the same resistance-per-length by having a smaller number of mobile electrons with higher mobility. And nobody cares which one it is—it just doesn’t matter in electronics.[8]
I want to talk about wire voltage profiles in terms of the “normal” wire / transmission line formulas (cf. telegrapher’s equations, characteristic impedance, etc.), and Jacob hasn’t been doing that AFAICT. I can derive all those wire-related formulas from first principles (ooh and check out my cool transmission line animations from my days as a wikipedia editor!), and I claim that those derivations are perfectly applicable in the context in question (nano-sized wire interconnects on chips), so I am pretty strongly averse to ignoring those formulas in favor of other things that don’t make sense to me.
Relatedly, I want to talk about voltage noise in terms of the “normal” electronics noise literature formulas, like Johnson noise, shot noise, crosstalk noise, etc., and Jacob hasn’t been doing that AFAICT. Again, I’m not taking these formulas on faith, I know their derivations from first principles, and I claim that they are applicable in the present context (nano-sized wire interconnects on chips) just like for any other wire. For example, the Johnson noise formula is actually the 1D version of Planck’s blackbody radiation equation—a deep and basic consequence of thermodynamics. Here I’m thinking here in particular of Jacob’s comment “it accumulates noise on the landauer scale at each nanoscale transmission step, and at the minimal landauer bit energy scale this noise rapidly collapses the bit representation (decays to noise) exponentially quickly”. I will remain highly skeptical of a claim like that unless I learn that it is derivable from the formulas for electrical noise on wires that I can find in the noise chapter of my electronics textbooks.
Jacob wants to describe wires as being made of small (≈1 nm) “tiles”, each of which is a different “bit”, with information flow down wires corresponding to dissipative bit-copying operations, and I reject that picture. For example, take a 100 μm long wire, on which signals propagate at a significant fraction of the speed of light. Now smoothly slew the voltage at one end of the wire from 0 to some voltage $V$ over the course of 0.1 ns. (In reality, the slew rate is indeed not infinite, but rather limited by transistor capacitance among other things.) Then, as you can check for yourself, the voltage across the entire wire will slew at the same rate at the same time. In other words, a movie of the voltage-vs-position curve on this 100 μm wire would look like a rising horizontal line, not a propagating wave. Now, recall where the Landauer limit comes from: bit-copy operations require kT of energy dissipation, because we go from four configurations (00,01,10,11) to two (00,11). The Second Law of Thermodynamics says we can’t reduce the number of microstates overall, so if the number of possible chip microstates goes down, we need to make up for it by increasing the temperature (and hence number of occupied microstates) elsewhere in the environment, i.e. we need to dissipate energy / dump heat. But in our hypothetical 100 μm long wire above, this analysis doesn’t apply! The different parts of the wire were never at different voltages in the first place, and therefore we never have to collapse more microstates into fewer.
…So anyway, I think our conversation had a bit of an unproductive dynamic where Jacob would explain why what I said cannot possibly be right [based on his “tiles” model], and then in turn I would explain why what he said cannot possibly be right [based on the formulas I like e.g. telegrapher’s equations], and then in turn Jacob would explain why that cannot possibly be right [based on his “tiles” model], and around and around we go.
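The “rising horizontal line” claim in the 100 μm wire example can be checked with a quick back-of-the-envelope sketch. The wire length and the 0.1 ns slew time come from the text; the propagation speed of 0.5c is an assumption standing in for “a significant fraction of the speed of light”, and the names below are purely illustrative.

```python
# Compare the signal transit time along the wire with the slew time.
# If transit time << slew time, the whole wire sits at nearly the same
# voltage at every instant, which is the premise of the argument above.

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def transit_time_s(length_m: float, velocity_fraction: float) -> float:
    """Time for a signal to traverse the wire at a given fraction of c."""
    return length_m / (velocity_fraction * C_LIGHT)


wire_length_m = 100e-6    # 100 um wire (from the text)
slew_time_s = 0.1e-9      # 0.1 ns slew (from the text)
velocity_fraction = 0.5   # ASSUMPTION: "significant fraction" taken as 0.5c

transit = transit_time_s(wire_length_m, velocity_fraction)
ratio = slew_time_s / transit

print(f"transit time: {transit * 1e12:.2f} ps")
print(f"slew time is {ratio:.0f}x longer than the transit time")
```

Under this assumed speed the transit time is well under a picosecond, so the slew takes on the order of a hundred transit times. A slower assumed propagation speed shrinks that ratio proportionally.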
DaemonicSigil
For their extensive post on the time and energy cost to erase a bit. The jury was particularly impressed by the extensive calculations and experiments that were done.
spxtr
Here are four key comments: [1], [2], [3], [4], the last of which is a detailed worked-out example of an LDF5-50A rigid coax cable that would violate Jacob’s postulated bound; see here:
Ethernet cables are twisted pair and will probably never be able to go that fast. You can get above 10 GHz with rigid coax cables, although you still have significant attenuation.
Let’s compute heat loss in a 100 m LDF5-50A, which evidently has 10.9 dB/100 m attenuation at 5 GHz. This is very low in my experience, but it’s what they claim.
Say we put 1 W of signal power at 5 GHz in one side. Because of the 10.9 dB attenuation, we receive 94 mW out the other side, with 906 mW lost to heat.
The Shannon-Hartley theorem says that we can compute the capacity of the wire as $C = B \log_2(1 + S/N)$, where $B$ is the bandwidth, $S$ is received signal power, and $N$ is noise power.
Let’s assume Johnson noise. These cables are rated up to 100 °C, so I’ll use that temperature, although it doesn’t make a big difference.
If I plug in 5 GHz for $B$, 94 mW for $S$ and $k_B T B$ for $N$, then I get a channel capacity of about 160 Gbit/s.
The heat lost is then $906\ \text{mW} / (160\ \text{Gbit/s}) / 100\ \text{m} \approx 0.05$ fJ/bit/mm. Quite low compared to Jacob’s ~10 fJ/mm “theoretical lower bound.”
One free parameter is the signal power. The heat loss over the cable is linear in the signal power, while the channel capacity is sublinear, so lowering the signal power reduces the energy cost per bit. It is 10 fJ/bit/mm at about 300 W of input power, quite a lot!
Another is noise power. I assumed Johnson noise, which may be a reasonable assumption for an isolated coax cable, but not for an interconnect on a CPU. Adding an order of magnitude or two to the noise power does not substantially change the final energy cost per bit (0.05 goes to 0.07), however I doubt even that covers the amount of noise in a CPU interconnect.
Similarly, raising the cable attenuation to 50 dB/100 m does not even double the heat loss per bit. Shannon’s theorem still allows a significant capacity. It’s just a question of whether or not the receiver can read such small signals.
The reason that typical interconnects in CPUs and the like tend to be in the realm of 10-100 fJ/bit/mm is because of a wide range of engineering constraints, not because there is a theoretical minimum. Feel free to check my numbers of course. I did this pretty quickly.
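For readers who want to take up the invitation to check the numbers, here is a minimal sketch of the calculation in the quoted comment. It uses the figures as quoted (1 W in, 94 mW received, 5 GHz bandwidth, 100 °C, 100 m of cable) and takes the Johnson noise power to be $k_B T B$. The function names and the `noise_factor` parameter are illustrative, not part of the original comment. Note that converting 10.9 dB directly gives a received power nearer 81 mW than 94 mW; the sketch keeps the quoted value, and the difference barely moves the result.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def shannon_capacity(bandwidth_hz: float, signal_w: float, noise_w: float) -> float:
    """Shannon-Hartley channel capacity in bit/s."""
    return bandwidth_hz * math.log2(1.0 + signal_w / noise_w)


def heat_per_bit_per_mm(input_w, received_w, bandwidth_hz, temp_k, length_mm,
                        noise_factor=1.0):
    """Return (J/bit/mm dissipated as heat, capacity in bit/s).

    noise_factor scales the Johnson noise power k_B*T*B, to explore
    noisier environments than an isolated coax cable.
    """
    noise_w = noise_factor * K_B * temp_k * bandwidth_hz
    capacity = shannon_capacity(bandwidth_hz, received_w, noise_w)
    heat_w = input_w - received_w  # power lost in the cable as heat
    return heat_w / capacity / length_mm, capacity


# Figures quoted in the comment above.
energy, capacity = heat_per_bit_per_mm(
    input_w=1.0, received_w=0.094, bandwidth_hz=5e9,
    temp_k=373.15, length_mm=100e3,
)
print(f"capacity: {capacity / 1e9:.0f} Gbit/s")
print(f"heat loss: {energy * 1e15:.3f} fJ/bit/mm")

# Noise power raised by two orders of magnitude.
noisy_energy, _ = heat_per_bit_per_mm(1.0, 0.094, 5e9, 373.15, 100e3,
                                      noise_factor=100.0)
print(f"heat loss with 100x noise: {noisy_energy * 1e15:.3f} fJ/bit/mm")
```

With these inputs the capacity comes out close to 160 Gbit/s and the heat loss lands between 0.05 and 0.06 fJ/bit/mm, rising to roughly 0.07 fJ/bit/mm with 100x the noise power. That is in line with the figures in the comment and a couple of orders of magnitude below 10 fJ/bit/mm.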
Ege Erdil
For his continued insistence on making the cruxes explicit between Jake Cannell and his critics, homing in on key disagreements. Ege Erdil first suggested the idea of working out the example of the heat loss of a cable that would show a violation of Jake’s postulated bounds.
Maybe I’m interpreting these energies in a wrong way and we could violate Jacob’s postulated bounds by taking an Ethernet cable and transmitting 40 Gbps of information at a long distance, but I doubt that would actually work.
My own views
I personally moved from being very sympathetic to Jacob Cannell’s point of view (to the point of giving a talk on his perspective) to being much more skeptical. I was not aware of the degree to which his viewpoints differed from established academic physics. The tile Landauer model seems likely invalid; at the very least, the claims that are made need to be much more explicitly detailed and made subject to peer review.
Still, it seems many of his considerations on the fundamental design trade-offs and limitations for brains and architecture are sensible and important. His ideas remain influential on how I think about the future of computer hardware, the brain and the form of superintelligent machines.
I’d like to once again thank all participants in this debate. It is my sincere belief that detailed, in-depth technical discussion is the way to move the debate forward and empower the public and decision-makers in making the right decisions.
I was heartened to see so many extremely informed people engage seriously and dispassionately. I hope that similar prize contests on topics of interest to the future of humanity may similarly spur debate. In particular, I hope to organize a prize contest on Yudkowsky’s views on nanotechnology. Let me know if you are interested in that happening.
I for one would love to see you organize a prize contest on Yudkowsky’s views on nanotech! I’m happy to pitch in some money for the prize pool.
Glad to hear somebody else is as excited about that as I am!
I’d be curious if you had any thoughts or suggestions on what would be a good way to set it up?
fwiw this is what gpt4 said when I asked it
No ideas on my end yet, sorry—that’s your job! ;)
🌝
Thanks for facilitating this! I found Steven’s post and spxtr’s comments particularly insightful.
I still think there’s a more fundamental issue with Jacob’s analysis, which is also present in some other works that characterize the brain as a substrate for various kinds of computation usually performed in silicon.
Namely, Jacob (and others) are implicitly or explicitly comparing FLOPS with “synaptic ops”, but these quantities are fundamentally incomparable. (I’ve been meaning to turn this into a top-level post for a while, but I’ll just dump a rough summary in a comment here quickly for now.)
FLOPS are a macro-level performance characteristic of a system. If I claim that a system is capable of 1 million FLOPS, that means something very precise in computer engineering terms. It means that the system can take 1 million pairs of floating point numbers (at some precision) and return 1 million results of multiplying or adding those pairs, each and every second.
OTOH, if I claim that a system is performing N million “synaptic ops” per second, there’s not a clear high-level / outward-facing / measurable performance characteristic that translates to. For large enough N, suitably arranged, you get human-level general cognition, which of course can be used to perform all sorts of behaviors that can be easily quantified into crisply specifiable performance characteristics. But that doesn’t mean that the synaptic ops themselves can be treated as a macro-level performance characteristic the way FLOPS can, even if each and every one of them really is maximally efficient on an individual basis and strictly necessary for the task at hand.
When you try to compare the brain and silicon-based systems in strictly valid but naive ways, you get results that are actually valid but not particularly useful or informative:
On the literal task of multiplying floats, almost any silicon system is vastly more efficient than a human brain, by any metric. Silicon can be used to multiply millions of floats per second for relatively tiny amounts of energy, whereas if you give a human a couple of floats and ask them to multiply them together, it would probably take several minutes, assuming they get the answer right and don’t wander off out of boredom, for a rate of <0.008 FLOPS.
On higher-level tasks like image recognition, visual processing, board games, etc., humans and silicon-based systems are more competitive, depending on the difficulty of the task and how the results are measured (energy, wall clock time, correctness, Elo, etc.).
For sufficiently cognitively difficult tasks which can’t currently be performed in silico at all, there’s no rigorous way to do the comparison naively, except by speculating about how much computation future systems or algorithms might use. The time / energy / synaptic ops budget needed for a human to e.g. solve a novel olympiad-level math problem is roughly known and calculable; the time / energy / FLOPS budget for a silicon system to do the same is unknown (but in any case, it probably depends heavily on how you use the FLOPS, in terms of arranging them to perform high-level cognition). We can speculate about how efficient possible arrangements might be compared to the brain, but for the actual arrangements that people today know how to write code for even in theory (e.g. AIXI), the efficiency ratio is essentially infinite (AIXI is incomputable; human brains are almost certainly not doing anything incomputable).
I don’t think Jacob’s analysis is totally wrong or useless, but one must be very careful to keep track of what it is that is being compared, and why. Joe Carlsmith does this well in How Much Computational Power Does It Take to Match the Human Brain? (which Jacob cites). I think much of Jacob’s analysis is an attempt at what Joe calls the mechanistic method:
(Emphasis mine.) Joe is very careful to be clear that the analysis is about modeling the brain’s mechanisms (i.e. simulation), rather than attempting to directly or indirectly compare the “amount of computation” performed by brains or CPUs, or their relative efficiency at performing this purported / estimated / equated amount of computation.
Other methods in Joe’s analysis (e.g. the limit method), which Jacob also uses, can be used to upper bound the FLOPS required for human-level general cognition, but Joe correctly points out that this analysis can’t be used directly to place a lower bound on how efficiently a difficult-to-crisply-specify task (e.g. high-level general cognition) can be performed:
I might expand on this more in the future, but I have procrastinated on it so far mainly because I don’t actually think the question these kinds of analyses attempt to answer is that relevant to important questions about AGI capabilities or limits: in my view, the minimal hardware required for creating superhuman AGI very likely already exists, probably many times over.
My own view is that you only need something a little bit smarter than the smartest humans, in some absolute sense, in order to re-arrange most of the matter and energy in the universe (almost) arbitrarily. If you can build epsilon-smarter-than-human-level AGI using (say) an energy budget of 1,000 W at runtime, you or the AGI itself can probably then figure out how to scale to 10,000 W or 100,000 W relatively easily (relative to the task of creating the 1,000 W AGI in the first place). And my guess is that the 100,000 W system is just sufficient for almost anything you or it wants to do, at least in the absence of other, adversarial agents of similar intelligence.
Rigorous, gears-level analyses can provide more and more precise lower bounds on the exact hardware and energy requirements for general intelligence, but these bounds are generally non-constructive. To actually advance capabilities (or make progress on alignment) my guess is that you need to do “great original natural philosophy”, as Tsvi calls it. If you do enough philosophy carefully and precisely before you (or anyone else) builds a 100,000 W AGI, you get a glorious transhuman future; if not, you probably get squiggles. And I think analyses like Joe’s and Jacob’s show that the hardware and energy required to build a merely human-level AGI probably already exist, even if it comes with a few OOM (or more) energy efficiency penalty relative to the brain. As many commenters on Jacob’s original post pointed out, silicon-based systems are already capable of making productive use of vastly more readily-available energy than a biological brain.
I think it does matter how efficient AI can get in using energy to fuel computation, in that I think it provides an important answer to the question of whether AI will be distributed widely, and whether we can realistically control AI distribution and creation.
If it turns out that creating superhuman AI is possible without much use of energy by individuals in their basement, then long term, controlling AI becomes essentially impossible, and we will have to confront a world where the government isn’t going to reliably control AI by default. Essentially, Eliezer’s initial ideas about the ability to create very strong technology in your basement may eventually become reality, just with a time delay.
If it turns out that any AI must use a minimum of say 10,000 watts or more, then there is hope for controlling AI creation and distribution long term.
And this matters both in scenarios where existential risk mostly comes from individuals, and in scenarios where the question is not existential risk but what will happen in a world where superhuman AI is created.
Note, 1 kW (50-100x human brain wattage) is roughly the power consumption of a very beefy desktop PC, and 10 kW is roughly the power consumption of a single rack in a datacenter. Even ~megawatt scale AI (100 racks) could fit pretty easily within many existing datacenters, or within a single entity’s mid-size industrial-scale basement, at only moderate cost.
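The power comparisons in this comment are simple ratios, sketched below. The 10-20 W brain power range is an assumption inferred from the “50-100x” figure and is not stated in the comment; the other values are the round numbers quoted above.

```python
# Rough power-budget ratios using the round figures from the comment.
BRAIN_WATTS_RANGE = (10.0, 20.0)  # ASSUMPTION: implied by "50-100x" at 1 kW
DESKTOP_WATTS = 1_000.0           # "very beefy desktop PC"
RACK_WATTS = 10_000.0             # single datacenter rack
MEGAWATT = 1_000_000.0

# How many brains' worth of power a 1 kW machine draws.
brain_multiples = sorted(DESKTOP_WATTS / w for w in BRAIN_WATTS_RANGE)

# How many 10 kW racks make up a megawatt-scale deployment.
racks_per_megawatt = MEGAWATT / RACK_WATTS

print(f"1 kW is {brain_multiples[0]:.0f}-{brain_multiples[1]:.0f}x brain power")
print(f"1 MW is about {racks_per_megawatt:.0f} racks")
```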
Yeah, this isn’t enough to stop companies from producing useful AI, but it does mostly mean we can hope to avoid scenarios where single individuals can reliably build AI, meaning that controlling AI is possible in scenarios where individuals, but not companies, are the problem for existential risk. It’s also relevant for other questions not focused on existential risk.
Does this imply that a weakly superhuman AGI can solve alignment?
Thank you Max, you make some very good points.
I wrote a related post. But who would be defending those views? I could debate Yudkowsky if he wants, but I don’t think he does.
What about Drexler himself?