Great post. I also find the entire line of argument from thermodynamic efficiency slightly weird. Even if the brain is perfectly energy efficient, the actual amount of energy it uses is still set by the number of calories our ancestors could obtain in the ancestral environment, and its size by the largest head a woman can reliably give birth to. A superintelligence need only be supplied with more than the brain’s 12 watts of power in order to exceed the number of computations thermodynamics allows us to do…
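A minimal back-of-the-envelope sketch of the scale involved, assuming body temperature (~310 K) and simply taking the 12 W figure above at face value:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 310.0            # roughly body temperature, K
P = 12.0             # brain power budget quoted above, W

# Landauer bound: minimum energy to irreversibly erase one bit
E_bit = k_B * T * math.log(2)   # ~3.0e-21 J

# Hard thermodynamic ceiling on irreversible bit erasures per second at 12 W
print(f"{P / E_bit:.1e} bit erasures per second")   # ~4.0e+21
```

Whatever that ceiling works out to, it scales linearly with power, which is the point above: a system drawing a megawatt gets a ceiling roughly five orders of magnitude higher than a 12 W brain, regardless of how efficient the brain itself is.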
It’s not, in my view at least, useful as a means to dismiss any key point of AI safety—only to calibrate expectations about capabilities scaling curves.
Sure, the brain being thermodynamically efficient does constrain the capabilities scaling curves somewhat, but by far the biggest source of uncertainty remains the relationship between the number of computations done by an agent and its ability to actually influence the world. I have no idea how many bit flips it takes to behave like an optimal Bayesian agent.
Hmm. Have you read all of Jake’s posts, and also Discovering Agents, and also Agents and Devices, and also the Alberta Plan? Or at least, if any of those are new to you, you could start with the abstracts. I don’t currently think an answer already exists to your question, and I find myself not quite able to just pop out a rephrasing of Simulators in terms of discovering-agents agency or etc, but it feels like it’s not far out of reach to have a solid hunch about the big-O shape of possible answers. I still need to deep-read Agents and Devices, though; maybe that has what I need.
This is not a “great post”; he didn’t actually read the sources, and misquotes/misinterprets me.
Do you usually correct people when they are being polite and courteous to others? I also find that days are seldom “great”, and that I’m not actually feeling that grateful when I say “thank you” to the cashier...
I think that commenting “Great post” carries more implication that you actually mean it (or something like it) than saying “very well, thank you” when someone asks how you’re doing. It doesn’t have to mean “this is one of the all-time great posts” or anything like that, but I at least wouldn’t say it if I didn’t mean at least “this post is good in absolute terms and better than most others like it”, and Jacob is claiming that that isn’t actually so. That’s not a nitpick about conventional phatic utterances, it’s an actual disagreement of substance, no?
I read both posts of yours that I linked, plus a chunk of that paper by Frank. It’s true that I don’t have time to read every single paper you cited in Brain Efficiency (I’m sure you understand), but it seems like you agree that achieving kTlog2 per bit erased is possible in principle for a very advanced reversible computer, so I’m not sure that reading a bunch more papers that all say the same thing would help.
Anyway, give me a list of misquotes and I’ll fix them?
There are many categories/classes of computational systems to which different limits apply. I am careful to remind the reader that I am analyzing only the specific class of computational systems that is most relevant for near-term forecasting: conventional non-exotic irreversible computers (which includes all current GPUs/CPUs/accelerators/etc and also all known brains).
Examples from my Brain Efficiency post:
If the brain is about 6 OOM away from the practical physical limits of energy efficiency, then roughly speaking we should expect about 6 OOM of further Moore’s Law hardware improvement past the point of brain parity
In worlds where brains are efficient, AGI is first feasible only near the end of Moore’s Law (for non-exotic irreversible computers),
Both brains and current semiconductor chips are built on dissipative/irreversible wire signaling,
It is super tedious to put a huge warning along the lines of THIS ONLY APPLIES TO NEAR-TERM CONVENTIONAL IRREVERSIBLE COMPUTERS every god damn time I use the word computer, or interconnect, or anything related. So I try to make it clear what type of computer I am talking about, but don’t necessarily use the full qualifiers every time I use any computational term.
Obviously a hypothetical future reversible computer could best the brain in energy efficiency! But it’s not clear that it would actually be practically superior.
That being said, you are also somewhat incorrect about how the Landauer principle/limit actually works, and you should read/respond to my comment about that here. I shouldn’t fault you for that too much, because it’s a very common mistake to make and is ‘only’ a 2 OOM difference, but the oft-cited kTlog2 cost of bit energy is a highly simplified lower bound that only applies in the limit of using infinite time for the erasure and/or a useless error probability of 50%. The erasure itself is a probabilistic process with an inherent error probability, and correcting errors itself requires erasures. Mike Frank almost always uses ~1 eV or similar for reliable high-speed bit energy, as do most authors.
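For concreteness, the arithmetic behind that ‘only’ ~2 OOM gap between the idealized kTlog2 figure and a ~1 eV reliable bit energy (this is just the ratio at roughly room temperature, not a reconstruction of Frank’s derivation):

```python
import math

k_B_eV = 8.617e-5   # Boltzmann constant, eV/K
T = 300.0           # ~room temperature, K; using ~310 K barely changes the ratio

E_ideal = k_B_eV * T * math.log(2)   # idealized Landauer figure: ~0.018 eV
E_reliable = 1.0                     # ~1 eV reliable high-speed bit energy cited above

ratio = E_reliable / E_ideal
print(f"kT ln 2 ~= {E_ideal:.3f} eV; 1 eV is ~{ratio:.0f}x larger (~{math.log10(ratio):.1f} OOM)")
```

That extra ~40 kT per bit is what pays for a usefully small error probability at useful speed, as opposed to the 50%-error, infinite-time regime the bare kTlog2 bound describes.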
Fair enough for optical, but I have no idea why he’s dismissing superconductive interconnect as impractical.
That research path has basically petered out and lost to GPUs (it was funded a bit for a while under exascale computing initiatives), but that’s a whole other long story. Regardless, that’s absolutely an exotic, unconventional reversible computer design, and outside the scope of my analysis there.
More generally, the issue here is that we’ve moved away from thermodynamic limits and into practical engineering constraints.
I was never talking about abstract, highly conservative theoretical lower bounds—my focus is on tight, realistic near-term bounds/estimates, so I focused on conventional computers, which are both all that is likely relevant in the near term and much better understood.
I do expect that reversible and/or quantum computing will probably be relevant someday, but it’s extraordinarily complicated to analyze, as you also need to factor in the algorithmic differences. You can’t just take a serial algorithm and get the same performance on an ultra-slow parallel computer, and the algorithmic structure of reversible and quantum logic is even more complex and constraining. For many problems of interest there is, as of now, essentially no significant quantum speedup, and reversible computing has related issues.
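A toy illustration of the serial-vs-parallel point, using nothing beyond the standard work/depth model (the helper function and the numbers are hypothetical, purely for illustration):

```python
def runtime_s(work, depth, units, op_latency_s):
    """Simple work/depth model: time is gated by the longer of the critical
    path and the total work spread across the available units."""
    return max(depth, work / units) * op_latency_s

serial_chain = dict(work=1e9, depth=1e9)   # every op depends on the previous one
parallel_job = dict(work=1e9, depth=1e3)   # shallow dependency chain

# One fast unit vs. a million units that are each 1000x slower per op
print(runtime_s(**serial_chain, units=1,   op_latency_s=1e-9))   # ~1 s
print(runtime_s(**serial_chain, units=1e6, op_latency_s=1e-6))   # ~1000 s: parallelism can't help
print(runtime_s(**parallel_job, units=1e6, op_latency_s=1e-6))   # ~0.001 s: now the wide, slow machine wins
```

And this toy model only captures scheduling; it says nothing about the extra structural constraints that reversible and quantum logic impose, which is the harder part of the comparison.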
Fair enough for pointing out those parts of Brain Efficiency, and I understand the struggle with disclaimers; writing is hard[1].
Seems like there’s a slight chance we still disagree about that remaining 2 OOM? With current technology, where we’re stuck dissipating the entire quantity of energy we used to represent a bit when we erase it, I agree that it has to be ~1 eV. With hypothetical future high-tech reversible computers, then those last 2 OOM aren’t a barrier and erasure costs all the way down to kTlog2 should be possible in principle. I think you and I and Frank and Landauer all agree about that last statement about hypothetical high-tech reversible computers, but if you do happen to disagree, please let me know.
[1] I do still think you could have been way more clear about this when citing it in Contra Yudkowsky, even with only a few words. I also still don’t understand why your reaction to Eliezer saying the difference was 6 OOM was to conclude that he didn’t understand what he was talking about. It makes sense for you to focus on what’s possible with near-term technology since you’re trying to forecast how compute will scale in the future, but didn’t it occur to you that maybe Eliezer’s statements did not have a similarly narrow scope?
With hypothetical future high-tech reversible computers, then those last 2 OOM aren’t a barrier and erasure costs all the way down to kTlog2 should be possible in principle. I think you and I and Frank and Landauer all agree about that last statement about hypothetical high-tech reversible computers,
Only if by “possible in principle” you mean using near-infinite time, with a strictly useless error probability of 50%. Again, Landauer and Frank absolutely do not agree with the oversimplified Wikipedia cliff-notes hot take of kTlog2. See again section 6, “Three sources of error”, and my reply here with details.
I also still don’t understand why your reaction to Eliezer saying the difference was 6 OOM was to conclude that he didn’t understand what he was talking about. It makes sense for you to focus on what’s possible with near-term technology since you’re trying to forecast how compute will scale in the future, but didn’t it occur to you that maybe Eliezer’s statements did not have a similarly narrow scope?
It did, but look very carefully at EY’s statement again:
namely that biology is simply not that efficient, and especially when it comes to huge complicated things that it has started doing relatively recently.
ATP synthase may be close to 100% thermodynamically efficient, but ATP synthase is literally over 1.5 billion years old and a core bottleneck on all biological metabolism. Brains have to pump thousands of ions in and out of each stretch of axon and dendrite, in order to restore their ability to fire another fast neural spike. The result is that the brain’s computation is something like half a million times less efficient than the thermodynamic limit for its temperature
The result of half a million times more energy than the limit comes specifically from pumping thousands of ions in and out of each stretch of axon and dendrite. This is irrelevant if you assume hypothetical reversible computing—why is the length of axon/dendrite interconnect relevant if you aren’t dissipating energy for interconnect anyway?
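(A quick consistency check on the numbers, my own arithmetic rather than a quote: “half a million times” is log10(5 × 10^5) ≈ 5.7, i.e. essentially the same ~6 OOM gap being argued over elsewhere in this thread.)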
Remember, he is claiming that biology is simply not that efficient. He is not comparing it to some exotic hypothetical reversible computer which uses superconducting wires and/or advanced optical interconnects that don’t seem accessible to biology. He is claiming that even within the biological constraints of accessible building blocks, biology is not that efficient.
Also remember that EY believes in strong nanotech, and for nanotech in particular reversible computation is irrelevant, because the bottleneck operations (copying nanobot instruction code and nanobot replication) are necessarily irreversible. So in that domain we absolutely can determine that biology is truly near optimal in practice, which I already covered elsewhere. EY would probably not be so wildly excited about nanotech if he accepted that biological cells were already operating near the hard thermodynamic limits for nanotech replicators.
Finally, the interpretation where EY thinks the 6 OOM come only from exotic future reversible computing is mostly incompatible with his worldview that brains/biology are inefficient and AGI nanotech is going to kill us soon—it’s part of my counterargument.
So is EY wrong about brain efficiency? Or does he agree that the brain is near the efficiency limits of a conventional irreversible computer, and that surpassing it by many OOM requires hypothetical exotic computers? Which is it?