Fair. I accept. 200:1 of my $100k against your $500. How are you setting these up?
I’m happy to pay $100k if my understanding of the universe (no aliens, no supernatural, etc.) is shaken. Also happy to pay up after 5 years if evidence emerges later about activities that occurred before or during this 5-year period.
(Also, regarding history, I have a second Less Wrong account with 11 years of history: https://www.lesswrong.com/users/tedsanders)
Ted Sanders
I’ll bet. Up to $100k of mine against $2k of yours. 50:1. (I honestly think the odds are more like 1000+:1, and would in principle be willing to go higher, but generally think people shouldn’t bet more than they’d be willing to lose, as bets above that amount could drive bad behavior. I would be happy to lose $100k on discovering aliens/time travel/new laws of physics/supernatural/etc.)
Happy to write a contract of sorts. I’m a findable figure and I’ve made public bets before (e.g., $4k wagered on AGI-fueled growth by 2043).
As an OpenAI employee I cannot say too much about short-term expectations for GPT, but I generally agree with most of his subpoints; e.g., running many copies, speeding up with additional compute, having way better capabilities than today, having more modalities than today. All of that sounds reasonable. The leap for me is (a) believing that results in transformative AGI and (b) figuring out how to get these things to learn (efficiently) from experience. So in the end I find myself pretty unmoved by his article (which is high quality, to be sure).
Bingo
No worries. I’ve made far worse. I only wish that H100s could operate at a gentle 70 W! :)
I think what I don’t understand is why you’re defaulting to the assumption that the brain has a way to store and update information that’s much more efficient than what we’re able to do. That doesn’t sound like a state of ignorance to me; it seems like you wouldn’t hold this belief if you didn’t think there was a good reason to do so.
It’s my assumption because our brains are AGI for ~20 W.
In contrast, many kW of GPUs are not AGI.
Therefore, it seems like brains have a way of storing and updating information that’s much more efficient than what we’re able to do.
Of course, maybe I’m wrong and it’s due to a lack of training or lack of data or lack of algorithms, rather than lack of hardware.
DNA storage is way more information dense than hard drives, for example.
One potential advantage of the brain is that it is 3D, whereas chips are mostly 2D. I wonder what advantage that confers. Presumably getting information around is much easier with 50% more dimensions.
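As a toy illustration of that intuition (my own rough scaling sketch, not a claim about actual brain or chip layouts): pack N components at fixed density and the typical distance between two random components scales as N^(1/2) in 2D but only N^(1/3) in 3D.

```python
# Toy scaling comparison: typical separation between two random components
# when N components are packed at unit density in 2D vs. 3D.
N = 8e10  # roughly the transistor count of a modern GPU; order of neuron count

extent_2d = N ** (1 / 2)  # linear extent of a 2D layout, in component pitches
extent_3d = N ** (1 / 3)  # linear extent of a 3D layout, in component pitches

print(f"2D linear extent: ~{extent_2d:.0e} component pitches")  # ~3e5
print(f"3D linear extent: ~{extent_3d:.0e} component pitches")  # ~4e3
print(f"Typical wires ~{extent_2d / extent_3d:.0f}x shorter in 3D")
```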
70 W
Max power is 700 W, not 70 W. These chips are water-cooled beasts. Your estimate is off, not mine.
Let me try writing out some estimates. My math is different than yours.
An H100 SXM has:
8e10 transistors
2e9 Hz boost frequency
2e15 FLOPS at FP16
7e2 W of max power consumption
Therefore:
2e6 eV are spent per FP16 operation
This is 1e8 times higher than the Landauer limit of 2e-2 eV per bit erasure at 70 C (and the ratio of bit erasures per FP16 operation is unclear to me; let’s pretend it’s O(1))
An H100 performs 1e6 FP16 operations per clock cycle, which implies 8e4 transistors per FP16 operation (some of which may be inactive, of course)
This seems pretty inefficient to me!
To recap, modern chips are roughly 8 orders of magnitude worse than the Landauer limit (with a bit-erasures-per-FP16-operation fudge factor that isn’t going to exceed 10). And this is in a configuration that takes 8e4 transistors to support a single FP16 operation!
Positing that brains are ~6 orders of magnitude more energy efficient than today’s transistor circuits doesn’t seem at all crazy to me. ~6 orders of improvement on 2e6 is ~2 eV per operation, still two orders of magnitude above the 0.02 eV per bit erasure Landauer limit.
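For concreteness, here is the arithmetic above as a quick script (H100 SXM spec values rounded to one significant figure, as in the list; Landauer limit evaluated at 70 C):

```python
import math

# Back-of-the-envelope: H100 SXM energy per FP16 op vs. the Landauer limit.
K_B = 1.380649e-23   # Boltzmann constant, J/K
EV = 1.602177e-19    # joules per electronvolt

transistors = 8e10   # H100 SXM transistor count
clock_hz = 2e9       # boost frequency
flops_fp16 = 2e15    # FP16 throughput
power_w = 7e2        # max power consumption

# Energy per FP16 operation
joules_per_op = power_w / flops_fp16
ev_per_op = joules_per_op / EV
print(f"Energy per FP16 op: {ev_per_op:.1e} eV")        # ~2e6 eV

# Landauer limit at 70 C = 343 K
landauer_ev = K_B * 343 * math.log(2) / EV
print(f"Landauer limit at 70 C: {landauer_ev:.1e} eV")  # ~2e-2 eV
print(f"Ratio: {ev_per_op / landauer_ev:.0e}")          # ~1e8

# Transistors per FP16 operation issued each clock cycle
ops_per_cycle = flops_fp16 / clock_hz
print(f"FP16 ops per cycle: {ops_per_cycle:.0e}")                     # ~1e6
print(f"Transistors per FP16 op: {transistors / ops_per_cycle:.0e}")  # ~8e4
```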
I’ll note too that cells synthesize informative sequences from nucleic acids using less than 1 eV of free energy per bit. That clearly doesn’t violate Landauer or any laws of physics, because we know it happens.
Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don’t see how that follows at all.
Where else is the energy going to go?
What is “the energy” that has to go somewhere? As you recognize, there’s nothing that says it costs energy to change the shape of a potential well. I’m genuinely not sure what energy you’re talking about here. Is it electrical potential energy spent polarizing a medium?
I think what I’m saying is standard in how people analyze power costs of switching in transistors, see e.g. this physics.se post.
Yeah, that’s pretty standard. The ultimate efficiency limit for a semiconductor field-effect transistor is bounded by the 60 mV/dec subthreshold swing, and modern tiny transistors have to deal with all sorts of problems like leakage current which make it difficult to even reach that limit.
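For reference, the 60 mV/dec figure is just the thermal voltage times ln(10), i.e. the minimum gate-voltage swing needed to change subthreshold current by a factor of 10 at room temperature. A one-line check (standard textbook result, nothing H100-specific):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
Q = 1.602177e-19    # elementary charge, C
T = 300             # room temperature, K

swing_mv_per_decade = (K_B * T / Q) * math.log(10) * 1e3
print(f"Ideal subthreshold swing: {swing_mv_per_decade:.1f} mV/dec")  # ~60 mV/dec
```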
Unclear to me that semiconductor field-effect transistors have anything to do with neurons, but I don’t know how neurons work, so my confusion is more likely a state of my mind than a state of the world.
+1. The derailment probabilities are somewhat independent of the technical barrier probabilities in that they are conditioned on the technical barriers otherwise being overcome (e.g., setting them all to 100%). That said, if you assign high probabilities to the technical barriers being overcome quickly, then the odds of derailment are probably lower, as there are fewer years for derailments to occur, and derailments that cause a delay of a few years may still be recovered from.
Thanks, that’s clarifying. (And yes, I’m well aware that x → B*x is almost never injective, which is why I said it wouldn’t cause 8 bits of erasure rather than the stronger, incorrect claim of 0 bits of erasure.)
To store 1 bit of information you need a potential energy barrier that’s at least as high as k_B T log(2), so to update an 8-bit value you need to switch ~ 8 such barriers, which means in any kind of realistic device you’ll lose ~ 8 k_B T log(2) of electrical potential energy to heat, either through resistance or through radiation. It doesn’t have to be like this, and some idealized device could do better, but GPUs are not idealized devices and neither are brains.
Two more points of confusion:
Why does switching barriers imply that electrical potential energy is probably being converted to heat? I don’t see how that follows at all.
To what extent do information storage requirements weigh on FLOPS requirements? It’s not obvious to me that requirements on energy barriers for long-term storage in thermodynamic equilibrium necessarily bear on transient representations of information in the midst of computations, either because the system is out of thermodynamic equilibrium or because storage times are very short.
Right. The idea is: “What are the odds that China invading Taiwan derails chip production conditional on a world where we were otherwise going to successfully scale chip production.”
If we tried to simulate a GPU doing a simple matrix multiplication at high physical fidelity, we would have to take so many factors into account that the cost of our simulation would far exceed the cost of running the GPU itself. Similarly, if we tried to program a physically realistic simulation of the human brain, I have no doubt that the computational cost of doing so would be enormous.
The Beniaguev paper does not attempt to simulate neurons at high physical fidelity. It merely attempts to simulate their outputs, which is a far simpler task. I am in total agreement with you that the computation needed to simulate a system is entirely distinct from the computation being performed by that system. Simulating a human brain would require vastly more than 1e21 FLOPS.
Thanks for the constructive comments. I’m open-minded to being wrong here. I’ve already updated a bit and I’m happy to update more.
Regarding the Landauer limit, I’m confused by a few things:
First, I’m confused by your linkage between floating point operations and information erasure. For example, if we have two 8-bit registers (A, B) and multiply to get (A, B*A), we’ve done an 8-bit floating point operation without 8 bits of erasure (a small brute-force sketch below makes this concrete). It seems quite plausible to me that the brain does 1e20 FLOPS but with a much smaller rate of bit erasures.
Second, I have no idea how to map the fidelity of brain operations to floating point precision, so I really don’t know if we should be comparing 1 bit, 8 bit, 64 bit, or not at all. Any ideas?
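To make the first point concrete, here’s a brute-force sketch. I’m using 8-bit integer multiplication mod 256 as a stand-in (I don’t know the right way to map this onto an 8-bit float op) and assuming uniform inputs; it counts how many bits of input entropy the map (A, B) → (A, B*A) actually destroys:

```python
from collections import Counter
import math

# Brute-force the map (a, b) -> (a, a*b mod 256) over all 2^16 inputs and
# measure how much input entropy it destroys, assuming uniform inputs.
counts = Counter((a, (a * b) % 256) for a in range(256) for b in range(256))

input_entropy = 16.0  # bits: 2^16 equally likely inputs
output_entropy = -sum(
    (c / 65536) * math.log2(c / 65536) for c in counts.values()
)
print(f"Average bits erased per op: {input_entropy - output_entropy:.2f}")
# ~1 bit on average -- far fewer than 8
```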
Regarding training requiring 8e34 floating point operations:
Ajeya Cotra estimates training could take anything from 1e24 to 1e54 floating point operations, or even more. Her narrower lifetime anchor ranges from 1e24 to 1e38ish. https://docs.google.com/document/d/1IJ6Sr-gPeXdSJugFulwIpvavc0atjHGM82QjIfUSBGQ/edit
Do you think Cotra’s estimates are not just poor, but crazy as well? If they were crazy, I would have expected to see her two-year update mention the mistake, or the top comments to point it out, but I see neither: https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines
Interested in betting thousands of dollars on this prediction? I’m game.
Interesting! How do you think this dimension of intelligence should be calculated? Are there any good articles on the subject?
What conditional probabilities would you assign, if you think ours are too low?
Conditioning does not necessarily follow time ordering. E.g., you can condition the odds of X on being in a world on track to develop robots by 2043 without having robots well in advance of X. Similarly, we can condition on a world where transformative AGI is trainable with 1e30 floating point operations and then ask the likelihood that those 1e30 floating point operations can be constructed and harnessed for TAGI. Remember too that in a world with rapidly advancing AI and robots, much of the demand will be for things other than TAGI.
I’m sympathetic to your point that it’s hard for brains to forecast these conditional probabilities. Certainly we may be wrong. But on the other hand, it’s also hard for brains to forecast things that involve smushing lots of probabilities together under the hood. I generally think that factoring things out into components helps, but I can understand if you disagree.
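As a toy illustration of what I mean by factoring (the probabilities and step labels below are placeholders for illustration, not the actual estimates from our essay):

```python
# Toy illustration of a factored forecast: a headline probability expressed as
# a product of conditional probabilities. Values are placeholders only.
conditionals = {
    "algorithms for TAGI are developed": 0.5,
    "enough compute is built | above": 0.5,
    "robots are scaled | above": 0.5,
    "no derailment (war, pandemic, regulation) | above": 0.5,
}

p = 1.0
for step, prob in conditionals.items():
    p *= prob
    print(f"{step}: {prob:.2f}  (running product: {p:.3f})")

print(f"Headline estimate: {p:.1%}")  # 6.2% with these placeholder values
```

The point is just that each component can be debated and swapped out independently, rather than arguing over a single smushed-together number.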
$500 payment received.
I am committed to paying $100k if, within the next 5 years, aliens/supernatural/non-prosaic explanations are considered, in aggregate, to be 50%+ likely to explain at least one UFO.